In this prelab, you will familiarize yourself with some of the design and implementation issues of the upcoming lab 10. Please write or type up your solutions, and hand in a paper copy before class on Monday. (Unless we haven't covered the algorithm in question yet, in which case, wait to hand in those parts on Friday.)
As usual, you may work with one partner on the lab, if you choose. It is a little more open-ended than usual, and you have a little more time than usual, so I think that, if you can stand it, working with a partner is a good idea.
In this lab, you will write a program that plays the "Kevin Bacon Game". A person's "Bacon Number" is computed based on the number of movies of separation between that person and the actor Kevin Bacon. For example, if you are Kevin Bacon, then your Bacon Number is 0. If you were in a movie with Kevin Bacon, your number would be 1. If you weren't in a movie with Kevin Bacon, but were in a movie with someone who was, your Bacon Number would be 2. In short, your Bacon Number is one greater than the smallest Bacon Number of any of your co-stars.
Note that this is a takeoff on Erdős numbers (mine's 3, because my advisor's is 2), and the two can be combined to form the more elusive Erdős-Bacon number.
For fun and some additional background, you can try out the Oracle of Bacon at the University of Virginia.
S | {} | {} | s | ||||
Q | {} | s | u,v | ||||
s.dist | ∞ | 0 | |||||
s.prev | null | null | |||||
u.dist | ∞ | 1 | |||||
u.prev | null | s | |||||
v.dist | ∞ | 1 | |||||
v.prev | null | s | |||||
w.dist | ∞ | ∞ | |||||
w.prev | null | null | |||||
x.dist | ∞ | ∞ | |||||
x.prev | null | null |
S | {} | {} | s | ||||
Q | {} | s | u,v | ||||
s.dist | ∞ | 0 | |||||
s.prev | null | null | |||||
u.dist | ∞ | 5 | |||||
u.prev | null | s | |||||
v.dist | ∞ | 8 | |||||
v.prev | null | s | |||||
w.dist | ∞ | ∞ | |||||
w.prev | null | null | |||||
x.dist | ∞ | ∞ | |||||
x.prev | null | null |
k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | |
s | 0 | |||||
u | ∞ | 5 | ||||
v | ∞ | |||||
w | ∞ | |||||
x | ∞ |
k=0 | k=1 | k=2 | k=3 | k=4 | k=5 | |
s | 0 | |||||
u | ∞ | 5 | ||||
v | ∞ | |||||
w | ∞ | |||||
x | ∞ |
As you know, graphs can be used to model a set of objects and the relationships between those objects. For this lab, the objects in our graph are of two types, actors and movies, and the relationship is whether an actor was in a given movie. In particular, we'll have a bipartite graph, that is, a graph where the vertices can be partitioned into two sets X and Y, and each edge connects an element of X (say, an actor) to an element of Y (say, a movie). For example, here is a very small subgraph of IMDb's actor-movie bipartite graph:
This graph represents the three actors Kevin Bacon, John Malkovich, and Christian Bale, and the three movies Queens Logic, Empire of the Sun, and Batman Returns. The edges keep track of which actors were in which movies.
Here we have used an undirected graph, so the length of the path between Kevin Bacon and some other actor X is double X's Bacon Number. Thus, if you decide to represent the information in this way, you will need to divide the path length by 2, or use a weight of 0.5 on every edge, in order to compute the correct Bacon Number.
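To make the divide-by-2 idea concrete, here is a minimal sketch, not a required design. It assumes vertices are plain strings stored in one adjacency map (the class name BfsBacon and its methods are hypothetical), and the edges in sampleGraph() are my own small example rather than a copy of the figure. Because every edge has the same cost in this representation, a plain breadth-first search is enough.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: actors and movies are both plain strings in one adjacency map.
class BfsBacon {
    // Adds an undirected edge between an actor and a movie.
    static void addRole(Map<String, List<String>> adj, String actor, String movie) {
        adj.computeIfAbsent(actor, k -> new ArrayList<>()).add(movie);
        adj.computeIfAbsent(movie, k -> new ArrayList<>()).add(actor);
    }

    // Returns the Bacon Number of target relative to center, or -1 if unreachable.
    static int baconNumber(Map<String, List<String>> adj, String center, String target) {
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(center, 0);
        queue.add(center);
        while (!queue.isEmpty()) {
            String u = queue.remove();
            if (u.equals(target)) {
                // Actor-to-actor paths alternate actor, movie, actor, ... so the length is even.
                return dist.get(u) / 2;
            }
            for (String v : adj.getOrDefault(u, Collections.emptyList())) {
                if (!dist.containsKey(v)) {
                    dist.put(v, dist.get(u) + 1);
                    queue.add(v);
                }
            }
        }
        return -1;
    }

    // Assumed example edges, for illustration only.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> adj = new HashMap<>();
        addRole(adj, "Kevin Bacon", "Queens Logic");
        addRole(adj, "John Malkovich", "Queens Logic");
        addRole(adj, "John Malkovich", "Empire of the Sun");
        addRole(adj, "Christian Bale", "Empire of the Sun");
        return adj;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = sampleGraph();
        System.out.println(baconNumber(g, "Kevin Bacon", "Christian Bale")); // prints 2
    }
}
```

With these assumed edges the path to Christian Bale has 4 edges, so the sketch reports a Bacon Number of 2.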
Another representation could create a directed graph and weight the edges from actors to movies as 0 and the edges from movies to actors as 1. Then, using Dijkstra's algorithm (an algorithm for finding the shortest path between two nodes, which we will soon discuss), you could find the shortest path from Kevin Bacon, and without modification its length would be an actor's Bacon Number.
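If you want to see how the 0/1 weighting plays out, here is a rough sketch under my own assumptions: the class WeightedBacon, its nested Edge and QueueItem types, and the sample edges are all hypothetical, and the priority-queue version of Dijkstra's algorithm shown here is just one common way to write it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of the directed, weighted representation.
class WeightedBacon {
    static final class Edge {
        final String to;
        final int weight;
        Edge(String to, int weight) { this.to = to; this.weight = weight; }
    }

    static final class QueueItem {
        final String vertex;
        final int dist;
        QueueItem(String vertex, int dist) { this.vertex = vertex; this.dist = dist; }
    }

    // Each role becomes two directed edges: actor -> movie (0) and movie -> actor (1).
    static void addRole(Map<String, List<Edge>> adj, String actor, String movie) {
        adj.computeIfAbsent(actor, k -> new ArrayList<>()).add(new Edge(movie, 0));
        adj.computeIfAbsent(movie, k -> new ArrayList<>()).add(new Edge(actor, 1));
    }

    // Dijkstra's algorithm; vertices missing from the result are unreachable.
    static Map<String, Integer> shortestDistances(Map<String, List<Edge>> adj, String source) {
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<QueueItem> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.dist, b.dist));
        dist.put(source, 0);
        pq.add(new QueueItem(source, 0));
        while (!pq.isEmpty()) {
            QueueItem top = pq.remove();
            if (top.dist > dist.get(top.vertex)) {
                continue; // stale queue entry; a shorter path was already found
            }
            for (Edge e : adj.getOrDefault(top.vertex, Collections.emptyList())) {
                int candidate = top.dist + e.weight;
                if (candidate < dist.getOrDefault(e.to, Integer.MAX_VALUE)) {
                    dist.put(e.to, candidate);
                    pq.add(new QueueItem(e.to, candidate));
                }
            }
        }
        return dist;
    }

    // Assumed example edges, for illustration only.
    static Map<String, List<Edge>> sampleGraph() {
        Map<String, List<Edge>> adj = new HashMap<>();
        addRole(adj, "Kevin Bacon", "Queens Logic");
        addRole(adj, "John Malkovich", "Queens Logic");
        addRole(adj, "John Malkovich", "Empire of the Sun");
        addRole(adj, "Christian Bale", "Empire of the Sun");
        return adj;
    }

    public static void main(String[] args) {
        Map<String, Integer> dist = shortestDistances(sampleGraph(), "Kevin Bacon");
        System.out.println(dist.get("Christian Bale")); // prints 2
    }
}
```

Note that a movie ends up with the same distance as the nearest actor in it, which is why no halving is needed here.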
In any case, you will need to construct Vertex, Edge, and Graph classes. Unlike previous labs, I will not list the required methods; at this point in the course you can probably figure out what methods and class variables you need. (We've already discussed many of these issues in class, and the textbook is always a decent resource if you're stuck.)
Once you have your basic Graph, you will create a class called Bacon. This program will read a data file of movie and actor listings, and will allow you to interactively query the system for various statistics, such as the Bacon number and path for any actor in the database.
The program requires a single argument, which is the name of the file containing the information on actors and the roles they played in movies. An optional second argument can be used to specify the initial "center" (in case you don't want it to always be Kevin Bacon). For example, here are three sample command line usages (where the -Xmx2g flag just gives your program more heap space to run with):
% java -Xmx2g Bacon imdb.full.txt # plays the game with the full data set centered at "Kevin Bacon"
% java -Xmx2g Bacon imdb.pre1950.txt "Bela Lugosi" # plays the game with the center set to "Bela Lugosi"
% java -Xmx2g Bacon imdb.no-tv-v.txt # plays the game with the no TV/V data set centered at "Kevin Bacon"
After reading in the data, the program should then prompt the user for commands until an end-of-file (CTRL-D) is reached (hasNextLine() will return false).
On the lab you will be implementing the following required commands. Your program should repeatedly prompt the user for one of the commands below, until they choose to quit the program (with CTRL-D).
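The prompt-until-EOF loop might be sketched roughly as follows. This is only an illustration under my own assumptions: the class name CommandLoop, the parseCommand helper, the "> " prompt, and the placeholder messages are all hypothetical, and the data file is not actually loaded here.

```java
import java.util.Scanner;

// Hypothetical sketch of the interactive loop; it does not read the data file.
class CommandLoop {
    // Splits a line such as "find James Earl Jones" into {"find", "James Earl Jones"}.
    static String[] parseCommand(String line) {
        String trimmed = line.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return new String[] { trimmed, "" };
        }
        return new String[] { trimmed.substring(0, space), trimmed.substring(space + 1).trim() };
    }

    public static void main(String[] args) {
        // args[0] would be the data file; args[1], if present, is the initial center.
        String center = args.length > 1 ? args[1] : "Kevin Bacon";
        Scanner in = new Scanner(System.in);
        System.out.print("> ");
        while (in.hasNextLine()) { // becomes false at end-of-file (CTRL-D)
            String[] cmd = parseCommand(in.nextLine());
            switch (cmd[0]) {
                case "find":
                    System.out.println("(would search from " + center + " to " + cmd[1] + ")");
                    break;
                case "recenter":
                    center = cmd[1]; // a real version would first check that the name exists
                    break;
                case "avgdist":
                case "stats":
                case "allcenter":
                    System.out.println("(would run " + cmd[0] + ")");
                    break;
                default:
                    System.out.println("unknown command: " + cmd[0]);
            }
            System.out.print("> ");
        }
    }
}
```

Splitting on the first space only keeps multi-word names such as "James Earl Jones" intact as a single argument.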
I am supplying my class files (the Graph.class, Edge.class, Vertex.class, and Bacon.class files) so that, if you have any questions about desired behaviour, you can try my program to see how it behaves. It's not completely debugged, but it should be able to answer many of your questions.
find <name>
Find the shortest path from the current center to <name>. The output should be of the format
<name1> -> <movie1> -> <name2> -> <movie2> -> ... -> Kevin Bacon (length n)
where <name1> is the person specified by the user, and the movies and actors in between show the path from that actor to the current center. The n in '(length n)' should be the Bacon Number. E.g., "find James Earl Jones" may output something like
James Earl Jones -> Magic 7, The (2008) (TV) -> Kevin Bacon (length 1)
and in the "no-tv-v" set:
James Earl Jones -> Three Fugitives (1989) -> Jeff Perry (I) -> Wild Things (1998) -> Kevin Bacon (length 2)
If someone is disconnected from the center simply print
<name> is unreachable
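One way to produce this output, assuming your search from the center recorded a "prev" pointer for each vertex it reached, is sketched below. The class PathPrinter, the map-of-strings representation, and the samplePrev() data are hypothetical; your Vertex class may well store prev differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: prev maps each reached vertex to the next vertex toward the center.
class PathPrinter {
    static String formatPath(Map<String, String> prev, String center, String name) {
        if (!name.equals(center) && !prev.containsKey(name)) {
            return name + " is unreachable";
        }
        StringBuilder sb = new StringBuilder(name);
        int edges = 0;
        String current = name;
        while (!current.equals(center)) {
            current = prev.get(current); // step one vertex closer to the center
            sb.append(" -> ").append(current);
            edges++;
        }
        // Each actor-to-actor step crosses two edges (actor -> movie -> actor).
        return sb + " (length " + (edges / 2) + ")";
    }

    // Assumed prev pointers matching the first sample output above.
    static Map<String, String> samplePrev() {
        Map<String, String> prev = new HashMap<>();
        prev.put("Magic 7, The (2008) (TV)", "Kevin Bacon");
        prev.put("James Earl Jones", "Magic 7, The (2008) (TV)");
        return prev;
    }

    public static void main(String[] args) {
        System.out.println(formatPath(samplePrev(), "Kevin Bacon", "James Earl Jones"));
        System.out.println(formatPath(samplePrev(), "Kevin Bacon", "Somebody Else"));
    }
}
```

Because the prev pointers lead back toward the center, walking them from the queried name yields the vertices in exactly the order the output format asks for.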
recenter <name>
Change the center to the given name if it exists in the database (otherwise, leave the center unchanged).
avgdist
Calculate the average Bacon Number for the current center among all connected actor nodes. Your output should have the following format:
<avg><tab><name><space>(<number unreachable>)
The average should only be for the nodes reachable from the center. For example, you may output something like
3.7181183388460126 Kevin Bacon (803)
which means that, on average, an actor's Kevin Bacon number is 3.718, out of the actors connected to Kevin Bacon. There are 803 actors not connected to Kevin Bacon.
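As a rough sketch of the arithmetic and the output format, assume you already have the Bacon Numbers of all reachable actors in a collection and know the total number of actors. The class AvgDist and its method are hypothetical, and whether the center's own 0 belongs in the collection is an assumption you should check against the sample program.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch: formats the avgdist line from already-computed Bacon Numbers.
class AvgDist {
    // reachableNumbers is assumed to hold one Bacon Number per reachable actor,
    // including the center's own 0.
    static String avgLine(String center, Collection<Integer> reachableNumbers, int totalActors) {
        long sum = 0;
        for (int d : reachableNumbers) {
            sum += d;
        }
        double avg = (double) sum / reachableNumbers.size();
        int unreachable = totalActors - reachableNumbers.size();
        return avg + "\t" + center + " (" + unreachable + ")";
    }

    public static void main(String[] args) {
        // Four reachable actors out of six in total.
        System.out.println(avgLine("Kevin Bacon", Arrays.asList(0, 1, 1, 2), 6));
    }
}
```

In the toy call, the numbers sum to 4 over 4 reachable actors, so the average is 1.0 and 2 actors are reported as unreachable.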
stats
Calculate structural statistics for the current graph. You should compute the average degree of all actor nodes, a table listing the number of actors with each degree that is non-zero, and the number of components of the graph.
For example, you may get something like
Average Degree: 1.114507057200467
Table of degrees
Degree 1: 8605
Degree 2: 641
Degree 3: 131
Degree 4: 28
Degree 5: 10
Degree 6: 2
Degree 7: 3
Degree 8: 1
Degree 9: 1
Degree 10: 1
Number of Components: 30
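Here is one possible sketch of the three statistics, assuming an undirected adjacency map of strings plus a separate set of actor names. The class GraphStats, its helpers, and the toy graph are hypothetical. As a sanity check on the sample above, the degree counts sum to 9423 actors and 10502 roles, and 10502 / 9423 is the 1.1145 shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: degree statistics and component counting on a string graph.
class GraphStats {
    static void addRole(Map<String, List<String>> adj, String actor, String movie) {
        adj.computeIfAbsent(actor, k -> new ArrayList<>()).add(movie);
        adj.computeIfAbsent(movie, k -> new ArrayList<>()).add(actor);
    }

    // Maps each non-zero degree to the number of actors with that degree.
    static TreeMap<Integer, Integer> degreeTable(Map<String, List<String>> adj, Set<String> actors) {
        TreeMap<Integer, Integer> table = new TreeMap<>();
        for (String actor : actors) {
            int degree = adj.getOrDefault(actor, Collections.emptyList()).size();
            if (degree > 0) {
                table.merge(degree, 1, Integer::sum);
            }
        }
        return table;
    }

    static double averageDegree(Map<String, List<String>> adj, Set<String> actors) {
        long total = 0;
        for (String actor : actors) {
            total += adj.getOrDefault(actor, Collections.emptyList()).size();
        }
        return (double) total / actors.size();
    }

    // Counts connected components with an iterative depth-first search.
    static int countComponents(Map<String, List<String>> adj) {
        Set<String> seen = new HashSet<>();
        int components = 0;
        for (String start : adj.keySet()) {
            if (!seen.add(start)) {
                continue; // already visited as part of an earlier component
            }
            components++;
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                String u = stack.pop();
                for (String v : adj.getOrDefault(u, Collections.emptyList())) {
                    if (seen.add(v)) {
                        stack.push(v);
                    }
                }
            }
        }
        return components;
    }

    // Toy data: two components, four actors with degrees 1, 2, 1, 1.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> adj = new HashMap<>();
        addRole(adj, "Actor A", "Movie 1");
        addRole(adj, "Actor B", "Movie 1");
        addRole(adj, "Actor B", "Movie 2");
        addRole(adj, "Actor C", "Movie 3");
        addRole(adj, "Actor D", "Movie 3");
        return adj;
    }

    static Set<String> sampleActors() {
        return new HashSet<>(Arrays.asList("Actor A", "Actor B", "Actor C", "Actor D"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = sampleGraph();
        System.out.println("Average Degree: " + averageDegree(g, sampleActors()));
        System.out.println("Table of degrees");
        for (Map.Entry<Integer, Integer> e : degreeTable(g, sampleActors()).entrySet()) {
            System.out.println("Degree " + e.getKey() + ": " + e.getValue());
        }
        System.out.println("Number of Components: " + countComponents(g));
    }
}
```

A TreeMap keeps the degree table sorted by degree, which matches the order of the sample output.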
allcenter
Calculate the average Bacon Number for all entries in the database. This is going to be a lot of data, so I won't put it all in here, but part of your output may look like
5.036533957845434 Dakota Fanning 883
3.6336065573770493 Linda Carola 883
0.9473684210526315 Seung-jin Lee 9404
3.7175644028103045 Helena Cihelnikova 883
3.8348946135831383 Lee Fierro 883
4.400117096018735 Richard McNamara (II) 883
3.8792740046838405 Stephen O'Neil Martin 883
3.9282201405152226 Robin Mary Paris 883
3.419789227166276 Nina Foch (I) 883
3.699765807962529 Peter Greene (I) 883
3.517096018735363 David Murray (II) 883
4.17903981264637 Thomas Martin (I) 883
1.9430051813471503 Daisuke Ryu 9230
Note that a low score does not necessarily make a good center. In particular, how is Seung-jin Lee, with a score of 0.94, able to beat Kevin Bacon's 3.879? Think about how this scoring may work and you will have your answer...
For example, your output may look like
Table of distances for Kevin Bacon
Distance 0: 1
Distance 1: 79
Distance 2: 175
Distance 3: 2013
Distance 4: 4796
Distance 5: 1379
Distance 6: 97
Unreachable : 883
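A table like this is just a histogram of Bacon Numbers, and a sketch of one way to build and print it follows. The class DistanceTable and its methods are hypothetical. As a sanity check, the sample counts above average to about 3.879 when the center's own distance of 0 is included (33129 / 8540), which matches the figure quoted earlier for Kevin Bacon.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: builds, averages, and prints a table of distances.
class DistanceTable {
    // Counts how many actors have each Bacon Number.
    static TreeMap<Integer, Integer> histogram(Iterable<Integer> distances) {
        TreeMap<Integer, Integer> table = new TreeMap<>();
        for (int d : distances) {
            table.merge(d, 1, Integer::sum);
        }
        return table;
    }

    // Weighted mean of the distances in the table.
    static double mean(Map<Integer, Integer> table) {
        long weighted = 0;
        long count = 0;
        for (Map.Entry<Integer, Integer> e : table.entrySet()) {
            weighted += (long) e.getKey() * e.getValue();
            count += e.getValue();
        }
        return (double) weighted / count;
    }

    static String render(String center, TreeMap<Integer, Integer> table, int unreachable) {
        StringBuilder sb = new StringBuilder("Table of distances for " + center + "\n");
        for (Map.Entry<Integer, Integer> e : table.entrySet()) {
            sb.append("Distance ").append(e.getKey()).append(": ").append(e.getValue()).append("\n");
        }
        sb.append("Unreachable : ").append(unreachable);
        return sb.toString();
    }

    // The counts from the sample output above.
    static TreeMap<Integer, Integer> sampleTable() {
        int[] counts = { 1, 79, 175, 2013, 4796, 1379, 97 };
        TreeMap<Integer, Integer> table = new TreeMap<>();
        for (int d = 0; d < counts.length; d++) {
            table.put(d, counts[d]);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(render("Kevin Bacon", sampleTable(), 883));
        System.out.println(mean(sampleTable()));
        System.out.println(histogram(Arrays.asList(0, 1, 1, 2)));
    }
}
```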
If you have the time or inclination, you may opt to include additional commands for consideration towards extra credit. Here are some suggestions you may want to think about.