11:59.59pm, Sunday, May 4
You may work with one partner on this assignment.
In class, we have been discussing how Graph structures might can be used to represent relationships between groups of objects. For this assignment, you will be writing a program that allows you to play the "Kevin Bacon Game". A person's "Bacon Number" is computed based on the number of movies of separation between that person and the actor Kevin Bacon. For example, if you are Kevin Bacon, then your Bacon Number is 0. If you were in a movie with Kevin Bacon, your number would be 1. If you weren't in a movie with Kevin Bacon, but were in a movie with someone who was, your Bacon Number would be 2. In short, your Bacon Number is one greater than the smallest Bacon Number of any of your co-stars.
Note that this is a take off of Erdos numbers (mine's 3), and the two can be combined to form the more elusive Erdos-Bacon number.
For fun and some additional background, you can try out the Oracle of Bacon at the University of Virginia.
You will be writing a class called BaconNumber that will read a data
file and allow you to interactively query the system for the Bacon Number
and path for any actor in the database. The program should require a single
argument which is the filename containing the information on people and the
roles they played in a movie. On optional second argument can be used to
specify the initial center. After reading in the data, the program
should then prompt the user for commands until an end-of-file (CTRL-D) is
reached (hasNextLine() will return false).
Similar to what you did in past labs, if the filename argument begins with "http:" you should treat it as an URL and read the file from the network. This will enable you to play the game without having to download the entire file. To open a Scanner from an URL, you just need to do something similar to the following:
Scanner s = new Scanner(new URL("http://www.cs.oberlin.edu/").openStream());
% java -Xmx2g BaconNumber imdb.full.txt # plays the game with the full data set centered at "Kevin Bacon" % java -Xmx2g BaconNumber imdb.pre1950.txt "Bela Lugosi" # plays the game with the center set to "Bela Lugosi" % java -Xmx2g BaconNumber http://www.cs.oberlin.edu/~kuperman/csci151/labs/imdb.no-tv-v.txt # plays the game with the no TV/V data set centered at "Kevin Bacon"
The movie data file contains information on what movies a performer appears in. Every line contains information on one person appearing in one movie. The lines are formatted as follows:
<performer name>|<movie title>
The vertical pipe character '|' can be used to determine where the name
ends and the title begins. There will only be one '|' on a line and there
are no empty names or titles. java.lang.String has a number of methods that can
be used to divide up the line. (e.g., split("\\|"))
I have supplied several data files of varying sizes for you to work with.
Rather than cluttering up your account with these files, you can either use the links above for URLs. Also, once you have your lab folder created, you can run 151lab10setup from a lab machine and you'll get symbolic links to the files in the current directory.
Other than the small database, you'll almost certainly need to increase the amount of memory allowed via the -Xmx argument.
Your program should read in the specified file and in the default case, choose "Kevin Bacon" as the initial center. There are a number of commands you are to support in order query the database and change the center.
find <name>
Find the shortest path from the current center to <name>. The output should be of the format
<name1> -> <movie1> -> <name2> -> <movie2> -> ... -> Kevin Bacon (n)
where <name1> is the person specified by the user and the movies and actors in between show the path from that actor to the current center. The '(n)' should indicate the Bacon Number. E.g., "find James Earl Jones" in the "full" database yields
James Earl Jones -> Magic 7, The (2008) (TV) -> Kevin Bacon (1)
and in the "no-tv-v" set:
James Earl Jones -> Three Fugitives (1989) -> Jeff Perry (I) -> Wild Things (1998) -> Kevin Bacon (2)
Note that your links may differ, but the path length should be the same.
If someone is disconnected from the center simply print
<name> is unreachable
findall
Iterate through all actors and actresses and perform a find operation on them.
recenter <name>
Change the center to the given name if it exists in the database. Do not change the center if the name does not.
center
Calculate the average Bacon Number for the given center among all connected nodes. Your output should be the following
<avg><tab><name><space>(<number unreachable>)
The average should only be for the nodes reachable from the center. In the top250 database, I get the following
3.8792740046838405 Kevin Bacon (883)
3.7286171816126603 Kevin Bacon (674)
and in the "no-tv-v" set I get
2.937249837432196 Kevin Bacon (35237)
allcenter
Calculate the average Bacon Number for all entries in the database. NOTE: this can take a very long time on larger data sets.
In the top250 set, my program finds "Obsession" is the best
center (0.5) and there are 24 people tied for the worst center including
"Kai Ivo Baulitz" (~6.012).
In the top 250 set, my program finds "Natasha Abramova (I)" is the best
center (0.875) and the worst center is
"Lina Gennari" (~5.610).
You may opt to include additional other commands for consideration towards extra credit. For any additional commands you implement, you should document them in the README file. Be sure to explain what it does and how someone could use it.
Here are some suggestions
The longest Bacon Number I found in the 'imdb.no-tv-v.txt' dataset for Kevin Bacon was 8 ("Jessica Drizd" and others). "Kevin Bacon" has a center value of ~2.937 while "Sean Connery" has ~2.803 indicating that he is a better center than Kevin Bacon. The Oracle of Bacon has a top 1000 list of centers which could be used to search for better values.
As we have been discussing graphs, It should be no surprise that a good way to represent these acting relationships would be through a graph. There are a number of ways in which this can be done, however, if you want to maintain a simple graph you might want to have both movies and actors be vertices and the edges simply being relationships between them.
While an undirected graph could be used, the resulting path length will be double the Bacon Number. You would need to divide the path length by 2 or use weights of 0.5 for the edges. Another technique would be to create a directed graph and weight the paths from actors to movies as 0 and movies to actors as 1. Then, using Dijkstra's algorithm, you can find the shortest path where all actors and actresses that are listed for a movie can be consider equally.
Remember that it is best to build and test your program incrementally. Construct your Graph class and be sure to include test cases in the main method.
If you decide to either use or model part of your implementation off of what is in the book, be sure to give proper credit in the methods or comments at the start of the file.
Use handin to submit the following files:
If you adhered to the honor code in this assignment, add the following statement to your README file:
I affirm that I have adhered to the Honor Code in this assignment.
If you work with a partner, just submit one solution per team.
Information courtesy of The Internet Movie Database (http://www.imdb.com/). Used with permission. The data should only be used for personal and non-commercial purposes.