Prelab 10

Everything's Better with Bacon
Due by 10am, Monday 3 Dec 2012
(Except for three parts that we won't have covered in class yet, which will be due on Friday, December 7.)

In this prelab, you will familiarize yourself with some of the design and implementation issues of the upcoming lab 10. Please write or type up your solutions, and hand in a paper copy before class on Monday. (Unless we haven't covered the algorithm in question yet, in which case, wait to hand in those parts on Friday.)

As usual, you may work with one partner on the lab, if you choose. It is a little more open-ended than usual, and you have a little more time than usual, so I think that, if you can stand it, working with a partner is a good idea.


In this lab, you will write a program that plays the "Kevin Bacon Game". A person's "Bacon Number" is computed based on the number of movies of separation between that person and the actor Kevin Bacon. For example, if you are Kevin Bacon, then your Bacon Number is 0. If you were in a movie with Kevin Bacon, your number would be 1. If you weren't in a movie with Kevin Bacon, but were in a movie with someone who was, your Bacon Number would be 2. In short, your Bacon Number is one greater than the smallest Bacon Number of any of your co-stars.

Note that this is a takeoff on Erdos numbers (mine's 3, because my advisor's is 2), and the two can be combined to form the more elusive Erdos-Bacon number.

For fun and some additional background, you can try out the Oracle of Bacon at the University of Virginia.

Graph Practice

  1. Run the unweighted shortest paths algorithm (BFS) on the graph below. Remember that this algorithm ignores edge costs, so you can ignore them too. In order to keep track of your calculations, fill in the following table as you go along, where each column represents one iteration of the while loop. See the class notes for Wednesday, November 28 for an example of this trace-through, and for our algorithm. (In the table below, the first column is the initialization. The second column is the first pass through the loop. The example in class started with this second column.)

    S       {}    {}    s
    Q       {}    s     u,v
    s.dist  ∞     0
    s.prev  null  null
    u.dist  ∞     1
    u.prev  null  s
    v.dist  ∞     1
    v.prev  null  s
    w.dist  ∞     ∞
    w.prev  null  null
    x.dist  ∞     ∞
    x.prev  null  null

  2. Now run Dijkstra's algorithm on the same graph. In order to keep track of your calculations, fill in the following table as you go along. Refer to the notes for Friday, November 30th for an example of a trace-through and a description of the algorithm.

    S       {}    {}    s
    Q       {}    s     u,v
    s.dist  ∞     0
    s.prev  null  null
    u.dist  ∞     5
    u.prev  null  s
    v.dist  ∞     8
    v.prev  null  s
    w.dist  ∞     ∞
    w.prev  null  null
    x.dist  ∞     ∞
    x.prev  null  null

  3. Run the Bellman-Ford algorithm on the following graph, filling in the table of your calculations as you go along. (k is the number of iterations so far.) (See the notes from Monday, December 3rd for an example and the algorithm.) (You should hand in this part and the following two on Friday, December 7th.)

       k=0 k=1 k=2 k=3 k=4 k=5
     s 0
     u 5

  4. Run the Bellman-Ford algorithm on the following graph, filling in the table of your calculations as you go along. (The edges incident to x are different.) (Hand in this part on Friday.)

       k=0 k=1 k=2 k=3 k=4 k=5
     s 0
     u 5

  5. Run the topological sort algorithm on the following graph. List the vertices in topological order. Break ties alphabetically and use a queue as your worklist. (See the notes from Monday or Wednesday's lecture for the algorithm and an example, then hand it in on Friday.)
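For reference, here is a minimal Java sketch of a queue-based topological sort (often called Kahn's algorithm). It is one common formulation and may differ in detail from the version in the class notes. The class name `TopoSortSketch`, the helper `addEdge`, and the reading of "break ties alphabetically" as "enqueue newly ready vertices in alphabetical order" are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class; not part of the lab's required code.
class TopoSortSketch {
    // Adds a directed edge and makes sure both endpoints appear as keys.
    static void addEdge(TreeMap<String, TreeSet<String>> adj, String from, String to) {
        adj.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        adj.computeIfAbsent(to, k -> new TreeSet<>());
    }

    // TreeMap and TreeSet iterate in sorted order, which gives the
    // alphabetical tie-breaking assumed in the lead-in.
    static List<String> topologicalOrder(TreeMap<String, TreeSet<String>> adj) {
        Map<String, Integer> indegree = new TreeMap<>();
        for (String v : adj.keySet()) {
            indegree.putIfAbsent(v, 0);
        }
        for (TreeSet<String> successors : adj.values()) {
            for (String w : successors) {
                indegree.merge(w, 1, Integer::sum);
            }
        }

        // The worklist starts with every vertex that has no incoming edges.
        Queue<String> worklist = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) {
                worklist.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!worklist.isEmpty()) {
            String v = worklist.remove();
            order.add(v);
            // "Removing" v lowers the in-degree of each successor.
            for (String w : adj.getOrDefault(v, new TreeSet<>())) {
                if (indegree.merge(w, -1, Integer::sum) == 0) {
                    worklist.add(w);
                }
            }
        }
        if (order.size() != indegree.size()) {
            throw new IllegalArgumentException("graph has a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeSet<String>> adj = new TreeMap<>();
        addEdge(adj, "a", "c");
        addEdge(adj, "b", "c");
        addEdge(adj, "c", "d");
        System.out.println(topologicalOrder(adj));
    }
}
```

If the worklist empties before every vertex has been output, the remaining vertices lie on a cycle, so no topological order exists.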

Part 1 - Graphs

As you know, graphs can be used to model a set of objects and relationships between those objects. For this lab, the objects in our graph are of two types: actors and movies; the relationships are whether an actor was in a given movie. In particular, we'll have a bipartite graph, that is, a graph where the vertices can be partitioned into two sets X and Y, and each edge connects an element of X (say, an actor) to an element of Y (say, a movie). For example, here is a very small subgraph of imdb's actor-movie bipartite graph:

This graph represents the three actors Kevin Bacon, John Malkovich, and Christian Bale, and the three movies Queens Logic, Empire of the Sun, and Batman Returns. The edges keep track of which actors were in which movies.
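One way to test whether a graph is bipartite is to try to 2-color it with a breadth-first search: give each vertex the opposite color of the neighbor that discovered it, and fail if an edge ever joins two vertices of the same color. The sketch below assumes an undirected graph stored as adjacency lists keyed by vertex name; the class name `BipartiteCheckSketch`, its methods, and the vertex labels are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper class; not part of the lab's required code.
class BipartiteCheckSketch {
    // Adds an undirected edge by recording it in both adjacency lists.
    static void addEdge(Map<String, List<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Returns true if the vertices can be split into two sides so that
    // every edge crosses from one side to the other.
    static boolean isBipartite(Map<String, List<String>> adj) {
        Map<String, Integer> color = new HashMap<>();
        for (String start : adj.keySet()) {
            if (color.containsKey(start)) {
                continue; // this component has already been colored
            }
            color.put(start, 0);
            Queue<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String v = queue.remove();
                for (String w : adj.getOrDefault(v, Collections.emptyList())) {
                    if (!color.containsKey(w)) {
                        color.put(w, 1 - color.get(v)); // opposite side from v
                        queue.add(w);
                    } else if (color.get(w).equals(color.get(v))) {
                        return false; // an edge joins two vertices on the same side
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = new HashMap<>();
        addEdge(adj, "actor A", "movie 1");
        addEdge(adj, "actor B", "movie 1");
        addEdge(adj, "actor B", "movie 2");
        System.out.println(isBipartite(adj)); // prints true
    }
}
```

An actor-movie graph always passes this check, because every edge joins an actor to a movie.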

  1. Which of the following graphs are bipartite? (Briefly explain your answers.)

  2. What are the Bacon numbers of B, E and G in the graph below? (Keep in mind that nodes along one path alternate between actor nodes and movie nodes.)

Here we have used an undirected graph such that the resulting path length between Kevin Bacon and some other actor X will be double X's Bacon Number. Thus, if you decide to represent the information in this way, you would need to divide the path length by 2 or use weights of 0.5 for the edges in order to make the correct computations.
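The "divide by 2" idea can be sketched as follows, assuming the undirected graph is stored as adjacency lists keyed by vertex name. The class `BaconBfsSketch`, its method names, and the placeholder vertex labels are hypothetical, and returning -1 for an unreachable actor is just a convention chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper class; not part of the lab's required code.
class BaconBfsSketch {
    // Adds an undirected actor-movie edge to the adjacency lists.
    static void addRole(Map<String, List<String>> adj, String actor, String movie) {
        adj.computeIfAbsent(actor, k -> new ArrayList<>()).add(movie);
        adj.computeIfAbsent(movie, k -> new ArrayList<>()).add(actor);
    }

    // Unweighted shortest path lengths (in edges) from the center.
    // Vertices missing from the result are unreachable.
    static Map<String, Integer> pathLengths(Map<String, List<String>> adj, String center) {
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(center, 0);
        queue.add(center);
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String w : adj.getOrDefault(v, Collections.emptyList())) {
                if (!dist.containsKey(w)) {
                    dist.put(w, dist.get(v) + 1);
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    // Every actor-to-actor hop crosses two edges (actor, movie, actor),
    // so the Bacon number is half the path length. Returns -1 if unreachable.
    static int baconNumber(Map<String, List<String>> adj, String center, String actor) {
        Integer length = pathLengths(adj, center).get(actor);
        return length == null ? -1 : length / 2;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = new HashMap<>();
        addRole(adj, "Center", "Movie 1");
        addRole(adj, "Actor A", "Movie 1");
        addRole(adj, "Actor A", "Movie 2");
        addRole(adj, "Actor B", "Movie 2");
        addRole(adj, "Actor C", "Movie 3"); // not connected to the center
        System.out.println(baconNumber(adj, "Center", "Actor B")); // prints 2
    }
}
```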

Another representation could create a directed graph and weight the edges from actors to movies as 0 and from movies to actors as 1. Then, using Dijkstra's algorithm (an algorithm to find the shortest path between two nodes that we will soon discuss), you could find the shortest path from Kevin Bacon, and without modification this would represent an actor's Bacon number.
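A rough sketch of the directed 0/1-weight representation, using a textbook priority-queue version of Dijkstra's algorithm, might look like the following. It is not necessarily the formulation used in class. The class `WeightedBaconSketch`, its nested classes, and the placeholder vertex labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical class; not part of the lab's required code.
class WeightedBaconSketch {
    static final class Edge {
        final String to;
        final int weight;
        Edge(String to, int weight) { this.to = to; this.weight = weight; }
    }

    static final class QueueEntry {
        final String vertex;
        final int dist;
        QueueEntry(String vertex, int dist) { this.vertex = vertex; this.dist = dist; }
    }

    private final Map<String, List<Edge>> adj = new HashMap<>();

    // Actor -> movie costs 0, movie -> actor costs 1, as described above.
    void addRole(String actor, String movie) {
        adj.computeIfAbsent(actor, k -> new ArrayList<>()).add(new Edge(movie, 0));
        adj.computeIfAbsent(movie, k -> new ArrayList<>()).add(new Edge(actor, 1));
    }

    // Returns the shortest weighted distance to every reachable vertex.
    Map<String, Integer> dijkstra(String source) {
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<QueueEntry> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.dist, b.dist));
        dist.put(source, 0);
        queue.add(new QueueEntry(source, 0));
        while (!queue.isEmpty()) {
            QueueEntry current = queue.remove();
            if (current.dist > dist.get(current.vertex)) {
                continue; // stale entry; a shorter path was already found
            }
            for (Edge e : adj.getOrDefault(current.vertex, Collections.emptyList())) {
                int candidate = current.dist + e.weight;
                Integer known = dist.get(e.to);
                if (known == null || candidate < known) {
                    dist.put(e.to, candidate);
                    queue.add(new QueueEntry(e.to, candidate));
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        WeightedBaconSketch graph = new WeightedBaconSketch();
        graph.addRole("Center", "Movie 1");
        graph.addRole("Actor A", "Movie 1");
        graph.addRole("Actor A", "Movie 2");
        graph.addRole("Actor B", "Movie 2");
        System.out.println(graph.dijkstra("Center").get("Actor B")); // prints 2
    }
}
```

With these weights, the distance to an actor vertex is that actor's Bacon number directly, with no halving needed.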

In any case, you will need to construct Vertex, Edge, and Graph classes. Unlike previous labs, I will not list the required methods; at this point in the course you can probably figure out what methods and class variables you need. (We've already discussed many of these issues in class, and the textbook is always a decent resource if you're stuck.)

  1. What methods will your Graph class support? Rather than just listing all the operations we discussed in class, think about the ones that will be useful for this lab. In particular, use graph terminology to describe the graph operations, but also consider how they will be used in the Bacon game. List 3 different methods.

Part 2 - Everything's Better with Bacon!

Once you have your basic Graph, you will create a class called Bacon. This program will read a data file of movie and actor listings, and will allow you to interactively query the system for various statistics, such as the Bacon number and path for any actor in the database.

The program requires a single argument, which is the name of the file containing the information on actors and the roles they played in movies. One optional second argument can be used to specify the initial "center" (in case you don't want it to always be Kevin Bacon). For example, here are three sample command line usages (where the Xmx2g thing is just giving you more heap to run your program with):

% java -Xmx2g Bacon imdb.full.txt 
    # plays the game with the full data set centered at "Kevin Bacon"

% java -Xmx2g Bacon imdb.pre1950.txt "Bela Lugosi" 

    # plays the game with the center set to "Bela Lugosi"

% java -Xmx2g Bacon 
    # plays the game with the no TV/V data set centered at "Kevin Bacon" 

After reading in the data, the program should then prompt the user for commands until an end-of-file (CTRL-D) is reached (hasNextLine() will return false).
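The argument handling and prompt loop described above could be organized roughly like this. This is only a sketch: the class name `PromptLoopSketch`, the helper methods, and the "> " prompt text are assumptions, and loading the data file is left out entirely.

```java
import java.util.Scanner;

// Hypothetical class; the real program would be called Bacon and would
// also load the data file and dispatch each command.
class PromptLoopSketch {
    static final String DEFAULT_CENTER = "Kevin Bacon";

    // The optional second argument overrides the default center.
    static String chooseCenter(String[] args) {
        return args.length >= 2 ? args[1] : DEFAULT_CENTER;
    }

    // Splits "find James Earl Jones" into {"find", "James Earl Jones"}.
    // Commands without an argument get an empty second element.
    static String[] parseCommand(String line) {
        String trimmed = line.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return new String[] { trimmed, "" };
        }
        return new String[] {
            trimmed.substring(0, space),
            trimmed.substring(space + 1).trim()
        };
    }

    public static void main(String[] args) {
        String center = chooseCenter(args);
        // Reading the data file named by args[0] would happen here.

        Scanner in = new Scanner(System.in);
        System.out.print("> ");
        // hasNextLine() returns false once end-of-file (CTRL-D) is reached.
        while (in.hasNextLine()) {
            String[] command = parseCommand(in.nextLine());
            System.out.println("command=" + command[0]
                    + " argument=" + command[1]
                    + " center=" + center);
            System.out.print("> ");
        }
    }
}
```

Splitting at the first space only keeps multi-word names such as "James Earl Jones" intact as a single argument.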

Commands to be supported

In the lab, you will implement the following required commands. Your program should repeatedly prompt the user for one of the commands below, until they choose to quit the program (with CTRL-D).

I am supplying my class files (the Graph.class, Edge.class, Vertex.class, and Bacon.class files) so that, if you have any questions about desired behaviour, you can try my program to see how it behaves. It's not completely debugged, but it should be able to answer many of your questions.

  1. find <name>

    Find the shortest path from the current center to <name>. The output should be of the format

        <name1> -> <movie1> -> <name2> -> <movie2> -> ... -> Kevin Bacon (length n)

    where <name1> is the person specified by the user, and the movies and actors in between show the path from that actor to the current center (Kevin Bacon in the format above). The n in '(length n)' should be the Bacon Number. E.g., "find James Earl Jones" may output something like

        James Earl Jones -> Magic 7, The (2008) (TV) -> Kevin Bacon (length 1)

    and in the "no-tv-v" set:

        James Earl Jones -> Three Fugitives (1989) -> Jeff Perry (I) -> 
        Wild Things (1998) -> Kevin Bacon (length 2)

    If someone is disconnected from the center, simply print

        <name> is unreachable

  2. recenter <name>

    Change the center to the given name if it exists in the database (otherwise, leave the center unchanged.)

  3. avgdist

    Calculate the average Bacon Number for the given center among all connected actor nodes. Your output should be the following

        <avg><tab><name><space>(<number unreachable>)

    The average should only be for the nodes reachable from the center. For example, you may output something like

        3.7181183388460126  Kevin Bacon (803)

    which means that, among the actors connected to Kevin Bacon, the average Bacon Number is 3.718. There are 803 actors not connected to Kevin Bacon.

  4. stats

    Calculate structural statistics for the current graph. You should compute the average degree of all actor nodes, a table listing the number of actors with each degree that is non-zero, and the number of components of the graph.

    For example, you may get something like

        Average Degree: 1.114507057200467
        Table of degrees 
        Degree    1:	    8605
        Degree    2:	     641
        Degree    3:	     131
        Degree    4:	      28
        Degree    5:	      10
        Degree    6:	       2
        Degree    7:	       3
        Degree    8:	       1
        Degree    9:	       1
        Degree   10:	       1
        Number of Components: 30

  5. allcenter

    Calculate the average Bacon Number for every entry in the database, using each actor in turn as the center. This is going to be a lot of data, so I won't put it all in here, but part of your output may look like

        5.036533957845434	Dakota Fanning	883
        3.6336065573770493	Linda Carola	883
        0.9473684210526315	Seung-jin Lee	9404
        3.7175644028103045	Helena Cihelnikova	883
        3.8348946135831383	Lee Fierro	883
        4.400117096018735	Richard McNamara (II)	883
        3.8792740046838405	Stephen O'Neil Martin	883
        3.9282201405152226	Robin Mary Paris	883
        3.419789227166276	Nina Foch (I)	883
        3.699765807962529	Peter Greene (I)	883
        3.517096018735363	David Murray (II)	883
        4.17903981264637	Thomas Martin (I)	883
        1.9430051813471503	Daisuke Ryu	9230

    Note that a low score does not necessarily make a good center. In particular, how is Seung-jin Lee, with a score of 0.94, able to beat Kevin Bacon's 3.879? Think about how this scoring may work and you will have your answer...

  6. table

    Print out a table of the counts of Bacon Numbers for the given center, from 0 up to the longest.

    For example, your output may look like

        Table of distances for Kevin Bacon
        Distance    0:	       1
        Distance    1:	      79
        Distance    2:	     175
        Distance    3:	    2013
        Distance    4:	    4796
        Distance    5:	    1379
        Distance    6:	      97
        Unreachable :	     883
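The avgdist and table commands can both be computed from the same distance counts. The sketch below assumes you already have a map from each reachable actor to their Bacon Number; the class `DistanceStatsSketch` and its method names are hypothetical. It counts the center's own distance-0 entry in the average. That is an inference, not a stated requirement: the sample table above gives 33129/8540, or about 3.879, when the center is counted, which matches the 3.879 quoted for Kevin Bacon.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; not part of the lab's required code.
class DistanceStatsSketch {
    // Maps each Bacon Number to how many actors have it, in increasing order.
    static TreeMap<Integer, Integer> histogram(Map<String, Integer> baconNumbers) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int d : baconNumbers.values()) {
            counts.merge(d, 1, Integer::sum);
        }
        return counts;
    }

    // Average distance over all counted actors (the reachable ones only).
    static double average(Map<Integer, Integer> counts) {
        long total = 0;
        long actors = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            total += (long) e.getKey() * e.getValue();
            actors += e.getValue();
        }
        return actors == 0 ? 0.0 : (double) total / actors;
    }

    // Follows the <avg><tab><name><space>(<number unreachable>) format.
    static String avgdistLine(double avg, String center, int unreachable) {
        return avg + "\t" + center + " (" + unreachable + ")";
    }

    public static void main(String[] args) {
        Map<String, Integer> baconNumbers = new HashMap<>();
        baconNumbers.put("Center", 0);
        baconNumbers.put("Actor A", 1);
        baconNumbers.put("Actor B", 2);
        TreeMap<Integer, Integer> counts = histogram(baconNumbers);
        System.out.println(avgdistLine(average(counts), "Center", 1));
    }
}
```

The number of unreachable actors is the total number of actors minus the number that appear in the distance map.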

Additional commands

If you have the time or inclination, you may opt to include additional commands for consideration towards extra credit. Here are some suggestions you may want to think about.

  1. findall - iterate through all actors and actresses and perform a find operation on them.
  2. longest - print out one of the longest shortest paths to the center
  3. movies <name> - list all outbound edges from a given name
  4. most - list the actor with the most film credits (i.e. the actor vertex with the highest degree)

  1. What class variables will you need in your Bacon class?

  2. How would you implement the stats method? I just want general ideas here, such as which data structures you will use to store your information, and basic pseudocode for how to compute the various statistics.