Lab 2

It's all about Timing. And Lobsters.
Due by 8pm, Sunday 23 Sep 2012

In this lab you will experiment with algorithm timing and asymptotic analysis. The purpose of this lab is to:


Motivation

On the east coast, a little ways up north, there is a lobster-fishing community trying to maximize their lobster harvest from a straight-line waterway. As we all know, lobster sticks to magnet. Consequently, the town has fashioned a lobster harvester out of an airplane, some rope, and a very large magnet. If the magnet-plane flies low enough to the waterway, the lobsters in that segment of the waterway will stick to the magnet and be harvested. The problem is that not all segments of waterway are made equal, and not all things that stick to magnets are lobsters (for example, tin cans and robot lobsters); that is, some segments have an overall negative effect on the harvest, and some have a positive effect. Moreover, the plane cannot go up and down willy-nilly, harvesting only the positive segments. Rather, the plane can descend and ascend only once, harvesting precisely those segments of the waterway lying between the points of descent and ascent. You have been commissioned by said lobster-fishing community to determine where their plane should descend and ascend so as to maximize the quality of their harvest.

Diagram courtesy of Tom Wexler

More concretely, you are given a list of $n$ integer values $a_1, a_2, \ldots, a_n$ (positive or negative) representing the quality of the harvest at each of the $n$ segments (from west to east, say). Your task, should you choose to accept it, is to find the maximum additive quality in any contiguous sublist of the $n$ segments. That is, you want to find the maximum, over all pairs of integers $i$ and $j$ with $1 \le i \le j \le n$, of the sum $a_i + a_{i+1} + \cdots + a_j$.

For example, consider the sequence
2, 3, 4.
The contiguous subsequence with the largest sum is the entire sequence, with a sum of 9.

For

1, -2, 3, -4, 5, -6, 7,
the contiguous subsequence with the largest sum is just the last element, so the maximum sum is 7.

For

2, -5, 3, -1, 6, -6, 4, 5, 8, -4, -2, 4, -10, 12,
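one contiguous subsequence with the largest sum is { 3, -1, 6, -6, 4, 5, 8 }, with a sum of 19. (Extending it all the way through the final 12 also gives 19, so the maximizing subsequence need not be unique; what matters is the maximum sum.)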

Note: We will allow a subsequence to have length 0. By definition, a subsequence of length 0 (an empty subsequence) has a sum of 0. The empty subsequence is a subsequence of every integer sequence, so the maximum contiguous subsequence sum of any integer sequence can never be less than 0, even for a sequence of all negative integers.

Okay, yes, this whole lobster rigamarole is just the "maximum contiguous subsequence sum" problem, described at great length in Chapter 5 of Weiss. But everything is better with lobsters involved!


Lab requirements

The purpose of this lab is to see how an algorithm can be evaluated experimentally by running it on several data sets and measuring its running time. We will investigate several algorithms that solve the maximum contiguous subsequence sum problem defined above.

In this lab, you will do the following:
  1. Write a Java class to solve the maximum contiguous subsequence sum problem.
  2. Write a Java class which will determine the running time of any algorithm which operates on an array of integers.
  3. Measure the performance of your solution to the maximum contiguous subsequence sum problem on arrays of different sizes.
  4. Use the data from part 3 to determine the asymptotic behavior of the running time of the algorithm.
  5. Repeat steps 3 and 4 for four solutions that we will provide.

Starting point code

First download the file lab2a.jar. Then, unpack the file into your lab02 directory using

    % jar xvf lab2a.jar

Then open Eclipse and create a new project from existing source (refresher: we did this in lab 0). Once you click Finish, you should see a number of Java and class files listed under the new project in the left pane.

This next step is important for Eclipse to work with the provided class files: right-click on the lab2a.jar file in the left pane and select Build Path > Add to Build Path. This ensures that your Java files can "see" the provided class files. (<-- do this)

Now let's talk about those Java files, one by one:

ArrayAlgorithm.java -- This is an interface that defines a single method called runAlgorithm. Defining ArrayAlgorithm as an interface will make it possible for the ArrayAlgorithmTimer (described below) to collect timing information on any array algorithm which implements the interface.

ArrayAlgorithmTimer.java (incomplete) -- This class will be used to gather timing information on any integer array algorithm. It contains two overloaded methods called computeRunningTime. The two-argument version takes an ArrayAlgorithm and an integer array, runs the algorithm on the array, and returns its running time in milliseconds. The three-argument version adds a number of trials: it runs the requested number of trials and returns the average running time per trial.

MaxLobster.java -- This is an abstract class which defines a solver of the maximum contiguous subsequence sum problem. It contains an abstract method called

    public abstract int maxSeqSum(int[] array);

This method accepts an integer array as its argument, and returns the maximum contiguous subsequence sum found in the array. Making this an abstract class will allow us to test several different algorithms, all of which solve the same problem.

MaxLobster implements the ArrayAlgorithm interface by defining its runAlgorithm method as a call to maxSeqSum. This will allow us to use ArrayAlgorithmTimer to time the runs of all the different maxSeqSum algorithms.
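To make the delegation concrete, here is a minimal sketch of how an interface plus abstract class can be wired this way. The names ArrayAlgorithmSketch, MaxLobsterSketch, and CountingLobster are hypothetical stand-ins, and the runAlgorithm signature (void, taking an int[]) is an assumption; the real ArrayAlgorithm and MaxLobster in lab2a.jar may differ.

```java
// Hypothetical stand-in for ArrayAlgorithm; the runAlgorithm signature is assumed.
interface ArrayAlgorithmSketch {
    void runAlgorithm(int[] array);
}

// Hypothetical stand-in for MaxLobster: subclasses supply maxSeqSum,
// and timing code only ever calls runAlgorithm.
abstract class MaxLobsterSketch implements ArrayAlgorithmSketch {
    public abstract int maxSeqSum(int[] array);

    // Running the "algorithm" simply means computing the maximum sum.
    @Override
    public void runAlgorithm(int[] array) {
        maxSeqSum(array);
    }
}

// Hypothetical subclass that only counts calls, to show the wiring.
// It is NOT a real solver: it always returns 0.
class CountingLobster extends MaxLobsterSketch {
    int calls = 0;

    @Override
    public int maxSeqSum(int[] array) {
        calls++;
        return 0; // placeholder; a real solver returns the maximum contiguous sum
    }
}
```

Because a timer only needs the interface type, it can time any subclass without knowing which maxSeqSum algorithm is inside.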

MaxLobster0Test.java -- This is a JUnit test for the MaxLobster0 class that you will create. It won't compile as it stands, since there is no MaxLobster0 class yet. You may have to add JUnit 4 to your build path for Eclipse to recognize it.

MaxLobster[1,2,3,4].class -- These are four concrete subclasses of MaxLobster. Each one uses a different algorithm to solve the maximum contiguous subsequence sum problem. If you don't see these listed in your pane, don't sweat it.

DataCollector.java -- This class contains the main method of the project. The data collector performs the following steps:

  1. Instantiate an ArrayAlgorithmTimer.
  2. Instantiate a MaxLobster.
  3. Step through a series of array sizes. For each size, the data collector creates an array and passes the MaxLobster and the array to the ArrayAlgorithmTimer in order to perform test runs.

You can run the main program from a command prompt (provided your class files are all in the same directory) or from Eclipse via Run > Run Configurations, entering the arguments described below. It accepts up to 5 command-line arguments:

  1. An integer (0-4) designating which MaxLobster implementation to use. (default is 1)
  2. The initial value for the array size. (default is 1000)
  3. The amount by which to increment the array size from run to run. (default is 100)
  4. The number of different array sizes to test. (default is 1)
  5. The number of trials to perform on each array. (default is 1)

So, if the program is invoked by the command:

    java DataCollector 2 1000 100 10 3

it will run MaxLobster2 on array sizes 1000, 1100, 1200, ... , 1900, making three test runs on each array.

Its output is a three-column table showing the run number, the array size, and the running time in milliseconds.
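The argument handling above can be sketched as follows. This is a hypothetical illustration (the class name RunPlan is made up), not the provided DataCollector, which may be written differently; it only shows how the five arguments, their defaults, and the resulting array sizes relate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of DataCollector-style argument handling.
class RunPlan {
    final int version;   // which MaxLobster (0-4), default 1
    final int start;     // initial array size, default 1000
    final int increment; // size step between runs, default 100
    final int count;     // number of array sizes, default 1
    final int trials;    // trials per array, default 1

    RunPlan(String[] args) {
        int[] values = {1, 1000, 100, 1, 1}; // the documented defaults
        // Any argument that is present overrides the default in the same position.
        for (int k = 0; k < args.length && k < values.length; k++) {
            values[k] = Integer.parseInt(args[k]);
        }
        version = values[0];
        start = values[1];
        increment = values[2];
        count = values[3];
        trials = values[4];
    }

    // Sizes are start, start + increment, ..., start + (count - 1) * increment.
    List<Integer> sizes() {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < count; k++) {
            result.add(start + k * increment);
        }
        return result;
    }

    public static void main(String[] args) {
        RunPlan plan = new RunPlan(new String[] {"2", "1000", "100", "10", "3"});
        System.out.println(plan.sizes());
        // prints [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900]
    }
}
```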

Try running the program now, to make sure that everything is working. You may run into trouble depending on the location of your class files, and you should get this working before proceeding.

If Eclipse has removed the MaxLobster class files from your directory (possibly because you set your output directory to the current directory), AND if you are trying to run the program from the command line, you will most likely need to try this:

    % java -cp .:lab2a.jar DataCollector 2 1000 100 10 3
    

The -cp flag and the argument after it tell java where to find class files: the "." makes it look in the current directory (for DataCollector, for example), and "lab2a.jar" lets it find the MaxLobster class files.


//Todo

Part one. Write your own implementation of MaxLobster in a Java file called MaxLobster0.java (your class should extend MaxLobster). The simplest solution is to try every possible combination of i and j, compute the sum of the elements from array[i] through array[j] inclusive, and keep a record of the largest sum. You may be able to think of a more efficient way to solve the problem.
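The brute-force idea can be sketched like this. It is written as a standalone class (the name BruteForceLobster is hypothetical) so it compiles without lab2a.jar; in MaxLobster0.java you would instead declare `public class MaxLobster0 extends MaxLobster` and override maxSeqSum.

```java
// Hypothetical standalone version of the brute-force approach described above.
class BruteForceLobster {
    // Tries every pair (i, j) with i <= j and sums array[i..j] from scratch.
    public int maxSeqSum(int[] array) {
        int best = 0; // the empty subsequence has sum 0, so the answer is never negative
        for (int i = 0; i < array.length; i++) {
            for (int j = i; j < array.length; j++) {
                int sum = 0;
                for (int k = i; k <= j; k++) {
                    sum += array[k];
                }
                if (sum > best) {
                    best = sum;
                }
            }
        }
        return best;
    }
}
```

The innermost loop recomputes each sum from scratch, which makes this version O(n^3). Keeping a running sum as j advances (so each new j adds just one element) removes that loop and gives O(n^2).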

Once you have this programmed up, you should test your program with the provided JUnit class (MaxLobster0Test.java). You can go back to lab 1 to refresh your memory on JUnit, but the quick story is that you right-click on MaxLobster0Test.java (in the left pane), and select Run As > JUnit Test. Fix any problems you have before proceeding.

Part two. Complete the coding of the ArrayAlgorithmTimer class. It contains a stub of the computeRunningTime method, which takes three arguments: an ArrayAlgorithm, an array, and a number of trials. It should get the current time by calling System.currentTimeMillis(), which returns the number of milliseconds elapsed since midnight, January 1, 1970 (UTC) as a long integer. (The data type of the return value is "long".)

It should then apply the given array algorithm to the given array the desired number of times. Then it should get the time again. The running time of the algorithm is the difference between the two time readings, divided by the number of trials; that is, the average running time per trial.
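A minimal sketch of this timing logic follows. The interface TimedAlgorithm and class TimerSketch are hypothetical stand-ins, and both the runAlgorithm signature and the double return type are assumptions; match whatever types the provided stub actually declares.

```java
// Hypothetical stand-in for ArrayAlgorithm; the real signature may differ.
interface TimedAlgorithm {
    void runAlgorithm(int[] array);
}

class TimerSketch {
    // Runs the algorithm `trials` times and returns the average milliseconds per trial.
    public double computeRunningTime(TimedAlgorithm algorithm, int[] array, int trials) {
        long startTime = System.currentTimeMillis();
        for (int t = 0; t < trials; t++) {
            algorithm.runAlgorithm(array);
        }
        long endTime = System.currentTimeMillis();
        // Cast before dividing so the average is not truncated by integer division.
        return (double) (endTime - startTime) / trials;
    }

    // Two-argument form: a single trial.
    public double computeRunningTime(TimedAlgorithm algorithm, int[] array) {
        return computeRunningTime(algorithm, array, 1);
    }
}
```

Casting to double matters when trials is large relative to the elapsed time: with long arithmetic, 7 ms over 2 trials would come out as 3 rather than 3.5.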

Part three. The next step is to collect timing data. Run the DataCollector program for each version of MaxLobster. Because the running times of the versions will vary widely, you'll need to vary the arguments to the program in order to get meaningful results for each version.

Try to get a range of about 20 array sizes for each version, with the largest array size somewhere between twice and ten times the smallest. If the program takes too long to run, try it on smaller arrays. If the running times are very small (under 10 milliseconds) or if the time does not increase as the array size increases, try it on larger arrays. If the results are inconsistent, use a larger number of trials; this should smooth out the resulting times. (If you try too large an array, Java will throw an OutOfMemoryError. You can raise the maximum heap size with the -Xmx command-line flag followed by an amount of memory; for example, "-Xmx342m" allows up to 342 MB.)

Once you have found a set of arguments for each version that produces good output data, note the arguments you used and save the output from each run into a file. This can be done easily by redirecting the output of the program to a file, as follows:

    java DataCollector 2 1000 100 10 3 > filename

(If you can't get the redirecting working in eclipse, you can run it from the command line in your terminal window. If the MaxLobster classes aren't already in the same directory as your DataCollector class file, you should copy them over in order for the program to run without errors.)

Part four. To analyze the data obtained in part three, we will use the Microsoft Excel spreadsheet program or, if you're on the lab machines, Open Office. (See the notes on using Open Office below.) The output files, which are in ASCII text format, can be imported into an Excel spreadsheet. You can start up Excel from the Start menu. For each data file,

  1. Open it by choosing the open command from the file menu. This will bring up the Text Import Wizard. Tell the wizard that the data is delimited by spaces (and to merge delimiters if it is an option). It should find three columns, each of which is in General format. When the Wizard is finished, the spreadsheet should look something like this:

  2. Now, use the Chart Wizard to create a graph of the results. Select column C, which has the running times in milliseconds. Click on the Chart Wizard icon (or Insert > Chart in Open Office). You can select a chart type; a line chart with individual data points marked will probably work best. Your chart should look something like this:

  3. Next, try to fit a curve to the data points in your graph. You can try out a curve by inserting its values in column D of your spreadsheet. For example, for the formula
    $\text{running time} = 0.001 \times (\text{array size})^2$,
    enter =.001*B1*B1 in cell D1. Then select the cells in column D, from cell D1 down to the end of your data. Then choose "fill down" from the Edit menu. The formula you entered should now be applied to each row of the table.

    Actually, it's a little easier to fit the data if you start with a row that is more in the "middle". For example, instead of inserting the formula above into cell D1 and filling down, you may get better results by inserting =0.001*B8*B8 into cell D8, then filling down and filling up.

    You can (and should) add column D to the chart representation of the data as follows:

    1. Go to the chart view (i.e. go to where you can see your chart).
    2. Right-click the mouse within the chart area.
    3. Select "Chart Data" (in OpenOffice, select Data Ranges > Data Series).
    4. Add a new series containing the data from column D (in OpenOffice, click Add > Unnamed Series; then, under Y-Values, click the little box to the right of the text box and select column D from cell D1 to the end of your data).

    The result should be something like this (although perhaps not quite so awesomely matched):

    Note: Looking at the source code provided in part five may help you decide what type of curves to use to try to fit the data.

    Some students have used Excel's curve-fitting functionality to come up with an equation. This answer is not what you want.
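For reference, the arithmetic behind step 3 (pick a growth function f, choose the constant so the curve passes through a middle row, then compute column D) can be sketched in Java. This mirrors the hand-fitting described above; it is not a regression or automatic curve fit. The class name CurveFit is hypothetical, and the inputs are whatever sizes and times you measured.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper mirroring the spreadsheet's column D.
class CurveFit {
    // Choose c so that c * f(n) equals the measured time at the middle row,
    // then return c * f(n) for every row. Assumes at least one row of data.
    static double[] fitFromMiddle(long[] sizes, double[] times, DoubleUnaryOperator f) {
        int mid = sizes.length / 2;
        double c = times[mid] / f.applyAsDouble(sizes[mid]);
        double[] predicted = new double[sizes.length];
        for (int r = 0; r < sizes.length; r++) {
            predicted[r] = c * f.applyAsDouble(sizes[r]);
        }
        return predicted;
    }
}
```

To try a quadratic curve, pass `n -> n * n`; for n log n, pass `n -> n * Math.log(n)`. Then compare the predicted values against your measured column, just as you would on the chart.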

Open Office

With just a few changes, you can use Open Office on the Linux machines to perform these tests.

  1. Open Applications > Office > Office Calc to start the program.
  2. Use Insert > Sheet from file to import the data (be sure to select "space" and "merge delimiters").
  3. The chart wizard button didn't work for me, but you can use Insert > Chart when it comes time to make charts.

Part five. Now, try to match the output from the four provided versions of MaxLobster to their source code. In the jar file lab2b.jar, you'll find the source files for the four implementations I provided, named MaxLobsterA.java, MaxLobsterB.java, MaxLobsterC.java, and MaxLobsterD.java.

For each one, study the source code and determine analytically the asymptotic running time (big-oh) of the program for an array of size n. By comparing these results to the experimental results, you should be able to determine which of the source files corresponds to each of the class files (MaxLobster1, MaxLobster2, MaxLobster3, and MaxLobster4). You may have trouble distinguishing between the O(n) and O(n log n) solutions; just try your best!
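One extra check, not part of the lab's instructions but often useful here, is the doubling ratio: if the array size doubles from n to 2n, an ideal O(f(n)) running time grows by f(2n)/f(n). The sketch below (class name hypothetical) computes these ideal ratios for a few common growth classes; measured ratios will be noisier.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper: ideal growth factor when the input size doubles.
class DoublingRatios {
    static double ratio(DoubleUnaryOperator f, double n) {
        return f.applyAsDouble(2 * n) / f.applyAsDouble(n);
    }

    public static void main(String[] args) {
        double n = 1000;
        System.out.printf("O(n):       %.2f%n", ratio(x -> x, n));
        System.out.printf("O(n log n): %.2f%n", ratio(x -> x * Math.log(x), n));
        System.out.printf("O(n^2):     %.2f%n", ratio(x -> x * x, n));
        System.out.printf("O(n^3):     %.2f%n", ratio(x -> x * x * x, n));
    }
}
```

The ratios are exactly 2, 4, and 8 for O(n), O(n^2), and O(n^3), but for O(n log n) the ratio is 2(1 + 1/log2 n), only about 2.2 at n = 1000. That small gap is why the O(n) and O(n log n) versions are hard to tell apart experimentally.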


handin

When you are finished, hand in a directory containing:

  1. The source and class files for the project.
  2. The spreadsheets and charts you created using Excel or Open Office.
  3. A file called "README" which contains
    1. The command line arguments you used for each run in part 3.
    2. The running time formulas you derived from the experimental data in part four.
    3. The asymptotic running time results (in big-oh notation) you derived analytically in part five.
    4. Your guesses as to which of the four MaxLobster class files were compiled from the four MaxLobster source files.
    5. An estimate of the running time of each of the five programs on an input size of 1,000,000.
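For item 5, one way to extrapolate is to assume your fitted model t(n) ≈ c · f(n) still holds at n = 1,000,000 (an assumption: effects such as caching or garbage collection can break it) and scale a measured point by f(1,000,000)/f(n0). A minimal sketch, with a hypothetical class name:

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for extrapolating a fitted running-time model.
class Extrapolate {
    // From c = measuredMillis / f(measuredN), estimate t(targetN) = c * f(targetN).
    static double estimateMillis(double measuredN, double measuredMillis,
                                 DoubleUnaryOperator f, double targetN) {
        return measuredMillis * f.applyAsDouble(targetN) / f.applyAsDouble(measuredN);
    }
}
```

For example, with a quadratic fit anchored at a row where n = 2000, the scale factor is (1,000,000 / 2000)^2 = 250,000, so each millisecond measured at n = 2000 becomes about 250 seconds at n = 1,000,000.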