Lab 6 Algorithm Timing

Due: Sunday, April 8, 2007 by midnight

The purpose of this lab is to see how an algorithm can be evaluated experimentally by running the algorithm on several data sets and measuring its running time. We will investigate several algorithms which solve the "maximum segment sum" problem, which can be stated as follows:

Given a sequence of integers a1, a2, ... , an, find the maximum over all pairs of integers i, j, where 1 <= i <= j <= n, of the sum ai + ... + aj.
For example, consider the sequence
2, 3, 4
The segment with the largest sum is the entire sequence, with a sum of 9.

For

1, -2, 3, -4, 5, -6, 7
the segment with the largest sum is just the last element, so the maximum sum is 7.

For

2, -5, 3, -1, 6, -6, 4, 5, 8, -4, -2, 4, -10, 12
the segment with the largest sum is { 3, -1, 6, -6, 4, 5, 8 } with a sum of 19.

Note: We will allow a segment to have length 0. By definition, a segment of length 0 (an empty segment) has a sum of 0. The empty segment is a segment of every integer sequence, so the maximum segment sum of any integer sequence can never be less than 0, even for a sequence of all negative integers.

Lab requirements

In this lab, you will do the following:
  1. Write a Java class to solve the Maximum Segment Sum problem.
  2. Write a Java class which will determine the running time of any algorithm which operates on an array of integers.
  3. Measure the performance of your solution to the Maximum Segment Sum problem on arrays of different sizes.
  4. Use the data from part 3 to determine the asymptotic behavior of the running time of the algorithm.
  5. Repeat steps 3 and 4 for four solutions that I will provide.

Starting point code

I am providing the following code for you to use with this lab, in the file lab6a.jar :

ArrayAlgorithm.java -- This is an interface that defines a single method called runAlgorithm. Defining ArrayAlgorithm as an interface will make it possible for the ArrayAlgorithmTimer (described below) to collect timing information on any array algorithm which implements the interface.

ArrayAlgorithmTimer.java (incomplete) -- This class will be used to gather timing information on any integer array algorithm. It contains a single method called computeRunningTime, which has two arguments: an ArrayAlgorithm and an integer array.

The method runs the algorithm on the array and returns its running time in milliseconds. An optional third argument may be used to specify a number of trials to be performed. If it is present, the requested number of trials are run, and the return value is the average running time of the trials.

SegmentFinder.java -- This is an abstract class which defines a solver of the Maximum Segment Sum problem. It contains an abstract method called

public abstract int maxSegSum(int[] array);
This method accepts an integer array as its argument, and returns the maximum segment sum found in the array. Making this an abstract class will allow us to test several different algorithms, all of which solve the same problem.

SegmentFinder implements the ArrayAlgorithm interface by defining its runAlgorithm method as a call to maxSegSum. This will allow us to use ArrayAlgorithmTimer to time the runs of all the different maxSegSum algorithms.

SegmentFinder[1,2,3,4].class -- These are four concrete subclasses of SegmentFinder. Each one uses a different algorithm to solve the Maximum Segment Sum problem.

DataCollector.java -- This class contains the main method of the project. The data collector performs the following steps:

  1. Instantiate an ArrayAlgorithmTimer.
  2. Instantiate a SegmentFinder.
  3. Step through a series of array sizes. For each size, the data collector creates an array and passes the SegmentFinder and the array to the ArrayAlgorithmTimer in order to perform test runs.

The main program can be run from a command prompt. It accepts up to 5 command-line arguments:

  1. An integer (0-4) designating which of the SegmentFinders to use. (default is 1)
  2. The initial value for the array size. (default is 1000)
  3. The amount by which to increment the array size from run to run. (default is 100)
  4. The number of different array sizes to test. (default is 1)
  5. The number of trials to perform on each array. (default is 1)

So, if the program is invoked by the command:

java DataCollector 2 1000 100 10 3
it will run SegmentFinder2 on array sizes 1000, 1100, 1200, ... , 1900, making three test runs on each array.

Its output is a three column table, showing the run number, the array size, and the running time in milliseconds.

Todo

Part one. Write your own implementation of SegmentFinder in a java file called SegmentFinder0.java. (Be sure that you are extending the abstract class SegmentFinder.) The simplest solution is to try every possible combination of i and j, compute the sum of the elements between array[i] and array[j], and keep a record of the largest sum. You may be able to think of a more efficient way to solve the problem.

SegmentFinder.java contains a method (called "test") which you can use to test your program. It runs the maxSegSum method on five arrays and displays the results. You can use it by writing a main method in SegmentFinder0 which instantiates a SegmentFinder0 object and calls its test method.

The correct results for the five test cases are 9, 7, 0, 3, and 19.

Part two. Complete the coding of the ArrayAlgorithmTimer class. It contains the stub of a method called computeRunningTime. It has three arguments: an ArrayAlgorithm, an array, and a number of trials. It should get the current time by calling the method System.currentTimeMillis(), which returns the number of milliseconds of elapsed time since January 1, 1970 as a long integer. (The data type of the return value is "long".)

It then applies the array algorithm to the given array the desired number of times. Then it gets the time again. The running time of the algorithm is the difference between the two time readings, divided by the number of trials; that is, the average running time per trial.

Part three. The next step is to collect timing data. Run the DataCollector program for each version of the SegmentFinder. Because the running times of the versions will vary widely, you'll need to vary the arguments to the program in order to get meaningful results for each version.

Try to get a range of about 20 array sizes for each version, with the largest array size somewhere between twice and ten times the smallest. If the program takes too long to run, try it on smaller arrays. If the running times are very small (under 10 milliseconds) or if the time does not increase as the array size increases, try it on larger arrays. If the results are inconsistent, use a larger number of trials. This should smooth out the resulting times. (If you try too large an array, you will get an "Out of Memory" error from Java. You can fix this with an option to java telling it to use more memory. E.g.,

 java -Xmx1g  DataCollector 2 1000 100 10 3
 
tells it to use up to 1 GigaByte, -Xmx200m is 200 MegaBytes, and so forth.)

Once you have found a set of arguments for each version that produces good output data, save the output from each run into a file. This can be done easily by redirecting the output of the program to a file, as follows:

java DataCollector 2 1000 100 10 3 > filename
Note: Because the running times of the different versions of SegmentFinder vary widely, you may need to use different parameters for each version.

Part four. In order to analyze the data obtained in part three, we will use the Microsoft Excel spreadsheet program. (Or see these notes on using Open Office.) The output files, which are in ASCII text format, can be imported into an Excel spreadsheet. You can start up Excel from the Start menu. For each data file,

1. Open it by choosing the open command from the file menu. This will bring up the Text Import Wizard. Tell the wizard that the data is delimited by spaces. It should find three columns, each of which is in General format. When the Wizard is finished, the spreadsheet should look something like this:

2. Now, use the Chart Wizard to create a graph of the results. Select column C, which has the running times in milliseconds. Click on the Chart Wizard icon. You can select a chart type; an X-Y Scatter with connecting lines and individual data points marked will probably work best. Your chart should look something like this:

3. Next, try to fit a curve to the data points in your graph. You can try out a curve by inserting its values in column D of your spreadsheet. For example, for the formula

running time = .001 * (array size)2,
enter "=.001*B1*B1" in cell D1. Then select the cells in column D, from cell D1 down to the end of your data. Then choose "fill down" from the Edit menu. The formula you entered should be applied to each row of the table.

You can add column D to the chart representation of the data as follows:

  1. Go to the chart view.
  2. Right-click the mouse within the chart area.
  3. Select "Chart Data".
  4. Add a new series containing the data from column D.
The result should be something like this:

Note: Looking at the source code provided in part five may help you decide what type of curves to use to try to fit the data.

Open Office

With just a few changes, you can use Open Office on the Linux machines to perform these tests.

  1. Open "Applications" -> "Office" -> "Office Calc" to start the program
  2. "Insert" -> "Sheet from file" to import the data (be sure to select "space" and "merge delimiters")
  3. The chart wizard button didn't work for me, but you can use "Insert" -> "Chart"

Part five. Now, try to match the output from the five versions of SegmentFinder to the source code. In the jar file lab6b.jar , you'll find the source files for the four implementations I provided, identified as SegmentFinderA.java, SegmentFinderB.java, SegmentFinderC.java, and SegmentFinderD.java.

For each one, study the source code and determine analytically the asymptotic running time (big-oh) of the program for an array of size n. By comparing these results to the experimental results, you should be able to determine which of the source files corresponds to each of the class files (SegmentFinder1, SegmentFinder2, SegmentFinder3, and SegmentFinder4).

You might find that Google's Calculator functionality is useful to convert units, etc. (e.g., 123*10^45 ms in years).

Submission

When you are finished, hand in a directory containing:

  1. The source and class files (just SegmentFinder[1-4].class) for the project.
  2. The spreadsheets and charts you created using Excel.
  3. A file called "readme.txt" which contains
    1. The running time formulas you derived from the experimental data in part four.
    2. The asymptotic running time results (in big-oh notation) you derived analytically in part five.
    3. Your guesses as to which of the four SegmentFinder class files were compiled from the four SegmentFinder source files.
    4. An estimate of the running time of each of the five programs on an input size of 1,000,000.