Due: Sunday, April 8, 2007 by midnight
The purpose of this lab is to see how an algorithm can be evaluated experimentally by running the algorithm on several data sets and measuring its running time. We will investigate several algorithms which solve the "maximum segment sum" problem, which can be stated as follows:
Given a sequence of integers a1, a2, ... , an, find the maximum over all pairs of integers i, j, where 1 <= i <= j <= n, of the sum ai + ... + aj.For example, consider the sequence
2, 3, 4The segment with the largest sum is the entire sequence, with a sum of 9.
For
1, -2, 3, -4, 5, -6, 7the segment with the largest sum is just the last element, so the maximum sum is 7.
For
2, -5, 3, -1, 6, -6, 4, 5, 8, -4, -2, 4, -10, 12the segment with the largest sum is { 3, -1, 6, -6, 4, 5, 8 } with a sum of 19.
Note: We will allow a segment to have length 0. By definition, a segment of length 0 (an empty segment) has a sum of 0. The empty segment is a segment of every integer sequence, so the maximum segment sum of any integer sequence can never be less than 0, even for a sequence of all negative integers.
I am providing the following code for you to use with this lab, in the file lab6a.jar :
ArrayAlgorithm.java -- This is an interface that defines a single method called runAlgorithm. Defining ArrayAlgorithm as an interface will make it possible for the ArrayAlgorithmTimer (described below) to collect timing information on any array algorithm which implements the interface.
ArrayAlgorithmTimer.java (incomplete) -- This class will be used to gather timing information on any integer array algorithm. It contains a single method called computeRunningTime, which has two arguments: an ArrayAlgorithm and an integer array.
The method runs the algorithm on the array and returns its running time in milliseconds. An optional third argument may be used to specify a number of trials to be performed. If it is present, the requested number of trials are run, and the return value is the average running time of the trials.
SegmentFinder.java -- This is an abstract class which defines a solver of the Maximum Segment Sum problem. It contains an abstract method called
public abstract int maxSegSum(int[] array);This method accepts an integer array as its argument, and returns the maximum segment sum found in the array. Making this an abstract class will allow us to test several different algorithms, all of which solve the same problem.
SegmentFinder implements the ArrayAlgorithm interface by defining its runAlgorithm method as a call to maxSegSum. This will allow us to use ArrayAlgorithmTimer to time the runs of all the different maxSegSum algorithms.
SegmentFinder[1,2,3,4].class -- These are four concrete subclasses of SegmentFinder. Each one uses a different algorithm to solve the Maximum Segment Sum problem.
DataCollector.java -- This class contains the main method of the project. The data collector performs the following steps:
The main program can be run from a command prompt. It accepts up to 5 command-line arguments:
So, if the program is invoked by the command:
java DataCollector 2 1000 100 10 3it will run SegmentFinder2 on array sizes 1000, 1100, 1200, ... , 1900, making three test runs on each array.
Its output is a three column table, showing the run number, the array size, and the running time in milliseconds.
Part one. Write your own implementation of SegmentFinder in a java file called SegmentFinder0.java. (Be sure that you are extending the abstract class SegmentFinder.) The simplest solution is to try every possible combination of i and j, compute the sum of the elements between array[i] and array[j], and keep a record of the largest sum. You may be able to think of a more efficient way to solve the problem.
SegmentFinder.java contains a method (called "test") which you can use to test your program. It runs the maxSegSum method on five arrays and displays the results. You can use it by writing a main method in SegmentFinder0 which instantiates a SegmentFinder0 object and calls its test method.
The correct results for the five test cases are 9, 7, 0, 3, and 19.
Part two. Complete the coding of the ArrayAlgorithmTimer class. It contains the stub of a method called computeRunningTime. It has three arguments: an ArrayAlgorithm, an array, and a number of trials. It should get the current time by calling the method System.currentTimeMillis(), which returns the number of milliseconds of elapsed time since January 1, 1970 as a long integer. (The data type of the return value is "long".)
It then applies the array algorithm to the given array the desired number of times. Then it gets the time again. The running time of the algorithm is the difference between the two time readings, divided by the number of trials; that is, the average running time per trial.
Part three. The next step is to collect timing data. Run the DataCollector program for each version of the SegmentFinder. Because the running times of the versions will vary widely, you'll need to vary the arguments to the program in order to get meaningful results for each version.
Try to get a range of about 20 array sizes for each version, with the largest array size somewhere between twice and ten times the smallest. If the program takes too long to run, try it on smaller arrays. If the running times are very small (under 10 milliseconds) or if the time does not increase as the array size increases, try it on larger arrays. If the results are inconsistent, use a larger number of trials. This should smooth out the resulting times. (If you try too large an array, you will get an "Out of Memory" error from Java. You can fix this with an option to java telling it to use more memory. E.g.,
java -Xmx1g DataCollector 2 1000 100 10 3tells it to use up to 1 GigaByte, -Xmx200m is 200 MegaBytes, and so forth.)
Once you have found a set of arguments for each version that produces
good output data, save the output from each run into a file. This
can be done easily by redirecting the output of the program to
a file, as follows:
java DataCollector 2 1000 100 10 3 > filenameNote: Because the running times of the different versions of SegmentFinder vary widely, you may need to use different parameters for each version.
Part four. In order to analyze the data obtained in part three, we will use the Microsoft Excel spreadsheet program. (Or see these notes on using Open Office.) The output files, which are in ASCII text format, can be imported into an Excel spreadsheet. You can start up Excel from the Start menu. For each data file,
1. Open it by choosing the open command from the file menu. This will bring up the Text Import Wizard. Tell the wizard that the data is delimited by spaces. It should find three columns, each of which is in General format. When the Wizard is finished, the spreadsheet should look something like this:
2. Now, use the Chart Wizard to create a graph of the results. Select column C, which has the running times in milliseconds. Click on the Chart Wizard icon. You can select a chart type; an X-Y Scatter with connecting lines and individual data points marked will probably work best. Your chart should look something like this:
3. Next, try to fit a curve to the data points in your graph. You can try out a curve by inserting its values in column D of your spreadsheet. For example, for the formula
running time = .001 * (array size)2,enter "=.001*B1*B1" in cell D1. Then select the cells in column D, from cell D1 down to the end of your data. Then choose "fill down" from the Edit menu. The formula you entered should be applied to each row of the table.
You can add column D to the chart representation of the data as follows:
Note: Looking at the source code provided in part five may help
you decide what type of curves to use to try to fit the data.
With just a few changes, you can use Open Office on the Linux machines to perform these tests.
Part five. Now, try to match the output from the five versions of SegmentFinder to the source code. In the jar file lab6b.jar , you'll find the source files for the four implementations I provided, identified as SegmentFinderA.java, SegmentFinderB.java, SegmentFinderC.java, and SegmentFinderD.java.
For each one, study the source code and determine analytically the asymptotic running time (big-oh) of the program for an array of size n. By comparing these results to the experimental results, you should be able to determine which of the source files corresponds to each of the class files (SegmentFinder1, SegmentFinder2, SegmentFinder3, and SegmentFinder4).
You might find that Google's Calculator functionality is useful to convert units, etc. (e.g., 123*10^45 ms in years).
When you are finished, hand in a directory containing: