Lab 1: Generalized Approximate String Comparison
in
Scheme & Java

Due March 14th (final date)

You will implement a system which given two strings A and B. Gives the minimum "distance" from A to B. Here distance is defined as how many transformations must be applied to A in order to arrive at B and their weights. We will use a very inefficient but simple algorithm of my own devising.

As you go through the lab think about, how what you are writing in Scheme and Java is both the same and different. Also consider how your program would change if you had church style booleans and lists as we discussed in class.

Ground Rules:

Implement the following in both Scheme and Java. Absolutely no loops! Keep conditionals to a minimum. Unfortunately there is no way to do this in Scheme without church booleans and lists, but I decided not to torture you with that. Use pure functional for the Scheme version i.e. no functions ending in ! Also, if you need something done consider passing in a function to do it. Your final code should run in Dr. Scheme. In Java use your best object oriented style. You may mutate. Strategy, Command and Visitor patterns may be helpful. Remember objects only know about themselves, delegation is your friend. Also, most of the Java methods don't should more than one line (except for those anonymous inner classes are easier to read if they are written on more than one line)..

The Basic Idea:

Your program should comprise a function or visitor (as appropriate) which operates on two lists of characters. Which is how the strings will be represented. It should return a floating point number (double) representing the distance. When the two strings are equal they have a distance of 0 from each other. When it is impossible to transform one to the other, their distance is infinity. Floating point numbers are used to allow us to directly represent and manipulate infinity. It greatly simplifies the solution. The minimum distance is found by starting at the beginning of the string and applying each applicable transformation to the beginning of the string. Each transformation performs its intended meaning on the strings. Then it recurs back into the comparison to try and match up the rest of the string.

Hints:

For those of you used to Scheme it may be easier to write that version first because if you need to change something there will be less rewriting. Apply and Map may be helpful.

Instructions:
  1. Think about the transformations and create any abstract code necessary. What information will the transformations need? Remember they are going to make the recursive call to compare the rests of the strings. What will they return? If a transformation isn't applicable then isn't that the same as saying that the target string can't be reached through this transformation? How will we represent that?

    SchemeJava
    Nothing to do but think. You can use CharList. Download the charlist package at http://cs.oberlin.edu/~jwalker/exco/lab1/charlist.zip
  2. Create the function/visitor to actually do the comparison.

    SchemeJava
    See if you can write a way to create different compare functions each of which knows about and uses a different list of transformations.
    It should take a source and target string represented as lists of characters.
    The actual body of the compare can be done in 1 line.
    Infinity is written in Dr. Scheme as +inf.0
    Create your visitor so that when one is instantiated it takes and holds a collection of transformations.
    Infinity in Java is Double.POSITIVE_INFINITY.
  3. Create a test program to allow you to enter strings and see the result. You must convert from the language's native string representation to our character list representation and invoke the comparison function. You may eventually want to be able to control which transformations are used.

    SchemeJava
    Call your function string-compare and have it take two strings source and target (in that order).
    Use string->list to convert strings to lists.
    An example of someone interacting with your function might be:
    > (string-compare "foo" "foo")
    0.0
    Make a GUI or command line interface to let you enter two strings.
    Convert each strings into a ACharList.
  4. Make sure it works. With proper stub coding your code should compile and run. Of course your compare has no transformations yet so it should always return infinity.

  5. Write the standard transformation. This is the identity function. That is to say, it allows us to check for exact string matching. When the strings are both empty they are equal. If the strings both have the same first letter then whether they match is just whether the rests match.

  6. Confirm that when you use your comparison with the identity transformation it behaves correctly. This is returning 0 when the strings are identical and infinity otherwise.

  7. Implement the two basic transformations of editing text. When one of these transformations is used it increases the distance between the strings by 1.


  8. Check that your code works as expected. If it doesn't fix it or figure out why you didn't have the right expectations. Be careful to not put in strings that are too long or you will be waiting all day. Note that with this set of transformations order of source and target don't matter. However, if we only allow identity and insert then order does matter.

  9. With the above some strings' distances aren't what we might intuitively guess. For example dog and log have a distance of 2 even though they differ by only one letter. This is because the d must first be deleted then the l inserted. Add a substitution transformation to make it behave the way you would expect.

  10. Try adding one of the following transformations and see what changes.

Hand in source. If you do it in JBuilder hand in your project.

Method of hand in:
Change the directory to your user name or otherwise mark it as your lab.
Compress it and email it to me at jwalker@cs.oberlin.edu.


jwalker@cs.oberlin.edu