Due: Sunday, April 29, 2007 by midnight
In today's lab, you'll be implementing a portion of the Set interface using the Trie data structure. You can refer to your class notes or chapter 12.3 in your textbook for more discussion on them.
The Set interface is a collection that contains no duplicate elements. That is, for 2 nodes e1 and e2, you will never have e1.equals(e2) return true. (Similarly, e1.compareTo(e2) will never be 0.)
Define the data members of the Trie as follows:
boolean isWord;
Trie[] children;
Optionally, you may want to store information on the size and height of this (sub-)trie in each node. This will require additional bookkeeping during add() operations, but can simplify other methods.
int height;
int size;
You should create a no-argument constructor for a Trie which creates and initializes an empty Trie. The result is a single Trie node with the isWord flag set to false and the array of children all set to null.
The size of the children array is determined by the number of characters in the alphabet -- in this case 26. You should create a static final variable to hold this constant value, eliminating the need to hard-code "26" elsewhere in the program.
Create the following public methods in Trie.java:
Many of these methods will be implemented using recursive tree traversal techniques.
When traversing the Trie, it will be useful for you to be able to transform a char into an int that goes from 0 to 25 (the indices of children). So a => 0, b => 0, ..., z => 0. Fortunately, this can be done straightforwardly using character mathematics.
Assuming you transform all of your strings to lower case, you can simply subtract out the value of 'a' from the character under consideration to get a number you can use as the index.
To test your Trie class, write the following application programs. I've got two test data files for you to use: small.txt and lexicon.txt.
This program takes a series of words specified as command-line arguments. For each, it inserts the word into a Trie and then outputs:
This program takes a file name as its first command-line argument. The file contains one word per line, just like the above examples. You should then just use a Scanner to read in the data into a Trie.
Then the program takes the remaining command-line arguments and searches for them in the Trie one at a time. It displays the words that are found one per line.
Don't forget that some methods will require you to handle checked exceptions.
This program takes 2 command-line arguments: a file name and a prefix string. It starts by loading all the words from the file into a Trie as in the previous program. It then searches the Trie for all words which start with the given prefix string. It should then display:
A list of all matching words, displaying the entire word, not just the remainder of the string.
It is possible that the number of matches is very long (e.g., a prefix of ""). Therefore, if there are more than 20 matching strings output:
It might be interesting to see how many nodes are used to construct a given Trie. Create a program that examines this by:
Test it out on the lexicon.txt file.