Testing

concordance.py: 34 points

In your GitHub repository for this lab, you will find the following test files for your program:

Jabberwocky.txt [file] - Lewis Carroll’s Jabberwocky
LoveAndTheButterfly.txt [file] - Alice Moore Dunbar-Nelson’s Love and the Butterfly
Prufrock.txt [file] - T.S. Eliot’s The Love Song of J. Alfred Prufrock
Test.txt [file] - a file for testing your line numbering

In particular, running your program on Test.txt should produce the following output:

eight 8 8 8 8 8 8 8 8
five 3 5 5 5 5 5
one 1
three 3 3 3 5
I found 8 lines containing 4 unique words.

If your output differs, the problem is most likely in either your line numbering or the way you are stripping punctuation. The other files are mainly useful for checking punctuation: they use many different punctuation characters, and you should remove all leading and trailing punctuation from each word. Look carefully at your output.
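As a rough illustration of removing only leading and trailing punctuation, here is a minimal sketch. The helper name strip_punctuation and the list of extra characters are assumptions made for this example, not part of the assignment; string.punctuation covers ASCII punctuation only, so check which characters the test files actually contain.

```python
import string

# Hypothetical extras: curly quotes and dashes, written as escapes.
# This list is an assumption; adjust it to match the test files.
EXTRA_PUNCTUATION = "\u2018\u2019\u201c\u201d\u2013\u2014"


def strip_punctuation(word):
    """Remove punctuation from both ends of word, leaving the interior alone."""
    # str.strip treats its argument as a set of characters to remove
    # from the start and end of the string.
    return word.strip(string.punctuation + EXTRA_PUNCTUATION)


print(repr(strip_punctuation('"Beware')))  # leading quote removed
print(repr(strip_punctuation("don't!")))   # interior apostrophe kept
print(repr(strip_punctuation("!!!")))      # nothing left but the empty string
```

Because only the ends are stripped, a word such as don't keeps its apostrophe, while a "word" made entirely of punctuation becomes the empty string.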

README

If you see what appears to be a blank word followed by line numbers, here is the likely cause. The split() method separates a string into words using whitespace as the delimiter, so some “words” may be nothing but punctuation characters, such as “!!!”. When you strip the punctuation from such a word, you are left with the empty string. Before you add a word and its line number to the concordance, check whether the word is the empty string; if it is, don’t add it.
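The check described above might look something like the following sketch. The function name add_line and the choice of a dictionary mapping each word to a list of line numbers are assumptions for illustration; your own program may be organized differently.

```python
import string


def add_line(concordance, line, line_number):
    """Record line_number for every non-empty word found in line."""
    for raw in line.split():
        # Strip ASCII punctuation from both ends of the word.
        word = raw.strip(string.punctuation)
        if word == "":
            # The "word" was all punctuation (for example "..."), so skip it.
            continue
        concordance.setdefault(word, []).append(line_number)


concordance = {}
add_line(concordance, "Wait!!! ... one, two", 1)
print(concordance)
```

Here the token "..." is reduced to the empty string and skipped, so no blank entry appears in the concordance.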

Extra Files

If you want to play with your concordance, here are a few additional files you might work with (you can download these, then upload them to your Codespace by dragging and dropping them into the Explorer pane on the left side of the screen):

Beowulf.txt [file] - translated to modern English by Hall
Frankenstein.txt [file] - the entire text of Mary Wollstonecraft Shelley’s novel
DavidCopperfield.txt [file] - all 626 pages of the Dickens novel
Inferno.txt [file] - the first third of Dante’s Divine Comedy, translated by Norton
KingLear.txt [file] - the Shakespeare play
Republic.txt [file] - Plato’s Republic