concordance.py: 34 points
In your Replit project for this lab, you will find the following test files for your program:
Jabberwocky.txt - Lewis Carroll’s Jabberwocky
LoveAndTheButterfly.txt - Alice Moore Dunbar-Nelson’s Love and the Butterfly
Prufrock.txt - T.S. Eliot’s The Love Song of J. Alfred Prufrock
Test.txt - a file for testing your line numbering
Test.txt should give you the following output:
eight 8 8 8 8 8 8 8 8
five 3 5 5 5 5 5
one 1
three 3 3 3 5
I found 8 lines containing 4 unique words.
If you get a different output, there is a problem either with your line numbering or with the way you are stripping punctuation. The other files are mainly useful for checking punctuation: they use many different punctuation characters, and you should remove all of them. Look carefully at your output.
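The expected output above suggests a concordance that maps each word to the line numbers where it appears: a line number is repeated once per occurrence on that line, and words are listed alphabetically. The following is a minimal sketch under several assumptions that the handout does not state: words are lowercased, punctuation is stripped from the ends of each token with Python's `string.punctuation` (ASCII only), line numbers start at 1, and the "lines" count includes blank lines. The names `build_concordance`, `format_concordance`, and `concordance_from_file` are hypothetical, and your assignment may require a different structure.

```python
import string


def build_concordance(lines):
    """Map each word to the line numbers (starting at 1) where it occurs.

    A word that occurs several times on one line gets that line number once
    per occurrence, matching output such as "eight 8 8 8 ...".
    """
    concordance = {}
    for line_number, line in enumerate(lines, start=1):
        for token in line.split():
            # Assumption: case is ignored and ASCII punctuation is stripped from both ends.
            word = token.strip(string.punctuation).lower()
            if word == "":  # the token was nothing but punctuation, e.g. "!!!"
                continue
            concordance.setdefault(word, []).append(line_number)
    return concordance


def format_concordance(concordance, line_count):
    """Build the report: one entry per word in alphabetical order, then a summary."""
    report = []
    for word in sorted(concordance):
        numbers = " ".join(str(n) for n in concordance[word])
        report.append(f"{word} {numbers}")
    report.append(f"I found {line_count} lines containing {len(concordance)} unique words.")
    return report


def concordance_from_file(path):
    """Hypothetical helper: read a text file and return its concordance report."""
    with open(path, encoding="utf-8") as infile:
        lines = infile.readlines()
    return format_concordance(build_concordance(lines), len(lines))


if __name__ == "__main__":
    # A small made-up sample (not the contents of Test.txt).
    sample = ["The cat sat.", "", "The hat, the cat!"]
    for entry in format_concordance(build_concordance(sample), len(sample)):
        print(entry)
```

For the three-line sample, the report is `cat 1 3`, `hat 3`, `sat 1`, `the 1 3 3`, followed by `I found 3 lines containing 4 unique words.` Using a list for each word (rather than a set) is what keeps repeated line numbers in the output.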
If you see what appears to be a blank word followed by line numbers, it was probably produced in the following way. The split() function separates a string into words using whitespace as the delimiter, so some “words” may consist only of punctuation characters, such as “!!!”. When you strip off the punctuation, you are left with the empty string. Before you add a word and its line number to the concordance, check whether the word is the empty string; if it is, don’t add it.
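Python's `string.punctuation` contains only ASCII characters, so `token.strip(string.punctuation)` leaves curly quotes such as ’ and “ and dashes such as — attached to words. The sketch below is one possible way to handle that and the empty-string check together. It assumes you strip punctuation only from the ends of each token (so the apostrophe inside “don’t” survives), and it treats a character as punctuation if it is in `string.punctuation` or its Unicode category starts with "P". The helper names `is_punctuation`, `strip_punctuation`, and `words_on_line` are hypothetical.

```python
import string
import unicodedata


def is_punctuation(ch):
    """True for ASCII punctuation or any character in a Unicode punctuation category."""
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def strip_punctuation(token):
    """Remove punctuation from both ends of a token, including curly quotes and dashes."""
    start, end = 0, len(token)
    while start < end and is_punctuation(token[start]):
        start += 1
    while end > start and is_punctuation(token[end - 1]):
        end -= 1
    return token[start:end]


def words_on_line(line):
    """Yield the lowercased words on a line, skipping tokens that were only punctuation."""
    for token in line.split():
        word = strip_punctuation(token).lower()
        if word:  # "!!!" strips down to "", which should not enter the concordance
            yield word


if __name__ == "__main__":
    print(list(words_on_line("\u201cBeware the Jabberwock, my son! \u2014 !!!")))
```

Running the demo prints `['beware', 'the', 'jabberwock', 'my', 'son']`: the opening curly quote is removed, and the lone dash and the "!!!" token are skipped instead of becoming blank words.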
If you want to experiment with your concordance, here are a few additional files you can try (download them, then upload them to your Replit project using the Files pane on the left side of the screen):
Beowulf.txt - translated into Modern English by Hall
Frankenstein.txt - the entire text of Mary Wollstonecraft Shelley’s novel
DavidCopperfield.txt - all 626 pages of the Dickens novel
Inferno.txt - the first third of Dante’s Divine Comedy, translated by Norton
KingLear.txt - the Shakespeare play
Republic.txt - Plato’s Republic