concordance.py: 34 points
In your replit project for this lab, you will find the following test files for your program:
Jabberwocky.txt - Lewis Carroll’s Jabberwocky
LoveAndTheButterfly.txt - Alice Moore Dunbar-Nelson’s Love and the Butterfly
Prufrock.txt - T.S. Eliot’s The Love Song of J. Alfred Prufrock
Test.txt - a file for testing your line numbering
Notably, Test.txt should give you the following output:
eight 8 8 8 8 8 8 8 8
five 3 5 5 5 5 5
one 1
three 3 3 3 5
I found 8 lines containing 4 unique words.
If you get a different output, there is either a problem with your line numbering or the way you are stripping punctuation. The other files are mainly useful for checking punctuation; there are many different punctuation characters used in these files and you should remove all of them. Look carefully at your output.
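One way to see which punctuation characters a file actually contains is to collect every character that is neither a letter, a digit, nor whitespace. The following is a minimal sketch, assuming the text has already been read into a string; the function name `punctuation_in` and the sample sentence are hypothetical and not part of the lab.

```python
def punctuation_in(text):
    """Return a sorted list of the characters in text that are
    neither alphanumeric nor whitespace."""
    return sorted({ch for ch in text if not ch.isalnum() and not ch.isspace()})


# Hypothetical sample; note the curly quotes, which are not ASCII punctuation.
sample = "“Beware the Jabberwock, my son!” (He left it dead...)"
print(punctuation_in(sample))
```

Running something like this on each test file may reveal characters, such as curly quotes or dashes, that an ASCII-only list of punctuation would miss.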
ReadMe
If you see what appears to be a blank word followed by line numbers, it may have been generated in the following way. The split() function separates a string into words by using whitespace as a delimiter, so some “words” might just be sequences of punctuation characters, such as “!!!”. When you strip off the punctuation you are left with the empty string. Before you add a word and its line number to the concordance, you should check if the word is the empty string; if it is, just don’t add it.
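The empty-string check described above might look like the following. This is a sketch under stated assumptions, not the required solution: the helper name `add_line_words`, the dictionary-of-lists layout, and the use of `string.punctuation` are illustrative choices. Note that `string.punctuation` covers ASCII punctuation only, so characters such as curly quotes would have to be added to the set being stripped.

```python
import string


def add_line_words(concordance, line, line_number, strip_chars=string.punctuation):
    """Add each word on the line to the concordance (hypothetical helper).

    concordance maps a word to the list of line numbers it appears on.
    """
    for token in line.split():
        # Remove punctuation from both ends of the token.
        word = token.strip(strip_chars)
        # A token such as "!!!" strips down to the empty string; skip it.
        if word == "":
            continue
        concordance.setdefault(word, []).append(line_number)


concordance = {}
add_line_words(concordance, "wow !!! wow, again...", 1)
print(concordance)
```

Here the token `!!!` is dropped, so no blank word ends up in the concordance, while `wow` is recorded once for each time it appears on the line.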
Extra
If you want to play with your concordance, here are a few additional files you might work with (you can download these, then upload them to your replit project using the Files pane on the left side of the screen):
Beowulf.txt - translated to modern English by Hall
Frankenstein.txt - the entire text of Mary Wollstonecraft Shelley’s novel
DavidCopperfield.txt - all 626 pages of the Dickens novel
Inferno.txt - the first third of Dante’s Divine Comedy, translated by Norton
KingLear.txt - the Shakespeare play
Republic.txt - Plato’s Republic