You can download the assignment instructions by clicking on this link
Preprocessing the Text
I have included code in both Python and Java to handle some of the preprocessing you need for the lab: parsing a long string of text into a list of words, and converting a list of words into a list of stems. Please note that you still need to find your own list of stop words (cite where you found it in both your code and your README) and remove the stop words from the list of words before stemming.
import StemmingUtil

# parse the text into a list of words
words = StemmingUtil.parseTokens(lowerCaseText)

# remove the stop words
''' Your code goes here '''

# convert the words to their stems
stems = StemmingUtil.createStems(words)
import java.util.List;

import edu.oberlin.csci374.StemmingUtil;

// parse the text into a list of words
List<String> words = StemmingUtil.parseTokens(lowerCaseText);

// remove the stop words
/* Your code goes here */

// convert the words into their stems
List<String> stems = StemmingUtil.createStems(words);
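The stop word step that sits between parsing and stemming could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the names `load_stop_words` and `remove_stop_words` are hypothetical helpers rather than part of `StemmingUtil`, the tiny stop word list is made up for the example (you still need to find and cite a real one), and the hard-coded `words` list stands in for the output of `StemmingUtil.parseTokens`.

```python
def load_stop_words(lines):
    """Build a set of stop words from an iterable of text lines.

    Assumes one stop word per line; blank lines are skipped and
    everything is lowercased to match the lowercased input text.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(words, stop_words):
    """Return the words that are not stop words, preserving order."""
    return [word for word in words if word not in stop_words]


# Made-up stop word list for illustration only; in the lab this would
# come from the (cited) list you found, e.g. the lines of a text file.
stop_words = load_stop_words(["the", "a", "", "in", "of", "And"])

# Stand-in for the result of StemmingUtil.parseTokens(lowerCaseText)
words = ["the", "cat", "sat", "in", "the", "hat"]

filtered = remove_stop_words(words, stop_words)
```

Storing the stop words in a set keeps each membership check constant time on average, which matters when the text is long. The filtered list would then be passed to `StemmingUtil.createStems` in place of `words`.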