CSCI 333 -- Natural Language Processing
Project Guidelines
Fall, 2011
Due dates:
One- to two-page proposal due: Monday, November 14
Class presentations: December 7 and 9
Written report due: December 12
Your assignment is to design and implement a major program (or set of
programs) that constitutes a natural language processing system. You
have a fair amount of flexibility in what you do for your project,
but you will need to do some research to find something appropriate.
Suitable types of project include:
- Develop a significant piece of code that builds on current
knowledge in NLP and goes beyond what we have discussed in
class.
- Do a careful comparative study of different NLP techniques for
solving a problem, presenting computational evidence that helps
determine which techniques are most effective.
- Apply NLP techniques to solve a practical problem.
The best projects are usually the ones that you think up (and care
about!) yourselves. Choose an area of NLP that interests you
and that you would like to explore in depth. A list of
project ideas suggested by others can be found at
http://www.stanford.edu/class/cs224n/handouts/cs224n-fp.pdf.
Feel free to build on existing code, but be sure to give credit when
you use the work of others.
As with the homework, I encourage you to work in pairs on the
project: you will be able to accomplish more, and having several
pairs of eyes on the code tends to produce better-quality code with
fewer errors. If you do work in a pair (and I hope you do), please
include with your final write-up a description of who did what. My
intention is to give all members of a team the same grade, but I
will assign individual grades if there is evidence that some people
did considerably more than others.
Your final project should not include work done for another course
unless you have permission from both me and the other instructor.
Proposal
Your one- to two-page proposal should address the following issues:
- What is the problem that you will be attacking?
- Why is this interesting as an NLP problem?
- What are relevant references to existing approaches to this
problem?
- What technical methods or approaches will you use?
- On what data will you run your system?
- How will you evaluate the performance of your system?
The section on evaluation is important. I expect you to collect
data on how well your program performs and to use at least some
statistical analysis to back up your conclusions.
Plenty of data is available on the web. We have already used the
Penn Treebank and the Brown corpus in NLTK, and many other sites
offer data as well. You will likely find the information at
http://nlp.stanford.edu/links/statnlp.html helpful.
I am happy to discuss your proposal with you in advance. Once
you have submitted your proposal and I have approved it, any changes
in direction or scope will need my approval.
The scale of a project is always difficult to specify. A
reasonable target is for each participant to put in about as much
effort as on two regular homework sets, plus a bit more for the
write-up.
For a group project, only one proposal should be turned in.
Class presentation
Each group will give a formal 20-minute report on their project,
followed by a question-and-answer period. A successful
presentation will discuss prior work in the area, what you have
accomplished (with a demo), and an evaluation of how successful the
project was. Typically this will involve gathering data and
comparing your results with those of other techniques for solving
similar problems.
Final report
The final report should include the code you develop as an
appendix. The code should also be turned in electronically, along
with instructions on how to use it.
The final report should generally be about ten pages long and
include sections on prior work, your approach to the problem,
analysis of your data, and conclusions, which will likely include
suggestions for further work (e.g., modifications to your
approach). You may find it useful to look at some of the
research papers at http://aclweb.org/anthology-new/, particularly
those from the CoLing conference, as examples.