CSCI 333 -- Natural Language Processing
Project Guidelines
Fall, 2011


Due dates:
One- to two-page proposal due:  Monday, November 14
Class presentations:  December 7th and 9th
Written report due:  December 12

Your assignment is to design and implement a substantial program (or set of programs) that serves as an example of a natural language processing system.  You have a fair amount of flexibility in what you do for your project, but you will need to do some research to find something appropriate.

Some suitable types of project:
The best projects are usually the ones that you think up (and care about!) yourselves.  Choose an area of NLP that interests you and that you would like to explore in depth.  A list of projects suggested by others can be found at http://www.stanford.edu/class/cs224n/handouts/cs224n-fp.pdf.

Feel free to build on existing code, but be sure to give credit when you use the work of others.

As with the homework, I encourage you to work in pairs on the project: you will be able to accomplish more, and several eyes on the code tend to produce better-quality code with fewer errors.  If you do work in a pair (and I hope you do), please include with your final write-up a description of who did what.  My intention is to give all members of a team the same grade, but I will assign individual grades if there is evidence that some members did considerably more than others.

Your final project should not include work done for another course unless you have permission from both me and the other instructor.


Proposal

Your one- to two-page proposal should address the following issues:
  1. What is the problem that you will be attacking?
  2. Why is this interesting as an NLP problem?
  3. What are relevant references to existing approaches to this problem?
  4. What technical methods or approaches will you use?
  5. On what data will you run your system?
  6. How will you evaluate the performance of your system?
The section on evaluation is important.  I will expect you to accumulate data on how your program works and use at least some statistical analysis to back up your conclusions.
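
As one illustration of what backing up a conclusion with statistics can look like, here is a minimal sketch in Python that reports accuracy together with a percentile bootstrap confidence interval.  The tag sequences, function names, and parameter values are made up for the example; the bootstrap is only one of several reasonable choices, and your own project may call for a different metric or test.

```python
import random


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def bootstrap_ci(gold, predicted, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    Resamples test items with replacement and recomputes accuracy each time.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(gold[i] == predicted[i] for i in idx) / n)
    scores.sort()
    low = scores[int((alpha / 2) * n_resamples)]
    high = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical toy data: gold part-of-speech tags and a system's output.
gold = ["N", "V", "N", "D", "N", "V", "D", "N", "V", "N"]
pred = ["N", "V", "V", "D", "N", "N", "D", "N", "V", "N"]

acc = accuracy(gold, pred)
low, high = bootstrap_ci(gold, pred)
print(f"accuracy = {acc:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With only ten items the interval is very wide, which is itself a useful reminder that conclusions drawn from a small test set deserve caution.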

Lots of data is available on the web.  We've already used the Penn Treebank and the Brown corpus in NLTK.  There are many other sites with data available.  You will likely find the info at http://nlp.stanford.edu/links/statnlp.html helpful.

I am happy to discuss your proposal with you in advance.  Once you have submitted your proposal and I have approved it, any changes in direction or scope will need to be approved by me.

The scale of a project is always difficult to specify.  A reasonable target is to assume that each participant will put in about as much effort as on two regular homework sets plus a bit more on the write-up.

For a group project, I expect only one proposal to be turned in.


Class presentation

Each group will give a formal 20-minute report on their project, followed by a question-and-answer period.  A successful presentation will discuss the prior work in the area, what you have accomplished (with a demo), and an evaluation of how successful the project was.  Typically the evaluation will involve gathering data and comparing your results with other techniques for solving similar problems.
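
When comparing your results with another technique on the same test items, it helps to check whether the difference could be due to chance.  The following is a minimal sketch, in Python, of an exact two-sided sign test over the items on which two systems disagree in correctness.  The labels, system outputs, and function name are hypothetical, and this test is only one possible choice.

```python
from math import comb


def sign_test(gold, system_a, system_b):
    """Exact two-sided sign test on items where only one system is correct.

    Returns (a_only, b_only, p_value), where a_only counts items that only
    system_a got right and b_only counts items that only system_b got right.
    """
    a_only = sum(a == g and b != g for g, a, b in zip(gold, system_a, system_b))
    b_only = sum(b == g and a != g for g, a, b in zip(gold, system_a, system_b))
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0  # the systems never disagree in correctness
    k = min(a_only, b_only)
    # Probability of a split at least this lopsided if each system were
    # equally likely to win a disagreement, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


# Hypothetical toy data: gold labels, your system, and a baseline.
gold = ["pos"] * 6 + ["neg"] * 6
new_system = ["neg"] + ["pos"] * 5 + ["neg"] * 6
baseline = ["pos"] + ["neg"] * 11

a_only, b_only, p_value = sign_test(gold, new_system, baseline)
print(f"new only: {a_only}, baseline only: {b_only}, p = {p_value:.3f}")
```

In this toy case the new system wins five of the six disagreements, yet the p-value is about 0.22, so the sample is too small to claim a reliable improvement.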


Final report

The final report should include the code you develop as an appendix to the report.  The code should also be turned in electronically along with instructions on how to use it.

The final report will generally be about ten pages long and should include sections on prior work, your approach to the problem, analysis of your data, and conclusions, which will likely include suggestions for further work (e.g., suggested modifications to your approach).  You may find it useful to look at some of the research papers at http://aclweb.org/anthology-new/, particularly those from the CoLing conference, as examples.