Natural Language Processing

Computer Science 333

Fall 2016

Course Objectives

Grading Procedures

Your grade will be based on homework, a project, and two exams.
Point breakdown (tentative):
Homework: 150
Midterm Exam (October 10): 100
Final Exam (December 17, 9 a.m.): 120
Total: 470


Late labs are strongly discouraged. You may hand in up to two labs one day late without penalty.  Be sure to submit early!  Beyond that, labs up to 24 hours late will be penalized by 50%, and labs more than 24 hours late will not be graded.

Problem sets are due at the beginning of lecture.  Late problem sets are not accepted.

If for some reason (such as a severe illness) you will not be able to complete a lab or take an exam, talk to me immediately and before the deadline.  I will handle these situations on a case-by-case basis.


Tutors, provided by Oberlin College, are available. If interested, see Kay Knight in Peters 114.

Student Disabilities

If you have a disability that might impact your performance in this course, or that requires special accommodations, please contact me as soon as possible so that appropriate arrangements can be made.  Support is available through Student Academic Services, specifically Jane Boomer. You will need to contact them to have your disability documented before accommodations can be made.

Academic integrity

All work in this course is to be performed in accordance with the college's honor system.  You must write and sign the Honor Pledge at the end of each and every submission.  Electronic submissions must include the pledge and your name in the comments.  The pledge is "I have adhered to the Honor Code in this assignment."

That being said, in a hands-on course such as this one, some discussion of lab assignments is expected and encouraged.  A few specific dos and don'ts:

In the end, the work you submit must be your own.  If you're not sure what is acceptable in a given situation, please ask me about it.

Course outline

  1. Regular expressions and finite state automata.
  2. Morphology.
  3. Language models. N-grams.
  4. Part of speech tagging.
  5. Hidden Markov models.
  6. Context-free grammars for English.
  7. Parsing algorithms. The CKY algorithm and the Earley algorithm.
  8. Statistical parsing. Probabilistic context-free grammars.
  10. Features and unification.
  10. Computational semantics with first order logic.
  11. Word sense disambiguation.
  12. Computational discourse.
  13. Information extraction.
  14. Machine translation.