CSCI 374 Syllabus

Contact Information

Instructor: Adam Eck, adam.eck [AT] oberlin [DOT] edu
Office Hours: M 1:30-3:00 PM (King 125B), T 11:00 AM - Noon (King 125B), F 3:00-4:00 PM (King 125B)

Meeting Time and Location

Time: MWF 11:00-11:50 AM
Location: King 101

Course Overview

Machine learning and data mining are closely related capabilities that enable computers to learn to perform tasks without explicit programming, as well as discover interesting information from data. This course explores topics within machine learning and data mining, including classification, unsupervised learning, and association rule mining. Students will gain hands-on practice with popular machine learning and data mining algorithms, and will discuss the challenges of working with complex real-world data along with ways to address them.

Course Objectives

  1. Exposure to a breadth of topics related to machine learning and data mining.
  2. Understanding of supervised and unsupervised machine learning.
  3. Hands-on experience practicing with commonly used algorithms and software tools for machine learning and data mining.
  4. Practice implementing algorithms described in pseudocode.
  5. Consideration of the potential impact of machine learning and data mining on society and real-world applications.
  6. Refinement of experimentation, analysis, and technical writing skills.
  7. Training in identifying problems of interest, developing solutions, and working in teams on a substantial student-driven project.

Topics

Background information for machine learning and data mining (introduction, notation and terminology, probability and statistics, etc.), lazy learning, clustering, recommender systems, decision trees, Bayesian learning, association rule mining, Markov models, neural networks, bias vs. variance tradeoff, feature selection, empirical evaluation of algorithm performance, and more (time permitting).

Course Prerequisites

  • CSCI 241 (Required)
  • MATH 220 (Recommended)

Textbook, Clickers, and Course Website

There is no required textbook for the class. For those who learn best from reading along with a class, readings from one or more of the recommended textbooks will be posted on the class website. Those books include:

Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. The Elements of Statistical Learning. Springer-Verlag, 2009. Website: http://web.stanford.edu/~hastie/ElemStatLearn/

Mitchell, Tom M. Machine Learning. WCB/McGraw-Hill, Boston, MA, 1997. Website: http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html

James, Gareth, Witten, Daniela, Hastie, Trevor, and Tibshirani, Robert. An Introduction to Statistical Learning (with Applications in R). Springer-Verlag, 2013. Website: https://www.statlearning.com

Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron. Deep Learning. MIT Press, Cambridge, MA, 2016. Website: http://www.deeplearningbook.org/

We will be using the i>Clicker+ system as part of class participation. Questions will be asked most class periods, and students are expected to participate by responding with their best guesses as to the correct answers. You will not be graded on the correctness of your responses. Students are responsible for their own participation and may not respond for other students. Clickers should be registered through Blackboard.

Information will be primarily communicated through the course website: http://cs.oberlin.edu/~aeck/Fall2022/CSCI374/index.html. Please check the website regularly for the class schedule, assignments, etc. Announcements and grades will be posted to the class Blackboard page.

Assignments

Throughout the semester, you will have the opportunity to practice the course material through hands-on homework assignments. Most assignments will take one week to complete. Most assignments will involve implementing algorithms discussed in class; some assignments will also involve analysis of data and/or writing.

Class Discussions, Readings, and Reflections

During the semester, we will also have several class discussion activities, for which we will read a few articles before class about machine learning as it relates to the real world (e.g., ML in pop culture, the ethics of ML, etc.). Everyone will be expected to read the assigned articles, participate in class discussions, and write up a short reflection describing your opinions and understanding of the topics discussed.

We will also have short reflection responses each week where you will be asked to connect the material discussed in class (and practiced through the assignments) or read in technical papers with your personal interests and goals.

Exams

There will be no exams in this course. Instead, there is only a short questionnaire the first week of the course that gives you an opportunity to reflect on your initial thoughts about machine learning and this course, as well as to help me get to know each one of you. The questionnaire will be graded based on participation -- if you turn it in on time with every question answered, you will automatically receive full credit. There are no right or wrong answers to many of these questions, so please do not stress out while answering!

Final Project

In place of a final exam, students will be required to work in groups for a Final Project assignment. Each group of students will be required to: (1) choose a project, (2) write a proposal identifying the problem of interest along with a proposed solution, (3) develop a solution, (4) report on the outcomes of their project and future work, and (5) present their project (during the final weeks of the semester).

The goal of this project is to provide students with an opportunity to explore their own interests within machine learning and data mining, beyond what is covered by class lectures and readings or completed in the homework assignments. For example, some students might choose to explore the application of machine learning and data mining to a particular real-world problem, finding appropriate data and investigating how different algorithms might perform on that data. Other students might choose to implement algorithms not considered in the homework assignments to practice with additional representations and learning approaches. Each project will be chosen by the group's members to reflect the members' own interests.

This project will require substantial participation by the members of each group, so it will be assigned sufficiently early in the semester so that students have time to successfully complete the project. It will be due at the same time that the final exam would have finished: Monday, December 19 at 4:00 PM.

Grading

Final grades will be determined based on your scores on the assignments, reflections, project, and class participation as follows:

Component %
Initial Questionnaire 2%
Attendance and Participation 10%
Reflections 15%
Assignments 53%
Final Project 20%
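
As an illustration of how the weights above combine, here is a minimal Python sketch. It assumes the final percentage is a simple weighted sum of component scores, each on a 0-100 scale; the syllabus lists only the weights, so the function name, the dictionary keys, and the sample scores are all hypothetical.

```python
# Component weights in percent, copied from the table above.
WEIGHTS = {
    "questionnaire": 2,
    "participation": 10,
    "reflections": 15,
    "assignments": 53,
    "project": 20,
}


def final_percentage(scores):
    """Weighted sum of component scores (each 0-100).

    Assumption: components are combined linearly by weight.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student.
example_scores = {
    "questionnaire": 100,
    "participation": 90,
    "reflections": 80,
    "assignments": 85,
    "project": 95,
}
print(final_percentage(example_scores))
```

Because assignments carry 53% of the weight, a 10-point change in the assignment average moves the result by 5.3 points under this assumption.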

Late Homework Policy

When permitted by the Oberlin calendar (e.g., before the reading period), late submissions of programming assignments will be accepted but will be subject to a percent deduction penalty:

1 second to 1 hour late: up to 5% deduction
1 hour, 1 second to 24 hours late: up to 10% deduction
24 hours, 1 second to 48 hours late: up to 20% deduction
Each additional 24-hour period late: up to an additional 10% deduction

For example, assume an assignment is due at 11:59 PM Monday, October 3. Student X turns in the assignment at 12:15 AM Tuesday, October 4, causing up to a 5% deduction penalty (for a maximum possible score of 95%) due to turning in the assignment late, but less than one hour late.

Student Y later turns in the same assignment at 5:00 PM on Tuesday, October 4, causing up to a 10% deduction penalty (for a maximum possible score of 90%) due to being more than 1 hour but less than 24 hours late.

Finally, Student Z turns in the same assignment at 12:00 PM on Friday, October 7, causing up to a 40% deduction penalty (for a maximum possible score of 60%) for being more than 72 hours but less than 96 hours late.
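
The deduction tiers can also be written as a short Python sketch that computes the maximum possible deduction for a submission time. This is only an illustration of the policy as stated: the function name is hypothetical, the year 2022 is taken from the course website URL, and capping the deduction at 100% is an assumption the policy does not spell out.

```python
import math
from datetime import datetime

HOUR = 3600        # seconds
DAY = 24 * HOUR    # seconds


def max_late_deduction(due, submitted):
    """Maximum percent deduction for a submission, per the tiers above."""
    late_seconds = (submitted - due).total_seconds()
    if late_seconds <= 0:
        return 0    # on time
    if late_seconds <= HOUR:
        return 5    # up to 1 hour late
    if late_seconds <= DAY:
        return 10   # more than 1 hour, up to 24 hours late
    if late_seconds <= 2 * DAY:
        return 20   # more than 24 hours, up to 48 hours late
    # Each additional (possibly partial) 24-hour period adds up to 10%.
    extra_periods = math.ceil((late_seconds - 2 * DAY) / DAY)
    # Assumption: the deduction never exceeds 100%.
    return min(100, 20 + 10 * extra_periods)


due = datetime(2022, 10, 3, 23, 59)  # 11:59 PM Monday, October 3
print(max_late_deduction(due, datetime(2022, 10, 4, 0, 15)))  # 16 minutes late: 5
print(max_late_deduction(due, datetime(2022, 10, 4, 17, 0)))  # about 17 hours late: 10
print(max_late_deduction(due, datetime(2022, 10, 7, 12, 0)))  # about 84 hours late: 40
```

The three calls correspond to a submission less than one hour late, one between 1 and 24 hours late, and one between 72 and 96 hours late. The returned value is an upper bound, since the policy says "up to" each percentage.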

Accessibility

I am committed to making this class accessible to all students. If you have accessibility needs, please email me or come discuss them with me. Things you might want to discuss accommodations for include physical and mental disabilities, both permanent and temporary; any situation that is keeping you from attending class or spending as much time on this class as you would like (illness, stress, family situations, work hours, just going through a rough time); not having access to computers; or anything else that is keeping you from doing your best in this course. Let me know, and together we will figure something out.

Code of Conduct

Both Oberlin College and I personally value the diversity of perspective that each of you brings to this classroom and our study of Computer Science together. In this class, we must all commit to fostering a safe, inclusive and welcoming environment which will allow all of us to learn. Please respect the competence and hard work of your colleagues in this classroom. If you are made to feel uncomfortable in class or while working on class material, please notify me so we can take steps to address the situation. Students who are disruptive to class and our learning community will face consequences, including potentially being removed from the course.

Fair Warning

This is not a lecture-oriented class, or one in which mimicking prefabricated examples will lead you to success. You will be expected to work actively to construct your own understanding of the topics at hand, with the readily available help of the professor and your classmates. Many of the concepts you learn and problems you work will be new to you and ask you to stretch your thinking. It is completely natural and common to experience frustration and failure before you experience understanding. This is part of the normal learning process. Your viability as a professional in the modern workforce depends on your ability to embrace this learning process and make it work for you. You are supported on all sides by the professor and your classmates. But no student is exempt from the process and the hard work that it entails.

Academic Dishonesty

Students are expected to adhere to the Oberlin College Honor Code. Any violations will be reported to the Honor Code Committee.

Different assignments in this course will have different expectations with respect to the Honor Code, which will be clearly explained in the assignment instructions (in case of confusion, please contact me). For example, the final project is a group exercise and students are required to closely collaborate with other students (within their groups) to successfully complete their projects. Students are encouraged to discuss the homework assignments with their peers, but (1) students must acknowledge with whom they discussed their assignment in a README file, and (2) students are not allowed to share or show their code to one another, nor discuss implementation details (discussions should be done at a higher level about the algorithms, program design, etc. and not about source code). Please note: looking at source code from existing machine learning libraries or other sources is strictly forbidden for the homework assignments, unless otherwise specified in the assignment instructions. However, use of pre-existing software and libraries might be acceptable for the final group projects, provided the students receive explicit permission from the professor.

If you have any questions about what is permitted and what is not, please feel free to ask.

For every assignment, students must indicate whether they followed the Honor Code in completing the assignment. If so, students should include in a README file the following:

I have adhered to the Honor Code in this assignment.