**CSCI 374**: Machine Learning and Data Mining (Fall 2017, Oberlin College)

# CSCI 374 Syllabus

## Contact Information

**Instructor**: Adam Eck adam.eck [AT] oberlin [DOT] edu

**Office Hours**: T 9:30-11:00 AM (King 223D), W 2:30-3:30 PM (King 223D), F 4:00-5:00 (The Local Coffee & Tea)

## Meeting Time and Location

**Time**: 1:30-2:20 MWF

**Location**: King 327

## Course Overview

Machine learning and data mining are closely related capabilities that enable computers to learn to perform tasks without explicit programming, as well as discover interesting information from data. This course explores topics within machine learning and data mining, including classification, unsupervised learning, and association rule mining. Students will gain hands-on practice with popular machine learning and data mining algorithms and discuss the challenges of working with complex real-world data, along with possible solutions.

## Course Objectives

- Exposure to a breadth of topics related to machine learning and data mining.
- Understanding of supervised and unsupervised machine learning.
- Hands-on experience practicing with commonly used algorithms and software tools for machine learning and data mining.
- Practice implementing algorithms described in pseudocode.
- Consideration of the potential impact of machine learning and data mining on society and real-world applications.
- Refinement of experimentation, analysis, and technical writing skills.
- Training in identifying problems of interest, developing solutions, and working in teams on a substantial student-driven project.

## Topics

Background information for machine learning and data mining (introduction, notation and terminology, probability and statistics, etc.), lazy learning, clustering, recommender systems, decision trees, Bayesian learning, association rule mining, Markov models, neural networks, bias vs. variance tradeoff, feature selection, empirical evaluation of algorithm performance, and more (time permitting).

## Course Prerequisites

- CSCI 151 (Required)
- CSCI 241 (Strongly recommended)
- MATH 220 (Recommended)

## Textbook, Clickers, and Course Website

The primary course textbook is:

Alpaydin, Ethem. *Introduction to Machine Learning, 3rd Edition*. MIT Press, Cambridge, Massachusetts. 2014.

We will be using the i>clicker+ system as part of class participation. Questions will be asked most class periods, and students are expected to participate by responding with their best guesses as to the correct answers. Students are responsible for their own participation and may not respond for other students. Clickers should be registered through Blackboard.

Information will be primarily communicated through the course website: http://cs.oberlin.edu/~aeck/Fall2017/CSCI374/index.html. Please check the website regularly for announcements, class schedule, assignments, etc.

This term we will be using Piazza for class discussion. The system is highly catered to getting you help fast and efficiently from classmates and the instructor. Rather than emailing questions, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com. You can find our class page at: https://piazza.com/oberlin/fall2017/csci374/home

## Assignments

Throughout the semester, you will have the opportunity to practice the course material through hands-on homework assignments. Most assignments will take one week to complete. Most assignments will involve implementing algorithms discussed in class; some assignments will involve analysis of data and/or writing, as well.

## Exams

There will be no exams in this course. Instead, there is only a short questionnaire the first week of the course that gives you an opportunity to reflect on your initial thoughts about machine learning and this course, as well as to help me get to know each one of you. The questionnaire will be graded based on participation -- if you turn it in on time with every question answered, you will automatically receive full credit. There are no right or wrong answers to many of these questions, so please do not stress out while answering!

## Quizzes

In the absence of regular exams, quizzes will occasionally be administered to evaluate individual student learning throughout the semester, as well as to identify important course concepts that could benefit from additional instruction and practice. These quizzes will be administered either (1) at the beginning of lectures during the regularly scheduled course meeting time, or (2) as take-home quizzes, to be turned in at the beginning of the next course meeting time.

## Final Project

In place of a final exam, students will be required to work in groups for a Final Project assignment. Each group of students will be required to: (1) choose a project, (2) write a proposal identifying the problem of interest along with a proposed solution (presented to the class around the middle of the semester), (3) develop a solution, (4) report on the outcomes of their project and future work, and (5) present their project (during the final weeks of the semester).

The goal of this project is to provide students with an opportunity to explore their own interests within machine learning and data mining, beyond what is covered by class lectures and readings or completed in the homework assignments. For example, some students might choose to explore the application of machine learning and data mining to a particular real-world problem, finding appropriate data and investigating how different algorithms might perform on that data. Additionally, some students might choose to implement additional algorithms not considered in the homework assignments to practice with additional representations and learning approaches. Each project will be chosen by the group's members to reflect the members' own interests.

This project will require substantial participation by the members of each group, so it will be assigned early enough in the semester for students to complete it successfully.

## Grading

Final grades will be determined based on your scores on the assignments, quizzes, project, and class participation as follows:

| Component | Weight |
|---|---|
| Initial Questionnaire | 2% |
| Attendance and Participation | 10% |
| Quizzes | 15% |
| Assignments | 48% |
| Final Project | 25% |
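As an illustration of how the weights combine, the following is a minimal sketch, not an official grade calculator. The function name `final_grade` and the example component scores are hypothetical, and the sketch assumes each component is first reduced to a single percentage from 0 to 100.

```python
# Weights taken from the grading table, in percent (they sum to 100).
WEIGHTS = {
    "Initial Questionnaire": 2,
    "Attendance and Participation": 10,
    "Quizzes": 15,
    "Assignments": 48,
    "Final Project": 25,
}


def final_grade(scores):
    """Return the weighted final percentage.

    `scores` maps each component name to a percentage in [0, 100].
    Assumption: every component has already been averaged into one number.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Initial Questionnaire": 100,
    "Attendance and Participation": 90,
    "Quizzes": 80,
    "Assignments": 85,
    "Final Project": 90,
}

print(final_grade(example_scores))  # 86.3
```

Because assignments carry 48% of the grade, a change of 10 percentage points in the assignment average moves the final grade by 4.8 points in this sketch.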

## Late Homework Policy

When permitted by the Oberlin calendar (e.g., before the reading period), late submissions of programming assignments will be accepted but will be subject to a percent deduction penalty:

- 1 second to 1 hour late: up to 5% deduction
- 1 hour, 1 second – 24 hours late: up to 10% deduction
- 24 hours, 1 second – 48 hours late: up to 20% deduction
- Each additional 24-hour period late: up to an additional 10% deduction

For example, assume an assignment is due at 11:59 PM Monday, March 6. Student X turns in the assignment at 12:15 AM Tuesday, March 7, causing a 5% deduction penalty (for a maximum possible score of 95%) due to turning in the assignment late, but less than one hour late.

Student Y later turns in the same assignment at 5:00 PM on Tuesday, March 7, causing a 10% deduction penalty (for a maximum possible score of 90%) due to being more than 1 hour but less than 24 hours late.

Finally, Student Z turns in the same assignment at 12:00 PM on Friday, March 10, causing a 40% deduction penalty (for a maximum possible score of 60%) for being more than 72 hours but less than 96 hours late.
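The penalty schedule and the three examples above can be checked with a short sketch. This is an illustration under stated assumptions rather than the official grading procedure: the function name `max_late_deduction` is hypothetical, a partially elapsed 24-hour period is counted as a full one (consistent with the Student Z example), the deduction is capped at 100%, and the year 2017 is assumed only so that March 6 falls on a Monday.

```python
import math
from datetime import datetime, timedelta


def max_late_deduction(due, submitted):
    """Return the maximum deduction (in percent) for a late submission."""
    late = submitted - due
    if late <= timedelta(0):
        return 0  # on time
    if late <= timedelta(hours=1):
        return 5
    if late <= timedelta(hours=24):
        return 10
    if late <= timedelta(hours=48):
        return 20
    # Assumption: each started 24-hour period beyond 48 hours adds 10%.
    extra_periods = math.ceil((late - timedelta(hours=48)) / timedelta(hours=24))
    # Assumption: the deduction never exceeds 100%.
    return min(100, 20 + 10 * extra_periods)


# Due 11:59 PM Monday, March 6 (year assumed for illustration).
due = datetime(2017, 3, 6, 23, 59)

print(max_late_deduction(due, datetime(2017, 3, 7, 0, 15)))   # Student X: 5
print(max_late_deduction(due, datetime(2017, 3, 7, 17, 0)))   # Student Y: 10
print(max_late_deduction(due, datetime(2017, 3, 10, 12, 0)))  # Student Z: 40
```

Since the policy states each deduction as "up to" a percentage, the value returned here is an upper bound on the penalty, not necessarily the penalty applied.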

## Disabilities

The College makes reasonable accommodations for persons with disabilities. Students should notify the Office of Disability Services located in Peters G-27/G-28 and their instructor of any disability-related needs. For more information, see http://new.oberlin.edu/office/disability-services/index.dot. Any student eligible for and needing academic adjustments or accommodations because of a disability (including non-visible disabilities such as chronic diseases, learning disabilities, head injury, attention deficit/hyperactivity disorder, or psychiatric disabilities) is requested to speak with the professor.

## Academic Dishonesty

Students are expected to adhere to the Oberlin College Honor Code. Any violations will be reported to the Honor Code Committee.

Different assignments in this course will have different expectations with respect to the Honor Code, which will be clearly explained in the assignment instructions (in case of confusion, please contact me). For example, the quizzes are meant to assess individual knowledge, and thus must be completed independently (without reference to study materials, textbooks, etc. unless explicitly permitted). On the other hand, the final project is a group exercise and students are required to closely collaborate with other students (within their groups) to successfully complete their projects. Between these two polar ends of the spectrum, students are encouraged to discuss the homework assignments with their peers, but (1) students must acknowledge with whom they discussed their assignment in a README file, and (2) students are not allowed to share or show their code to one another, nor discuss implementation details (discussions should be done at a higher level about the algorithms, program design, etc. and not about source code). **Please note: looking at source code from existing machine learning libraries or other sources is strictly forbidden for the homework assignments, unless explicitly permitted in the assignment instructions**. However, use of pre-existing software and libraries might be acceptable for the final group projects, provided the students receive explicit permission from the professor.

If you have any questions about what is permitted and what is not, please feel free to ask.

For every assignment, students must indicate whether they followed the Honor Code in completing the assignment. If so, students should include in a README file the following:

I have adhered to the Honor Code in this assignment.