CSCI 374: Homework Assignment #4

C4.5 (Decision Trees)
Due: 11:59 PM on Monday, October 23

You can download the assignment instructions by clicking on this link.

Instructions for using GitHub for our assignments can be found on the Resources page of the class website, or by following this link.

Algorithm Accuracy and Runtime

To help you debug your program, below are the accuracies I measured on the four data sets at different training percentages for both ID3 and C4.5 (without rule pruning), implemented in Python. Each result is averaged over 30 random seeds. My implementation takes between 2 and 3 minutes to classify OpticalDigit, and around 1 minute for Hypothyroid.
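If you want to compare your numbers against the table, one way to average accuracy over several random seeds is sketched below. This is only an illustration, not my actual implementation: the helper names (split_data, train_majority, accuracy, average_accuracy) are hypothetical, examples are assumed to be (attributes, label) pairs, and the majority-class learner is a placeholder for your own ID3 or C4.5 training function.

```python
import random
from collections import Counter
from statistics import mean


def split_data(examples, train_fraction, seed):
    """Shuffle a copy of the examples with the given seed, then split it."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def train_majority(train):
    """Placeholder learner (assumption): always predicts the most common label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda attributes: majority


def accuracy(model, test):
    """Fraction of test examples the model labels correctly (test must be non-empty)."""
    correct = sum(1 for attributes, label in test if model(attributes) == label)
    return correct / len(test)


def average_accuracy(examples, train_fraction, seeds, train_fn=train_majority):
    """Train and test once per seed, then return the mean test accuracy."""
    scores = []
    for seed in seeds:
        train, test = split_data(examples, train_fraction, seed)
        scores.append(accuracy(train_fn(train), test))
    return mean(scores)
```

For example, average_accuracy(examples, 0.75, range(30), train_fn=your_trainer) would correspond to one cell of the 75% column, assuming your_trainer takes a training set and returns a function that maps attributes to a predicted label.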

Note: last year, the students' implementations of C4.5 did not achieve such high accuracy on OpticalDigit; theirs were closer to 0.6-0.65, so your performance may vary. However, I think most of their solutions treated each attribute as nominal instead of continuous, which might explain most of the difference in accuracy.
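To make the nominal-versus-continuous distinction concrete, here is a minimal sketch of choosing a binary split threshold for one continuous attribute. It assumes candidate thresholds are midpoints between adjacent distinct sorted values and scores them with plain information gain; the function names are hypothetical, and the assignment instructions may call for a different criterion (for example, gain ratio), so treat this as an illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split of one continuous attribute.

    Returns (None, 0.0) when no candidate split has positive information gain.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    sorted_labels = [label for _, label in pairs]
    base = entropy(sorted_labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal values cannot be separated by a threshold
        left, right = sorted_labels[:i], sorted_labels[i:]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_cut = gain, (low + high) / 2
    return best_cut, best_gain
```

For instance, best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]) returns (2.5, 1.0): a single test "value <= 2.5" separates the classes perfectly, whereas treating the attribute as nominal would create one branch per distinct value.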

Data Set ID3-60% ID3-75% ID3-90% C4.5-60% C4.5-75% C4.5-90%
Monks1 0.9301 0.9358 0.9576 0.9033 0.9114 0.9455
OpticalDigit 0.5536 0.5693 0.5835 0.8760 0.8814 0.8865
Votes 0.9301 0.9346 0.9409 0.9343 0.9422 0.9439
Hypothyroid 0.9940 0.9944 0.9938