Homework 2, due September 12th, 11:59pm
August 30, 2018
1. Use a programming language or package where random forests can be trained and
applied. Examples include Python (scikit-learn package), R and Matlab. Using the
training and test sets specified in the syllabus, perform the following tasks:
a) On the madelon dataset, for each of k ∈ {3, 10, 30, 100, 300} train a random
forest with k trees where the split attribute at each node is chosen from a random
subset of ∼
√
500 features. Use the trained trees to predict the class labels on
the training and test sets, and obtain the training and test misclassification errors.
Plot on the same graph the training and test errors vs number of trees k as two
separate curves. Report the training and test misclassification errors in a table.
(4 points)
b) Repeat point a) on the madelon dataset where the split attribute at each node is
chosen from a random subset of ∼ ln(500) features. (2 points)
c) Repeat point a) on the madelon dataset where the split attribute at each node is
chosen from all 500 features. (2 points)
1