1. Use a programming language or package where random forests can be trained and applied. Examples include Python (scikit-learn package), R and Matlab. Using the training and test sets specified in the syllabus, perform the following tasks:
- a) On the madelon dataset, for each of k ∈ {3, 10, 30, 100, 300} train a random forest with k trees where the split attribute at each node is chosen from a random
subset of ∼
the training and test sets, and obtain the training and test misclassification errors. Plot on the same graph the training and test errors vs number of trees k as two separate curves. Report the training and test misclassification errors in a table. (4 points) - b) Repeat point a) on the madelon dataset where the split attribute at each node is chosen from a random subset of ∼ ln(500) features. (2 points)
- c) Repeat point a) on the madelon dataset where the split attribute at each node is chosen from all 500 features. (2 points)