Data Mining Assignment

Take any reasonable data set you like with at least 200 observations.
Next, shuffle the data set and split it into three sets: a training set, a validation set, and a test set. You are free to use any appropriate split, but I recommend 50% for training, 30% for validation, and holding out the remaining 20% of observations until the very end to see how you did. You may instead choose to do k-fold cross-validation; in that case, use 80% of the data for cross-validation and hold out the remaining 20% until the end.
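As a rough illustration of the shuffle-and-split step, here is a minimal sketch in plain Python. The helper name shuffle_split, the fixed seed, and the placeholder list of 200 integers are assumptions made for the example; substitute your own observations and preferred fractions.

```python
import random

def shuffle_split(rows, train_frac=0.5, val_frac=0.3, seed=0):
    """Shuffle rows and split them into train / validation / test lists."""
    rows = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)      # seeded so the split is reproducible
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]          # whatever remains (about 20% here)
    return train, val, test

# Placeholder data (an assumption): 200 observations represented by integers.
data = list(range(200))
train, val, test = shuffle_split(data)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split the same across runs, so the hold-out set stays untouched while you experiment with the other two.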
Next, build a decision tree or a regression tree with your training data, subject to two rules. These rules could be something like: maximum tree depth, minimum number of observations in a leaf, minimum decrease in impurity, etc. Use your validation data to find the optimal settings of these rules, e.g. maximum tree depth = 3 if it gives the best validation accuracy. Finally, use your 20% hold-out set to see how these optimal settings do in production.
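One way the tuning loop could look is sketched below. It is a toy example under several assumptions: the data are synthetic (200 random points labeled 1 when x0 + x1 > 1), the two rules chosen are maximum depth and minimum leaf size, and the small Gini-based classifier (build_tree, predict, accuracy) is a hypothetical stand-in written in plain Python so the example runs on its own. In practice you would more likely use a library tree implementation.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, depth, max_depth, min_leaf):
    """rows are (features, label) pairs; returns a nested dict or a leaf label."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for j in range(len(rows[0][0])):                 # every feature
        for t in sorted({x[j] for x, _ in rows}):    # every observed threshold
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue                             # rule 2: minimum leaf size
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:                                 # no admissible split
        return majority
    _, j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": build_tree(left, depth + 1, max_depth, min_leaf),
            "right": build_tree(right, depth + 1, max_depth, min_leaf)}

def predict(tree, x):
    while isinstance(tree, dict):
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

# Synthetic stand-in data (an assumption): label is 1 when x0 + x1 > 1.
rng = random.Random(0)
data = []
for _ in range(200):
    x = (rng.random(), rng.random())
    data.append((x, int(x[0] + x[1] > 1)))
rng.shuffle(data)
train, val, test = data[:100], data[100:160], data[160:]   # 50% / 30% / 20%

depths, leaf_sizes = (1, 2, 3, 4, 5), (1, 5, 10)
best = None
for max_depth in depths:                 # rule 1: maximum tree depth
    for min_leaf in leaf_sizes:
        tree = build_tree(train, 0, max_depth, min_leaf)
        val_acc = accuracy(tree, val)    # model selection uses validation data only
        if best is None or val_acc > best[0]:
            best = (val_acc, max_depth, min_leaf, tree)

best_val_acc, best_depth, best_leaf, best_tree = best
test_acc = accuracy(best_tree, test)     # hold-out set is used once, at the end
print(f"max_depth={best_depth} min_leaf={best_leaf} "
      f"val={best_val_acc:.2f} test={test_acc:.2f}")
```

The point of the structure is that the test set plays no part in choosing the settings: every candidate is fit on the training set, compared on the validation set, and only the winner is scored on the hold-out data.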
Guidelines are the same as prior weeks.