Live Coding Wk5 – Lecture 12 – Decision Trees¶
For this demo we will be exploring how to use the decision tree classifier. We'll also show you a couple of extensions which can help decision trees perform better, so that we aren't stuck just using sklearn functions.
### Imports and data you will need
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Something which should be a bit familiar
data = pd.read_csv('./data/galaxies-clean.csv')
del data['id']
data.head()
redshift mag_u mag_g mag_r mag_i mag_z deVRad_r deVAB_r expRad_r expAB_r stellar_mass class
0 0.048096 17.595560 16.019156 15.319424 14.944425 14.666128 6.055902 0.613006 3.575483 0.519457 10.642694 1
1 0.062509 17.436966 15.313985 14.333739 13.920478 13.618734 10.595940 0.631542 5.563209 0.620282 11.518373 0
2 0.089931 19.444744 17.803787 16.951828 16.509365 16.191961 3.363955 0.571397 1.940210 0.626922 10.683678 1
3 0.051296 19.139469 17.244236 16.319923 15.842777 15.486821 3.303546 0.265656 2.369305 0.301659 10.578798 1
4 0.107117 19.413588 18.234081 17.606298 17.289528 17.067013 10.181945 0.331512 4.247050 0.364165 10.238853 1
data['class'].value_counts()
Name: class, dtype: int64
data.groupby('class').mean()
redshift mag_u mag_g mag_r mag_i mag_z deVRad_r deVAB_r expRad_r expAB_r stellar_mass
0 0.132367 19.419470 17.445670 16.389934 15.953217 15.624847 3.339908 0.782404 1.954753 0.786333 11.173304
1 0.092609 19.064516 17.548128 16.791406 16.397738 16.118604 7.485915 0.482168 3.497646 0.502706 10.557123
We have a bit of set-up for this demo.
# Split the training and test datasets
train_data, test_data = train_test_split(data, train_size = 0.8, random_state=123)
x_train = train_data.to_numpy()[:, :-1]
y_train = train_data.to_numpy()[:, -1]
x_test = test_data.to_numpy()[:, :-1]
y_test = test_data.to_numpy()[:, -1]
Trees and Stumps¶
Your floral career has taken a turn for the stars. In this demo script, we'll be exploring how you can create decision tree classifiers for classifying different types of galaxies. The ANU Research School of Astronomy & Astrophysics (RSAA) has tasked you with exploring the Galaxy Zoo dataset to distinguish between spiral and elliptical galaxies.
Galaxy Zoo¶
Galaxy Zoo is an interesting dataset, where the classifications are crowd-sourced from the public. You can visit the website to classify different galaxy images by hand and contribute to science!
Fortunately the RSAA has provided a clean version of the data, so we don’t need to understand too much about astronomy…
Nevertheless, here are the column descriptions of the data:
| Column | Description |
| --- | --- |
| redshift | redshift |
| mag_u | magnitude u band |
| mag_g | magnitude g band |
| mag_r | magnitude r band |
| mag_i | magnitude i band |
| mag_z | magnitude z band |
| deVRad_r | de Vaucouleurs scale radius fit in r band |
| deVAB_r | ellipticity from de Vaucouleurs fit in r band |
| expRad_r | exponential scale radius fit in r band |
| expAB_r | ellipticity from exponential fit in r band |
| stellar_mass | log galaxy mass (in units of solar mass) |
| class | label = {0=elliptical, 1=spiral} |
Planting Trees¶
Let's go through some of the basics and check out how to define a decision tree classifier through sklearn. There are a couple of important parameters we should pay attention to:
criterion: determines how we measure the quality of a split in the tree.
max_depth: determines the maximum depth of the tree.
DecisionTreeClassifier?
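As a quick illustration of these two parameters (a hedged sketch only, not the exercise answer below; example_tree is just an illustrative name, and 'entropy' is one possible criterion, sklearn's default being 'gini'):
# Illustration only: a shallow tree that splits on information gain ('entropy')
# rather than the default Gini impurity.
example_tree = DecisionTreeClassifier(criterion='entropy', max_depth=2)
example_tree.fit(x_train, y_train)
print("Example tree (entropy, depth 2) score: {:.3}".format(example_tree.score(x_test, y_test)))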
Let's go through the standard motions and define our decision tree with max_depth = 3 for our dataset.
# For you to do
decision_tree = None # TODO!
# Train as well
decision_tree.fit() # specify the parameters
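If you get stuck, here is a minimal sketch of one possible completion of the cell above (the only setting the exercise asks for is max_depth=3):
# One possible completion
decision_tree = DecisionTreeClassifier(max_depth=3)
decision_tree.fit(x_train, y_train)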
Let's see how we went.
# How do we predict using this model (the trained decision tree)?
pred = decision_tree.predict() # specify the parameters
# Can we predict class probabilities of the input samples?
probs = decision_tree.predict_proba() # specify the parameters
# Compute the score for how we did
dt_score = None # TODO
print("Decision Tree Score: {:.3}".format(dt_score))
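Again, a minimal sketch of one way the TODOs above could be filled in, assuming we evaluate on the held-out test set using the classifier's built-in mean-accuracy score:
# One possible completion
pred = decision_tree.predict(x_test)            # predicted classes for the test set
probs = decision_tree.predict_proba(x_test)     # predicted class probabilities for the test set
dt_score = decision_tree.score(x_test, y_test)  # mean accuracy on the test set
print("Decision Tree Score: {:.3}".format(dt_score))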
Another nice thing about decision trees is that we can see how each decision is made. We can use the plot_tree function from the tree module imported in the preamble to display the tree.
# Output the decision tree
fig = plt.figure(figsize=(16, 10))
tree.plot_tree(decision_tree, fontsize=8, feature_names=data.columns[:-1], class_names=['elliptical', 'spiral'])
plt.show()
Discussion: Although we might understand what the features mean, why might this tree be useful? How does it compare to the other algorithms we have gone through?
Scientists like that you can reason about the decision. Explainable AI!
One Stump¶
We are going to try and define what is called a decision stump, a special type of decision tree. A decision stump is a decision tree where there is only one decision! In other words, it's just a yes or no question. Make and train a decision stump using DecisionTreeClassifier:
# For you to do
decision_stump = None # TODO
# Train too!
# Solution
decision_stump = DecisionTreeClassifier(max_depth=1)
decision_stump.fit(x_train, y_train)
DecisionTreeClassifier(max_depth=1)
# Output the decision stump
fig = plt.figure(figsize=(16, 10))
tree.plot_tree(decision_stump, fontsize=8, feature_names=data.columns[:-1], class_names=['elliptical', 'spiral'])
plt.show()
Discussion: How do you think the performance will compare to our original decision tree? (This shouldn’t be too surprising)
Discussion Here!
Let's actually evaluate the performance now.
ds_score = None # TODO
print("Decision Tree Score: {:.3}".format(dt_score))
print("Decision Stump Score: {:.3}".format(ds_score))
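One possible completion of the evaluation above, assuming the stump is scored on the same test set as the tree:
# One possible completion
ds_score = decision_stump.score(x_test, y_test)  # mean accuracy of the stump on the test set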
So was what I made you do pointless?
Boosting Performance¶
Using the decision stumps we introduced above, we will now go through an ensemble algorithm called boosting (specifically, the AdaBoost algorithm).
Boosting aims to construct and train several decision stumps iteratively, and then uses all of the trained stumps to make predictions via a weighted majority vote.
Here is a sketch of what we will do:
1. Define a function which takes a list of decision trees and weights to make majority predictions.
2. Explore how to weight different examples when training.
3. Create a set of decision stumps using boosting.
Let's now define a function which takes a list of decision trees and predicts the class through a weighted majority vote.
For each of the x_inputs, we want to predict the highest-voted class, where the weight of each vote is determined by tree_weights:
- tree_lists: a list of decision trees.
- tree_weights: the weight of the vote for each decision tree.
- x_inputs: the examples we are predicting on.
# Fill in the rest of the function!
def majority_vote(tree_lists, tree_weights, x_inputs):
    # Voting counts for the predictions
    predict = {0: np.zeros(len(x_inputs)), 1: np.zeros(len(x_inputs))}
    # Iterate through trees and weights
    for tree, w in zip(tree_lists, tree_weights):
        # Save the vote of the decision tree (multiplied by the weight)
        pred = tree.predict(x_inputs)
        predict[1] += pred * w
        predict[0] += (1 - pred) * w
    # Get the highest vote as the prediction
    return np.argmax([predict[0], predict[1]], axis=0)
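As a quick sanity check (not part of the original exercise), a weighted vote over a single stump with weight 1.0 should simply reproduce that stump's own predictions; single_vote is just an illustrative name:
# Sanity check: one voter with weight 1.0 behaves like plain prediction
single_vote = majority_vote([decision_stump], [1.0], x_test)
print((single_vote == decision_stump.predict(x_test)).all())  # expect True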
Alright, let's move on to the next item on our list. Check out the sample_weight parameter of the fit (and score) methods of decision trees.
decision_stump.fit?
Let's go through a simple weighting scheme first as an example. Construct a vector in which elliptical galaxies have twice the weight of spiral galaxies, and train a new decision stump using this weighting. (Hint: You need a weight for each row of x_train!)
# For you to do
weights = 2 - y_train # Easiest way of doing this (recall the class labels = {0=elliptical, 1=spiral})
w_decision_stump = None # TODO
w_decision_stump.fit(x_train, y_train, sample_weight=weights)
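A minimal sketch of the TODO above: the stump itself is defined exactly as before, and the weighting only enters through the sample_weight argument of fit.
# One possible completion: an ordinary stump; the weighting happens at fit time
w_decision_stump = DecisionTreeClassifier(max_depth=1)
w_decision_stump.fit(x_train, y_train, sample_weight=weights)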
Another model, another round of evaluation.
# For you to do
wds_score = None # TODO
print("Decision Tree Score: {:.3}".format(dt_score))
print("Decision Stump Score: {:.3}".format(ds_score))
print("Weighted Stump Score: {:.3}".format(wds_score))
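One possible completion of the score above, again using the test set:
# One possible completion
wds_score = w_decision_stump.score(x_test, y_test)  # mean accuracy of the weighted stump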
Hmmm, the weighted stump should have scored slightly worse. This isn't very surprising.
Discussion: Although not useful in this domain, can you highlight cases where this kind of example weighting might be useful (say, in the medical sciences)?
Diagnosis of illnesses, where a positive result might be more important to identify. Or maybe class imbalances in the datasets.
Let's get to the punchline of this demo: boosting with decision stumps. Consider the code below.
boosting_iters = 5 # The number of stumps we are making
boosting_weights = np.ones(y_train.shape) / len(y_train) # Error weights are initialized to be equal
stump_weights = []
boosted_stumps = []
for i in range(boosting_iters):
    cur_error = 0
    # Train a weighted stump
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(x_train, y_train, sample_weight=boosting_weights) # note the sample weights
    cur_pred = stump.predict(x_train)
    # Compute the stump's weighted error
    for j in range(len(cur_pred)):
        if cur_pred[j] != y_train[j]:
            cur_error += boosting_weights[j] # When we misclassify
    # Recalculate the weights: down-weight the examples we got right
    for j in range(len(cur_pred)):
        if cur_pred[j] == y_train[j]:
            boosting_weights[j] *= cur_error / (1 - cur_error) # When we correctly classify
    # Renormalise the weights to sum = 1
    boosting_weights /= sum(boosting_weights)
    # Add a stump weight (its voting power)
    stump_weights.append(np.log((1 - cur_error) / cur_error))
    # Add the stump
    boosted_stumps.append(stump)
Discussion: Let's go through the code to understand a bit about how boosting works:
1. What happens to the error when a stump misclassifies?
2. What happens to the weight of a training example (boosting_weights[j]) when a stump classifies it correctly?
3. How does the performance of a stump affect its voting power (stump_weights)? What happens when the error is larger than 50%?
Here is a plot of $\log((1 - \varepsilon) / \varepsilon)$ to help with point 3.
num_points = 10_000
x_vals = np.arange(0 + 1/num_points, 1, 1/num_points)
plt.plot(x_vals, np.log((1 - x_vals) / x_vals))
plt.axvline(0.5, ls='--', c='r', alpha=0.5)
plt.axvline(0, ls='--', c='r', alpha=0.5)
plt.axvline(1, ls='--', c='r', alpha=0.5)
plt.axhline(0, ls='-', c='k', alpha=0.5)
plt.ylabel('Stump Weight')
plt.xlabel('Error Value')
plt.show()
Discussion Here!
- What happens to the error when a stump misclassifies?
The current error increases by the current weight of the misclassified example.
- What happens to the weight of a training example (boosting_weights[j]) when a stump classifies it correctly?
The weight decreases. The weight decreases less as we have more errors.
- How does the performance of a stump affect its voting power (`stump_weights`)? What happens when the error is larger than 50%?
Lower error = higher stump weight.
And if the error is larger than 50%, we even get a negative weight (a negative vote).
Now use the majority_vote function to make a prediction using the boosted stumps, and then calculate the accuracy score manually.
# Your code
boosted_pred = majority_vote(boosted_stumps, stump_weights, x_test)
boosted_score = sum(boosted_pred == y_test) / len(y_test)
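As a quick cross-check (not required by the exercise), sklearn's accuracy_score computes the same quantity as the manual calculation above:
from sklearn.metrics import accuracy_score

# Should agree with the manually computed boosted_score
print(accuracy_score(y_test, boosted_pred))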
And here are all the results for all models we have considered:
print("Decision Tree Score: {:.3}".format(dt_score))
print("Decision Stump Score: {:.3}".format(ds_score))
print("Weighted Stump Score: {:.3}".format(wds_score))
print("Boosted Stumps Score: {:.3}".format(boosted_score))
Discussion: Boosting should have performed slightly better than our original decision tree (in fact, this will occur semi-consistently with different randomisations of the training and test data). Now count the number of decisions (forks) our original decision tree made and the number of decisions our boosted stumps will make in total. Surprised? What happened to Occam's Razor?
Discussion Here!
Decision Tree Decisions = 7
Boosted Stumps = 5
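As a final aside (not part of the original demo), sklearn ships its own implementation of AdaBoost. Its default base estimator is a depth-1 decision tree, so with n_estimators=5 it roughly mirrors the hand-rolled stumps above and can serve as a rough cross-check; sk_boost is just an illustrative name.
from sklearn.ensemble import AdaBoostClassifier

# sklearn's AdaBoost over 5 stumps, for comparison with our manual version
sk_boost = AdaBoostClassifier(n_estimators=5, random_state=123)
sk_boost.fit(x_train, y_train)
print("sklearn AdaBoost Score: {:.3}".format(sk_boost.score(x_test, y_test)))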