Fill in any place that says `YOUR CODE HERE`.
Suggestions
To speed up your code, think about how certain operations can be done at the same time.
Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).
Double check your code does not have $\infty$-loops, as these will crash the autograder.
Blank cells in the notebook are hidden tests. Do not delete, copy, paste, or alter these cells as this will cause the tests to fail automatically.
Do not create multiple python notebooks (.ipynb files).
Do not import any new python packages (this may cause hidden tests to fail).
Each cell must run for less than 5 minutes (in fact, to get all points, the entire notebook should run in under 5 minutes).
Do not plagiarise! We take violations of this very seriously. In previous years we have identified instances of plagiarism and reported them to the Senior Teaching & Learning Administrator.
Coursework 1 – Decision Trees
Boosting took a long time to be truly understood.
… cynics say we didn’t see the forest for all the trees …
Introduction
In this assignment you will implement a decision tree algorithm and then use it for bagging and boosting. We've provided a tree structure for you with distinct leaves and nodes. Leaves have two fields, parent (another node) and prediction (a numerical value). Nodes have six fields:
left: node describing left subtree
right: node describing right subtree
parent: the parent of the current subtree. The head of the tree always has None as its parent. Feel free to initialize nodes with this field set to None so long as you set the correct parent later on.
cutoff_id: index of feature to cut
cutoff_val: cutoff value c (<=c : left, >c : right)
prediction: scalar prediction at this node
class TreeNode(object):
    """Tree class.

    (You don't need to add any methods or fields here but feel
    free to if you like. Our tests will only reference the fields
    defined in the constructor below, so be sure to set these
    correctly.)
    """
    def __init__(self, left, right, parent, cutoff_id, cutoff_val, prediction):
        self.left = left
        self.right = right
        self.parent = parent
        self.cutoff_id = cutoff_id
        self.cutoff_val = cutoff_val
        self.prediction = prediction
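As a quick sanity check of these field conventions, the snippet below hand-builds a one-split stump out of TreeNode objects (an illustrative sketch only; leaves are represented here as nodes whose left and right are None, and the chosen cutoff and predictions are arbitrary):

# illustrative sketch: a hand-built stump that splits on feature 0 at value 0.5
left_leaf = TreeNode(None, None, None, None, None, prediction=-1.0)
right_leaf = TreeNode(None, None, None, None, None, prediction=+1.0)
stump = TreeNode(left_leaf, right_leaf, None, cutoff_id=0, cutoff_val=0.5, prediction=0.0)
left_leaf.parent = stump    # set the parents after the root exists
right_leaf.parent = stump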
Implementing CART
Before we get started let us add a few packages that you might need. We will also load a data set ION, which we will use as our binary test classification problem.
import numpy as np
from pylab import *
import math
from numpy.matlib import repmat
import sys
import matplotlib
import matplotlib.pyplot as plt
from scipy.io import loadmat
import time
import warnings
sys.path.append('')
warnings.filterwarnings("ignore", category=DeprecationWarning)
%matplotlib notebook
# load in some binary test data (labels are -1, +1)
data = loadmat("ion.mat")
xTrIon = data['xTr'].T
yTrIon = data['yTr'].flatten()
xTeIon = data['xTe'].T
yTeIon = data['yTe'].flatten()
xTrIon.shape, yTrIon.shape, xTeIon.shape, yTeIon.shape
def spiraldata(N=300):
    r = np.linspace(1, 2*np.pi, N)
    xTr1 = np.array([np.sin(2.*r)*r, np.cos(2*r)*r]).T
    xTr2 = np.array([np.sin(2.*r+np.pi)*r, np.cos(2*r+np.pi)*r]).T
    xTr = np.concatenate([xTr1, xTr2], axis=0)
    yTr = np.concatenate([np.ones(N), -1 * np.ones(N)])
    xTr = xTr + np.random.randn(xTr.shape[0], xTr.shape[1])*0.2

    xTe = xTr[::2,:]
    yTe = yTr[::2]
    xTr = xTr[1::2,:]
    yTr = yTr[1::2]

    return xTr, yTr, xTe, yTe
xTrSpiral,yTrSpiral,xTeSpiral,yTeSpiral=spiraldata(150)
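If you would like to see what the spiral data set looks like, the following optional snippet (not part of the marked code) plots the two classes:

# optional: scatter plot of the two interleaved spirals
plt.figure()
plt.scatter(xTrSpiral[yTrSpiral == 1, 0], xTrSpiral[yTrSpiral == 1, 1], marker='o', label='+1')
plt.scatter(xTrSpiral[yTrSpiral == -1, 0], xTrSpiral[yTrSpiral == -1, 1], marker='x', label='-1')
plt.legend()
plt.show()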
Efficiently implementing regression trees
First, implement the function sqsplit which takes as input a (weighted) data set with labels and computes the best feature and cut-value of an optimal split based on minimum squared error. The third input is a weight vector which assigns a positive weight to each training sample. The loss you should minimize is the averaged weighted squared-loss:
$${\cal L}(S)=\sum_{i \in L} w_{i}(y_{i} - T_{L})^2+\sum_{i \in R} w_{i}(y_{i} - T_{R})^2.$$
You are building a regression tree, and right now you need to choose a split for the given dataset $S=\{(\vec x_1,y_1),\dots,(\vec x_n,y_n)\}$ (where we have continuous labels $y_i\in{\cal R}$).
Suppose you split on some feature $j$ with value $c$ and partition the dataset into two sets of indices, $L$–the set of indices on the left (i.e., $i \in L \Rightarrow [x_{i}]_{j} < c$)–and $R$–the set of indices on the right (i.e., $i \in R \Rightarrow [x_{i}]_{j} > c$). Suppose you assign every data point on the left the prediction $T_{L}$ and every data point on the right the prediction $T_{R}$. Finally, suppose that each data point $x_{i}$ has an associated weight $w_{i}$, and that the weights are normalized (i.e., $\sum_{i} w_{i} = 1$).
First, we show that setting $T_{L}$ and $T_{R}$ to the weighted average label over their respective sets (i.e., $T_{L} = \frac{1}{W_{L}}\sum_{i\in L}w_{i}y_{i}$ and $T_{R} = \frac{1}{W_{R}}\sum_{i\in R}w_{i}y_{i}$) minimizes the loss $\cal L$, where $W_{L}=\sum_{i \in L}w_{i}$ and $W_{R}=\sum_{i \in R} w_{i}$ are the total weight of the left and right side respectively.
We take the derivative of the loss with respect to $T_{L}$ to obtain $$\frac{d}{dT_{L}} {\cal L}(S) = -2\sum_{i \in L}w_{i}(y_i - T_L)=-2\sum_{i\in L}w_iy_i + 2T_{L}\sum_{i \in L}w_{i}$$ Setting this equal to zero and solving, we get $$2T_{L}W_{L}=2\sum_{i \in L}w_{i}y_{i}$$ and therefore $$T_{L} = \frac{1}{W_{L}}\sum_{i \in L}w_{i}y_{i}.$$ A symmetric argument holds for $T_{R}$.
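You can also convince yourself of this numerically. The optional check below (with made-up labels and weights) confirms that the loss at the weighted mean is no larger than at nearby predictions:

# optional sanity check: the weighted mean minimizes the weighted squared loss
y = np.array([0.5, 1.0, 2.0, 4.0])
w = np.array([0.1, 0.2, 0.3, 0.4])              # weights that sum to one
T_star = np.sum(w * y) / np.sum(w)              # weighted mean
loss = lambda T: np.sum(w * (y - T) ** 2)
print(loss(T_star) <= min(loss(T_star - 0.1), loss(T_star + 0.1)))  # True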
Now, imagine you are considering splitting on some feature $j$, and suppose you have already sorted the training points in the order of this feature value, so that $[x_{1}]_{j} < [x_{2}]_{j} < \cdots < [x_{n}]_{j}$. You'd like to choose a split from among $c_{1} \leq c_{2} \leq \cdots \leq c_{n-1}$, where $c_{i}=\frac{[x_{i}]_{j}+[x_{i+1}]_{j}}{2}$. One way to do this would be to, for each possible split $c_{k}$, decide whether each $x_{i}$ should be partitioned left or right, and compute $\cal L$. At the end, take the split with the lowest loss. With $n$ data points there are $O(n)$ splits to consider, and this naive algorithm would require $O(n)$ time per split to evaluate $\cal L$, for a total of $O(n^2)$ time.
Now, suppose some split $c_{k}$ results in the data being partitioned into $L^{(k)}$ and $R^{(k)}$. Suppose you are given the following quantities precomputed: $W_{L^{(k)}}$, $P_{L^{(k)}} = \sum_{i \in L^{(k)}} w_{i}y_{i}$, and $Q_{L^{(k)}} = \sum_{i \in L^{(k)}} w_{i}y_{i}^{2}$. Similarly, you are given $W_{R^{(k)}}$, $P_{R^{(k)}}$, and $Q_{R^{(k)}}$. Equipped with these precomputed quantities, we can compute $\cal L$ in constant time:
Expand the left side of the loss to $$\sum_{i \in L}w_{i}y_{i}^{2} - 2\sum_{i \in L}w_{i}y_{i}T_{L} + \sum_{i \in L}w_{i}T_{L}^{2}.$$ The first term is exactly $Q_{L^{(k)}}$. The second term can be written as $-2P_{L^{(k)}}\frac{P_{L^{(k)}}}{W_{L^{(k)}}}=-2\frac{P_{L^{(k)}}^{2}}{W_{L^{(k)}}}$. The last term can be written as $W_{L^{(k)}}\frac{P_{L^{(k)}}^{2}}{W_{L^{(k)}}^{2}}=\frac{P_{L^{(k)}}^{2}}{W_{L^{(k)}}}$. The second term plus the third term is therefore simply $-\frac{P_{L^{(k)}}^{2}}{W_{L^{(k)}}}$, so the whole left side can be evaluated as $$Q_{L^{(k)}}-\frac{P_{L^{(k)}}^{2}}{W_{L^{(k)}}}.$$ Similarly, the right side is $$Q_{R^{(k)}}-\frac{P_{R^{(k)}}^{2}}{W_{R^{(k)}}}.$$
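The optional snippet below verifies this identity on a small example:

# optional check: Q - P**2/W equals the explicit weighted squared loss for one side
yL = np.array([1.0, -1.0, 2.0])
wL = np.array([0.2, 0.3, 0.5])
W, P, Q = wL.sum(), np.sum(wL * yL), np.sum(wL * yL ** 2)
TL = P / W
print(np.isclose(Q - P ** 2 / W, np.sum(wL * (yL - TL) ** 2)))  # True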
Efficient Update Rule: If all feature values are distinct, only one data point moves from $R$ to $L$ when moving from split $k$ to split $k+1$. Therefore, we simply update the values accordingly. For example, we subtract $w_{k}$ from $W_{R^{(k)}}$ and add it to $W_{L^{(k)}}$. We subtract $w_{k}y_{k}$ from $P_{R^{(k)}}$ and add it to $P_{L^{(k)}}$. We subtract $w_{k}y_{k}^{2}$ from $Q_{R^{(k)}}$ and add it to $Q_{L^{(k)}}$. Crucially, all of these updates take only constant time.
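To make the update rule concrete, here is a sketch of a single linear scan along one pre-sorted feature using running values of $W$, $P$ and $Q$ (an illustration only, with a hypothetical helper name; it is not the sqsplit implementation you are asked to write, which must additionally loop over all features):

# sketch: split search along ONE feature using running W, P, Q accumulators
def best_cut_one_feature(x, y, w):
    order = np.argsort(x)                        # sort points by this feature
    xs, ys, ws = x[order], y[order], w[order]
    WL = PL = QL = 0.0                           # left-side accumulators
    WR, PR, QR = ws.sum(), np.sum(ws * ys), np.sum(ws * ys ** 2)
    best_loss, best_cut = np.inf, None
    for k in range(len(xs) - 1):                 # move point k from the right to the left
        WL += ws[k];             WR -= ws[k]
        PL += ws[k] * ys[k];     PR -= ws[k] * ys[k]
        QL += ws[k] * ys[k]**2;  QR -= ws[k] * ys[k]**2
        if xs[k] == xs[k + 1]:                   # cannot cut between identical feature values
            continue
        loss = (QL - PL**2 / WL) + (QR - PR**2 / WR)
        if loss < best_loss:
            best_loss, best_cut = loss, (xs[k] + xs[k + 1]) / 2.0
    return best_cut, best_loss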
def sqsplit(xTr, yTr, weights=[]):
    """Finds the best feature, cut value, and loss value.

    Input:
        xTr: n x d matrix of data points
        yTr: n-dimensional vector of labels
        weights: n-dimensional weight vector for data points

    Output:
        feature: index of the best cut's feature
        cut: cut-value of the best cut
        bestloss: loss of the best cut
    """
    N, D = xTr.shape
    assert D > 0  # must have at least one dimension
    assert N > 1  # must have at least two samples
    if len(weights) == 0:  # if no weights are passed on, assign uniform weights
        weights = np.ones(N)
    weights = weights / sum(weights)  # Weights need to sum to one (we just normalize them)

    bestloss = np.inf
    feature = np.inf
    cut = np.inf

    # YOUR CODE HERE
    raise NotImplementedError()

    return feature, cut, bestloss
t0 = time.time()
fid,cut,loss = sqsplit(xTrIon,yTrIon)
t1 = time.time()
print('elapsed time:', t1-t0, 'seconds')
print("Split on feature %i on value: %2.3f" % (fid,cut))
print("NOTE: It should split on feature 2 on value 0.304")
Testing Your Code
As your code will be tested by an autograder, we highly recommend you test all of the code you implement to make sure it works as you expect in both normal and abnormal use-cases. Below shows an example test for the sqsplit function.
# an example test
xor1 = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 1, 0]]).T
yor1 = np.array( [1, 0, 0, 1, 0, 1, 1, 0])
b = np.isclose(sqsplit(xor1,yor1)[2], .25)
print('Function sqsplit correctly calculates bestloss on xor1/yor1 example: ' + str(b))
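It is also worth writing a few tests of your own. For example, if a single feature separates the labels perfectly, the best loss should be (numerically) zero; the optional check below assumes the midpoint cut convention described above:

# another example test: feature 0 separates the labels perfectly, so the best loss is ~0
x_simple = np.array([[1.0], [2.0], [3.0], [4.0]])
y_simple = np.array([0.0, 0.0, 1.0, 1.0])
f, c, loss = sqsplit(x_simple, y_simple)
print('Best feature is 0: ' + str(f == 0))
print('Cut lies between 2 and 3: ' + str(2.0 < c < 3.0))
print('Best loss is ~0: ' + str(np.isclose(loss, 0.0)))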
CART tree:
Implement the function cart which returns a regression tree based on the minimum squared loss splitting rule. The function takes training data, training labels, a maximum depth, and the weight of each training example. Maximum depth and weights are optional arguments. If they are not provided you should make maximum depth infinity and equally weight each example. You should use the function sqsplit to make your splits.
Use the provided TreeNode class to represent your tree. Note that the nature of CART trees implies that every node has exactly 0 or 2 children.
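One possible structure (an outline only, left deliberately as comments; you will need to fill in and adapt the details yourself) is a recursive builder:

# outline only -- NOT a full solution
#
# def build(x, y, w, depth):
#     prediction = weighted mean of y                          # stored at every node
#     if depth <= 1, or all labels are equal, or no valid split exists:
#         return a leaf: TreeNode(None, None, None, None, None, prediction)
#     feature, cut, _ = sqsplit(x, y, w)
#     split the rows of x (and y, w) into a left and a right subset using feature/cut
#     node = TreeNode(build(left subset, depth - 1),
#                     build(right subset, depth - 1),
#                     None, feature, cut, prediction)
#     set node.left.parent and node.right.parent to node
#     return node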
def cart(xTr, yTr, depth=np.inf, weights=None):
    """Builds a CART tree.

    The maximum tree depth is defined by "depth" (depth=2 means one split).
    Each example can be weighted with "weights".

    Input:
        xTr: n x d matrix of data
        yTr: n-dimensional vector
        depth: maximum tree depth
        weights: n-dimensional weight vector for data points

    Output:
        tree: root of decision tree
    """
    n, d = xTr.shape
    if weights is None:
        w = np.ones(n) / float(n)
    else:
        w = weights

    # YOUR CODE HERE
    raise NotImplementedError()
Implement the function evaltree, which evaluates a decision tree on a given test data set.
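The core of evaltree is a per-point tree walk. A minimal sketch for a single point (assuming leaves are nodes whose left and right are None, and that values at or below cutoff_val go left; adjust the comparison to match your cart implementation) is:

# sketch: route a single point x through a tree and return the leaf prediction
def predict_one(root, x):
    node = root
    while node.left is not None and node.right is not None:  # still at an internal node
        if x[node.cutoff_id] <= node.cutoff_val:
            node = node.left
        else:
            node = node.right
    return node.prediction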
def evaltree(root, xTe):
    """Evaluates xTe using decision tree root.

    Input:
        root: TreeNode decision tree
        xTe: n x d matrix of data points

    Output:
        pred: n-dimensional vector of predictions
    """
    assert root is not None

    # YOUR CODE HERE
    raise NotImplementedError()
In the below test you should see an output very similar to the following:
elapsed time: 0.65 seconds
Training RMSE : 0.00
Testing RMSE : 0.74
(elapsed time will be slightly different)
t0 = time.time()
root = cart(xTrIon, yTrIon)
t1 = time.time()
tr_err = np.mean((evaltree(root,xTrIon) - yTrIon)**2)
te_err = np.mean((evaltree(root,xTeIon) - yTeIon)**2)

print("elapsed time: %.2f seconds" % (t1-t0))
print("Training RMSE : %.2f" % tr_err)
print("Testing RMSE : %.2f" % te_err)
The following code defines a function visclassifier(), which plots the decision boundary of a classifier in 2 dimensions. Execute the following code to see what the decision boundary of your tree looks like on the spiral data set.
def visclassifier(fun, xTr, yTr, w=[], b=0):
    """
    visualize decision boundary
    Define the symbols and colors we'll use in the plots later
    """
    yTr = np.array(yTr).flatten()
    w = np.array(w).flatten()

    symbols = ["ko", "kx"]
    marker_symbols = ['o', 'x']
    mycolors = [[0.5, 0.5, 1], [1, 0.5, 0.5]]
    # get the unique values from labels array
    classvals = np.unique(yTr)

    plt.figure()

    # return 300 evenly spaced numbers over this interval
    res = 300
    xrange = np.linspace(min(xTr[:, 0]), max(xTr[:, 0]), res)
    yrange = np.linspace(min(xTr[:, 1]), max(xTr[:, 1]), res)

    # repeat this matrix 300 times for both axes
    pixelX = repmat(xrange, res, 1)
    pixelY = repmat(yrange, res, 1).T

    xTe = np.array([pixelX.flatten(), pixelY.flatten()]).T

    # test all of these points on the grid
    testpreds = fun(xTe)

    # reshape it back together to make our grid
    Z = testpreds.reshape(res, res)
    # Z[0,0] = 1 # optional: scale the colors correctly

    # fill in the contours for these predictions
    plt.contourf(pixelX, pixelY, np.sign(Z), colors=mycolors)

    # creates x's and o's for training set
    for idx, c in enumerate(classvals):
        plt.scatter(xTr[yTr == c, 0],
                    xTr[yTr == c, 1],
                    marker=marker_symbols[idx],
                    color='k')

    if len(w) > 0:
        alpha = -1 * b / (w ** 2).sum()
        plt.quiver(w[0] * alpha, w[1] * alpha,
                   w[0], w[1], linewidth=2, color=[0, 1, 0])

    plt.axis('tight')
    # shows figure and blocks
    plt.show()
tree=cart(xTrSpiral,yTrSpiral) # compute tree on training data
visclassifier(lambda X:evaltree(tree,X),xTrSpiral,yTrSpiral)
print("Training error: %.4f" % np.mean(np.sign(evaltree(tree,xTrSpiral)) != yTrSpiral))
print("Testing error:  %.4f" % np.mean(np.sign(evaltree(tree,xTeSpiral)) != yTeSpiral))
print("NOTE: You should get something similar to:")
print("Training error: 0.0000")
print("Testing error: 0.0467 (may be slightly different)")
def onclick_cart(event):
    """
    Visualize cart, including new point
    """
    global xTraining, labels
    # create position vector for new point
    pos = np.array([[event.xdata, event.ydata]])
    if event.key == 'shift':  # add positive point
        color = 'or'
        label = 1
    else:  # add negative point
        color = 'ob'
        label = -1
    xTraining = np.concatenate((xTraining, pos), axis=0)
    labels.append(label)

    marker_symbols = ['o', 'x']
    classvals = np.unique(labels)
    mycolors = [[0.5, 0.5, 1], [1, 0.5, 0.5]]

    # return 300 evenly spaced numbers over this interval
    res = 300
    xrange = np.linspace(0, 1, res)
    yrange = np.linspace(0, 1, res)

    # repeat this matrix 300 times for both axes
    pixelX = repmat(xrange, res, 1)
    pixelY = repmat(yrange, res, 1).T

    xTe = np.array([pixelX.flatten(), pixelY.flatten()]).T

    # get decision tree
    tree = cart(xTraining, np.array(labels).flatten())
    fun = lambda X: evaltree(tree, X)

    # test all of these points on the grid
    testpreds = fun(xTe)

    # reshape it back together to make our grid
    Z = testpreds.reshape(res, res)
    # Z[0,0] = 1 # optional: scale the colors correctly

    plt.xlim((0, 1))
    plt.ylim((0, 1))

    # fill in the contours for these predictions
    plt.contourf(pixelX, pixelY, np.sign(Z), colors=mycolors)

    for idx, c in enumerate(classvals):
        plt.scatter(xTraining[labels == c, 0],
                    xTraining[labels == c, 1],
                    marker=marker_symbols[idx],
                    color='k')
    plt.show()
xTraining= np.array([[5,6]])
labels = [1]
%matplotlib notebook
fig = plt.figure()
plt.xlim(0,1)
plt.ylim(0,1)
cid = fig.canvas.mpl_connect('button_press_event', onclick_cart)
plt.title('Use shift-click to add negative points.')
Random Forests
CART trees are known to be high variance classifiers (if trained to full depth). An effective way to prevent overfitting is to use Bagging. Implement the function forest, which builds a forest of regression trees. Each tree should be built using training data drawn by randomly sampling $n$ examples from the training data with replacement. Do not randomly sample features. The function should output a list of trees.
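The sampling step is just a draw of $n$ indices with replacement. A minimal sketch of a single bootstrap round (illustrative only; it assumes your cart function already works, and the loop over $m$ trees and the maxdepth handling are up to you) is:

# sketch: one bootstrap round for bagging (indices drawn with replacement)
n = xTrSpiral.shape[0]
idx = np.random.choice(n, n, replace=True)     # sample n row indices with replacement
x_boot, y_boot = xTrSpiral[idx, :], yTrSpiral[idx]
one_tree = cart(x_boot, y_boot)                # train a single tree on the bootstrap sample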
def forest(xTr, yTr, m, maxdepth=np.inf):
    """Creates a random forest.

    Input:
        xTr: n x d matrix of data points
        yTr: n-dimensional vector of labels
        m: number of trees in the forest
        maxdepth: maximum depth of tree

    Output:
        trees: list of TreeNode decision trees of length m
    """
    n, d = xTr.shape
    trees = []

    # YOUR CODE HERE
    raise NotImplementedError()

    return trees
Now implement the function evalforest, which should take as input a set of $m$ trees, a set of $n$ test inputs, and an $m$-dimensional weight vector. Each tree should be weighted by the corresponding weight. (For random forests you can define the weights to be $\frac{1}{m}$ for all trees.)
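The ensemble prediction is simply a weighted sum of the individual tree predictions. A minimal sketch (assuming evaltree is implemented; the helper name is just for illustration) is:

# sketch: weighted ensemble prediction as a sum of per-tree predictions
def weighted_ensemble(trees, X, alphas):
    pred = np.zeros(X.shape[0])
    for alpha, t in zip(alphas, trees):
        pred += alpha * evaltree(t, X)         # each tree contributes with its weight
    return pred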
def evalforest(trees, X, alphas=None):
    """Evaluates X using trees.

    Input:
        trees: list of TreeNode decision trees of length m
        X: n x d matrix of data points
        alphas: m-dimensional weight vector

    Output:
        pred: n-dimensional vector of predictions
    """
    m = len(trees)
    n, d = X.shape
    if alphas is None:
        alphas = np.ones(m) / len(trees)

    pred = np.zeros(n)

    # YOUR CODE HERE
    raise NotImplementedError()

    return pred
The following script visualizes the decision boundary of a random forest ensemble.
trees = forest(xTrSpiral, yTrSpiral, 30)  # compute forest on training data
visclassifier(lambda X: evalforest(trees, X), xTrSpiral, yTrSpiral)
print("Training error: %.4f" % np.mean(np.sign(evalforest(trees,xTrSpiral)) != yTrSpiral))
print("Testing error:  %.4f" % np.mean(np.sign(evalforest(trees,xTeSpiral)) != yTeSpiral))
The following script evaluates the test and training error of a random forest ensemble as we vary the number of trees.
M=20 # max number of trees
err_trB=[]
err_teB=[]
alltrees=forest(xTrIon,yTrIon,M)
for i in range(M):
    trees = alltrees[:i+1]
    trErr = np.mean(np.sign(evalforest(trees,xTrIon)) != yTrIon)
    teErr = np.mean(np.sign(evalforest(trees,xTeIon)) != yTeIon)
    err_trB.append(trErr)
    err_teB.append(teErr)
    print("[%d]training err = %.4f\ttesting err = %.4f" % (i, trErr, teErr))

plt.figure()
line_tr, = plt.plot(range(M), err_trB, label="Training Error")
line_te, = plt.plot(range(M), err_teB, label="Testing error")
plt.title("Random Forest")
plt.legend(handles=[line_tr, line_te])
plt.xlabel("# of trees")
plt.ylabel("error")
plt.show()
# Your training error should converge
def onclick_forest(event):
    """
    Visualize forest, including new point
    """
    global xTrain, yTrain, w, b, M
    # create position vector for new point
    pos = np.array([[event.xdata, event.ydata]])
    if event.key == 'shift':  # add positive point
        color = 'or'
        label = 1
    else:  # add negative point
        color = 'ob'
        label = -1
    xTrain = np.concatenate((xTrain, pos), axis=0)
    yTrain = np.append(yTrain, label)

    marker_symbols = ['o', 'x']
    classvals = np.unique(yTrain)
    w = np.array(w).flatten()