Homework #3 Programming Assignments

CSCI 567, Fall’18

General instructions

Homework #3 Programming Assignments Due: 11:59 pm, October 21, 2018

Starting point Your repository will now have a directory “P3/”. Please do not change the name of this repository or the names of any files we have added to it. Please perform a git pull to retrieve these files. You will find within it: boosting.py, boosting.sh, boosting check.py, boosting test.py, classifier.py, data loader.py, decision stump.py, decision tree.py, decision tree.sh, decision tree check.py, decision tree test.py, pegasos.py, pegasos.sh, and requirements.txt.

Download the data Please download mnist subset.json from Piazza/Resource/Homework. Please DO NOT push it into your repository when you submit your results; otherwise, you will receive a 20% deduction on your score for this assignment.

High-level descriptions

Dataset We will use mnist subset (images of handwritten digits from 0 to 9). This is the same subset of the full MNIST that we used for Homework 1 and Homework 2. As before, the dataset is stored in a JSON-formatted file mnist subset.json. You can access its training, validation, and test splits using the keys ‘train’, ‘valid’, and ‘test’, respectively. For example, suppose we load mnist subset.json into the variable x. Then, x['train'] refers to the training set of mnist subset. This set is a list with two elements: x['train'][0] contains the features, of size N (samples) × D (feature dimension), and x['train'][1] contains the corresponding labels, of size N.
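For instance, loading the file and inspecting the training split might look like the following (a minimal sketch; it assumes the file is saved as mnist_subset.json in the working directory, with the underscore that the PDF renders as a space):

    import json

    # Load the dataset and unpack the training split.
    with open('mnist_subset.json', 'r') as f:
        x = json.load(f)

    train_features, train_labels = x['train'][0], x['train'][1]
    print(len(train_features), len(train_features[0]), len(train_labels))  # N, D, N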

Tasks

  1. You will be asked to implement the linear support vector machine (SVM) for binary classification (Sect. 1). Specifically, you will
    • finish implementing the following three functions—objective function, pegasos train, and pegasos test—in pegasos.py. Refer to pegasos.py and Sect. 1 for more information.
    • run the script pegasos.sh after you finish your implementation. This will output pegasos.json.
    • add, commit, and push (1) the pegasos.py and (2) the pegasos.json file that you have created.

  2. You will be asked to implement the boosting algorithms with decision stump (Sect. 2). Specifically, you will

    • finish implementing the following classes—Boosting, DecisionStump, AdaBoost, and LogitBoost. Refer to boosting.py, decision stump.py, and Sect. 2 for more information.
    • test and debug your code using the provided boosting check.py.
    • run the script boosting.sh (which executes boosting test.py). This will output boosting.json.
    • add, commit, and push (1) the boosting.py, (2) the decision stump.py, and (3) the boosting.json.

  3. You will be asked to implement the decision tree classifier (Sect. 3). Specifically, you will


  • finish implementing the following classes — DecisionTree, TreeNode. Refer to decision tree.py and Sect. 3 for more information.
  • test and debug your code using the provided decision tree check.py.
  • run the script decision tree.sh (which executes decision tree test.py). This will output decision tree.json.
  • add, commit, and push (1) the decision tree.py and (2) the decision tree.json.

    As in the previous homework, you are not responsible for loading/pre-processing the data; we have done that for you (e.g., we have pre-processed the data so it has two classes: digits 0–4 and digits 5–9). For specific instructions, please refer to the text in Sect. 1 and 2 and the instructions in pegasos.py and boosting.py.

    Cautions Please do not import packages that are not listed in the provided code. Follow the instructions in each section strictly to code up your solutions. Do not change the output format. Do not modify the code unless we instruct you to do so. A homework solution that does not match the provided setup, such as format, name, initializations, etc., will not be graded. It is your responsibility to make sure that your code runs with the provided commands and scripts on the VM. Finally, make sure that you git add, commit, and push all the required files, including your code and generated output files.


Problem 1 Pegasos: a stochastic gradient based solver for linear SVM

In this question, you will build a linear SVM (i.e., without kernel) classifier using the Pegasos algorithm [1]. Given a training set {(x_n ∈ R^D, y_n ∈ {1, −1})}_{n=1}^N, the primal formulation of linear SVM can be written as the L2-regularized hinge loss shown in the class:

    min_w (λ/2) ||w||_2^2 + (1/N) ∑_{n=1}^N max{0, 1 − y_n w^T x_n}.    (1)

Note that here we include the bias term b into the parameter w by appending x with 1, which is slightly different from what was discussed in the class.

Instead of turning it into the dual formulation, we are going to solve the primal formulation directly with a gradient-based algorithm. Recall that for (batch) gradient descent, at each iteration of the parameter update, we compute the gradients for all data points and take the average (or sum). When the training set is large (i.e., N is a large number), this is often too computationally expensive. Stochastic gradient descent with mini-batches alleviates this issue by computing the gradient on only a subset of the data at each iteration.

Pegasos, a representative solver of eq. (1), is exactly stochastic gradient descent with mini-batches applied to this problem, with an extra projection step described below (this is also called projected stochastic gradient descent). The pseudocode of Pegasos is given in Algorithm 1. At the t-th iteration of the parameter update, we first take a mini-batch of data A_t of size K [step (3)], and define A_t^+ ⊂ A_t to be the subset of samples in A_t for which w_t suffers a non-zero loss [step (4)]. Next we set the learning rate η_t = 1/(λt) [step (5)]. We then perform a two-step parameter update as follows. We first compute (1 − η_t λ) w_t and, for all samples (x, y) ∈ A_t^+, add the vector (η_t / K) y x to (1 − η_t λ) w_t. We denote the resulting vector by w_{t+1/2} [step (6)]. This step is in fact exactly stochastic gradient descent: w_{t+1/2} = w_t − η_t ∇_t, where

    ∇_t = λ w_t − (1/K) ∑_{(x,y) ∈ A_t^+} y x

is the (sub)gradient of the objective function on the mini-batch A_t at w_t. Last, we set w_{t+1} to be the projection of w_{t+1/2} onto the set

    B = {w : ||w||_2 ≤ 1/√λ}

to control its L2 norm. This projection can be obtained simply by scaling w_{t+1/2} by min{1, (1/√λ) / ||w_{t+1/2}||_2} [step (7)].

For more details of the Pegasos algorithm, you may refer to the original paper [1].

Now you are going to implement Pegasos and train a binary linear SVM classifier on the dataset mnist subset.json. You are going to implement three functions—objective function, pegasos train, and pegasos test—in a script named pegasos.py. You will find detailed programming instructions in the script.

1.1 Finish the implementation of the function objective function, which corresponds to the objective function in the primal formulation of SVM.
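For intuition, eq. (1) can be evaluated in a few lines. The following is only a sketch under assumed argument names (X is the N × D feature matrix with a constant-1 column appended for the bias, y holds labels in {+1, −1}); it is not the required objective function signature, which is specified in pegasos.py:

    import numpy as np

    def svm_objective_sketch(w, X, y, lam):
        # (lambda / 2) * ||w||^2 + average hinge loss, as in eq. (1)
        hinge = np.maximum(0.0, 1.0 - y * (X @ w))
        return 0.5 * lam * np.dot(w, w) + hinge.mean()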

1.2 Finish the implementation of the function pegasos train, which corresponds to Algorithm 1.


Algorithm 1 The Pegasos algorithm
Require: a training set S = {(x_n ∈ R^D, y_n ∈ {1, −1})}_{n=1}^N, the total number of iterations T, the batch size K, and the regularization parameter λ.
Output: the last weight w_{T+1}.
1: Initialization: w_1 = 0
2: for t = 1, 2, · · · , T do
3:   Randomly choose a subset of data A_t ⊂ S (with replacement) such that |A_t| = K
4:   Set A_t^+ = {(x, y) ∈ A_t : y w_t^T x < 1}
5:   Set η_t = 1/(λt)
6:   Set w_{t+1/2} = (1 − η_t λ) w_t + (η_t / K) ∑_{(x,y) ∈ A_t^+} y x
7:   Set w_{t+1} = min{1, (1/√λ) / ||w_{t+1/2}||_2} · w_{t+1/2}
8: end for

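To make the update rule concrete, here is a minimal NumPy sketch of Algorithm 1. The function name and arguments are assumptions for this sketch only; the required pegasos train interface is described in pegasos.py:

    import numpy as np

    def pegasos_sketch(X, y, lam, K, T):
        # X: (N, D) features with a constant-1 column appended for the bias; y: (N,) labels in {+1, -1}.
        N, D = X.shape
        w = np.zeros(D)                                          # step 1: w_1 = 0
        for t in range(1, T + 1):
            idx = np.random.choice(N, size=K, replace=True)      # step 3: mini-batch A_t
            Xb, yb = X[idx], y[idx]
            plus = yb * (Xb @ w) < 1                             # step 4: A_t^+, samples with non-zero loss
            eta = 1.0 / (lam * t)                                # step 5: learning rate
            w = (1 - eta * lam) * w + (eta / K) * (Xb[plus].T @ yb[plus])   # step 6
            norm = np.linalg.norm(w)
            if norm > 0:                                         # step 7: project onto ||w||_2 <= 1/sqrt(lam)
                w = min(1.0, (1.0 / np.sqrt(lam)) / norm) * w
        return w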
1.3 After you train your model, run your classifier on the test set and report the accuracy, which is defined as:

    (# of correctly classified test samples) / (# of test samples)

Finish the implementation of the function pegasos test.
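As a small illustration of this metric (a sketch with assumed names, not the required pegasos test interface):

    import numpy as np

    def accuracy_sketch(w, X_test, y_test):
        # Predict with the learned linear SVM: sign(w^T x), mapping a score of 0 to +1 here.
        preds = np.where(X_test @ w >= 0, 1, -1)
        return np.mean(preds == y_test)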

1.4 After you complete the above steps, run pegasos.sh, which will run the Pegasos algorithm for 500 iterations with 6 settings (mini-batch size K = 100 with different λ ∈ {0.01, 0.1, 1}, and λ = 0.1 with different K ∈ {1, 10, 1000}), and output a pegasos.json that records the test accuracy and the value of the objective function at each iteration during the training process.

What to do and submit: run the script pegasos.sh. It will take mnist subset.json as input and generate pegasos.json. Add, commit, and push both pegasos.py and pegasos.json before the due date.

1.5 Based on pegasos.json, you are encouraged to make plots of the objective function value versus the number of iterations (i.e., a convergence curve) under the different settings of λ and K, but you are not required to submit these plots.

References

[1] Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: primal estimated sub-gradient solver for SVM. Mathematical Programming, 2011, Volume 127, Number 1, Page 3.


Problem 2 Boosting

In the lectures, we discussed boosting algorithms, which construct a strong (binary) classifier by iteratively adding weak (binary) classifiers to it. A weak classifier is learned to (approximately) maximize the weighted training accuracy at the corresponding iteration, and the weight of each training example is updated after every iteration.

In this question, you will implement decision stumps as weak classifiers and AdaBoost. For simplicity, instead of implementing an actual base algorithm that learns decision stumps, we fix a finite set of decision stumps H in advance and pick the one with the smallest weighted error at each round of boosting.

2.1 General boosting framework A boosting algorithm sequentially learns β_t and h_t ∈ H for t = 0, . . . , T − 1 and finally constructs a strong classifier as

    H(x) = sign( ∑_{t=0}^{T−1} β_t h_t(x) ).

Please read the class “Boosting” defined in boosting.py, and then complete the class by implementing the “TODO” part(s) as indicated in boosting.py.
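As a small illustration of this final predictor (the names clfs_picked and betas are assumptions for this sketch; follow the structure given in boosting.py):

    def boosting_predict_sketch(x, clfs_picked, betas):
        # clfs_picked: the chosen weak classifiers h_t; betas: their weights beta_t.
        score = sum(beta * h(x) for h, beta in zip(clfs_picked, betas))
        return 1 if score >= 0 else -1   # ties at 0 broken toward +1 in this sketch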
2.2 Decision stump A decision stump h ∈ H is a classifier characterized by a triplet (s ∈ {+1, −1}, b ∈ R, d ∈ {0, 1, · · · , D − 1}) such that

    h_(s,b,d)(x) = s,  if x_d > b,
                   −s, otherwise.    (2)

That is, each decision stump function only looks at a single dimension x_d of the input vector x and checks whether x_d is larger than b. Then s decides which label to predict when x_d > b.

Please first read classifier.py and decision stump.py, and then complete the class “DecisionStump” by implementing the “TODO” part(s) as indicated in decision stump.py.

2.3 AdaBoost AdaBoost is a powerful and popular boosting method, summarized in Algorithm 2.

Algorithm 2 The AdaBoost algorithm
Require: H: a set of classifiers, where each h ∈ H takes a D-dimensional vector as input and outputs either +1 or −1; a training set {(x_n ∈ R^D, y_n ∈ {+1, −1})}_{n=0}^{N−1}; the total number of iterations T.
Ensure: Learn H(x) = sign( ∑_{t=0}^{T−1} β_t h_t(x) ), where h_t ∈ H and β_t ∈ R.
1: Initialization: D_0(n) = 1/N, ∀n ∈ {0, 1, · · · , N − 1}
2: for t = 0, 1, · · · , T − 1 do
3:   find h_t = argmin_{h ∈ H} ∑_n D_t(n) I[y_n ≠ h(x_n)]
4:   compute ε_t = ∑_n D_t(n) I[y_n ≠ h_t(x_n)]
5:   compute β_t = (1/2) log((1 − ε_t)/ε_t)
6:   compute D_{t+1}(n) = D_t(n) exp(−β_t) if y_n = h_t(x_n), and D_t(n) exp(β_t) otherwise, ∀n ∈ {0, 1, · · · , N − 1}
7:   normalize D_{t+1}(n) ← D_{t+1}(n) / ∑_{n′} D_{t+1}(n′), ∀n ∈ {0, 1, · · · , N − 1}
8: end for

Please read the class “AdaBoost” defined in boosting.py, and then complete the class by implementing the “TODO” part(s) as indicated in boosting.py.
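For illustration, here is a minimal NumPy sketch of the loop in Algorithm 2. It assumes each candidate h in the pre-defined set H can be called as h(X) to return a length-N vector of ±1 predictions, and that the weighted errors stay strictly between 0 and 1; the actual AdaBoost class in boosting.py has its own interface:

    import numpy as np

    def adaboost_train_sketch(H, X, y, T):
        # H: list of candidate weak classifiers; X: (N, D) features; y: (N,) labels in {+1, -1}.
        N = len(y)
        D = np.full(N, 1.0 / N)                              # step 1: uniform example weights D_0(n)
        clfs_picked, betas = [], []
        for t in range(T):
            errors = [np.sum(D * (h(X) != y)) for h in H]    # weighted error of each candidate
            h_t = H[int(np.argmin(errors))]                  # step 3: best weak classifier
            eps = min(errors)                                # step 4
            beta = 0.5 * np.log((1.0 - eps) / eps)           # step 5
            D = D * np.exp(-beta * y * h_t(X))               # step 6 (one expression covers both cases)
            D = D / D.sum()                                  # step 7: normalize
            clfs_picked.append(h_t)
            betas.append(beta)
        return clfs_picked, betas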

2.4 Final submission Please run boosting.sh, which will generate boosting.json. Add, commit, and push boosting.py, decision stump.py and boosting.json before the due date.


Problem 3 Decision Tree

In this question, you will implement a simple decision tree algorithm. For simplicity, only discrete features are considered here.

3.1 Conditional Entropy Recall that the quality of a split can be measured by the conditional entropy. Please read decision tree.py and complete the function “conditional entropy” (where we use base 2 for logarithm).
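As a reminder of what this quantity computes, here is a minimal sketch of the conditional entropy of the labels given the branches of a split. The argument layout is an assumption for this sketch; the required conditional entropy function in decision tree.py specifies its own inputs:

    import numpy as np

    def conditional_entropy_sketch(branches):
        # branches[b][c]: number of training examples of class c that fall into branch b of the split.
        branches = np.asarray(branches, dtype=float)
        total = branches.sum()
        H = 0.0
        for counts in branches:
            n_b = counts.sum()
            if n_b == 0:
                continue
            p = counts[counts > 0] / n_b                        # class distribution within this branch
            H += (n_b / total) * (-np.sum(p * np.log2(p)))      # branch weight times branch entropy (base 2)
        return H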

3.2 Tree construction The building of a decision tree involves growing and pruning. For simplicity, in this programming set you are only asked to grow a tree.

As discussed in the class, a tree can be constructed by recursively splitting the nodes as needed, as summarized in Algorithm 3 (whose interface is slightly different from the pseudocode discussed in the class). Please read decision tree.py and complete the function “split”.

Algorithm 3 The recursive procedure of decision tree algorithm
1: function TREENODE.SPLIT(self)
2:   find the best attribute to split the node
3:   split the node into child nodes
4:   for child node in child nodes do
5:     if child node.splittable then     ◃ See Slide 27 of Lec 7 for the three “non-splittable” cases
6:       child node.split()
7:     end if
8:   end for
9: end function
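To make the recursion concrete, here is a minimal sketch of Algorithm 3. The class name, attributes, and the attribute-selection heuristic below are assumptions for this sketch; the actual TreeNode class, its attributes, and the entropy-based attribute choice are defined in decision tree.py:

    class TreeNodeSketch:
        def __init__(self, features, labels):
            self.features = features      # discrete feature vectors reaching this node
            self.labels = labels          # their class labels
            self.children = []
            # Simplified check for this sketch; the three "non-splittable" cases
            # (see Slide 27 of Lec 7) include pure nodes and identical features.
            self.splittable = len(set(labels)) > 1 and len(set(map(tuple, features))) > 1

        def split(self):
            best_dim = self.find_best_attribute()                 # find the best attribute to split the node
            values = sorted(set(x[best_dim] for x in self.features))
            for v in values:                                      # split the node into child nodes
                idx = [i for i, x in enumerate(self.features) if x[best_dim] == v]
                child = TreeNodeSketch([self.features[i] for i in idx],
                                       [self.labels[i] for i in idx])
                self.children.append(child)
                if child.splittable:                              # recurse on splittable children
                    child.split()

        def find_best_attribute(self):
            # Placeholder heuristic for the sketch: pick the dimension with the most distinct values.
            # In the real implementation, pick the dimension minimizing the conditional entropy (Sect. 3.1).
            dims = range(len(self.features[0]))
            return max(dims, key=lambda d: len(set(x[d] for x in self.features)))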

3.3 Final submission Please run decision tree.sh, which will generate decision tree.json. Add, commit, and push decision tree.py and decision tree.json before the due date.