Lecture 11. Kernel Methods
COMP90051 Statistical Machine Learning
Semester 2, 2019 Lecturer: Ben Rubinstein
Copyright: University of Melbourne
This lecture
• Kernelisation
∗ Basis expansion on dual formulation of SVMs
∗ “Kernel trick”: fast computation of feature-space dot products
• Modular learning
∗ Separating “learning module” from feature transformation
∗ Representer theorem
• Constructing kernels
∗ Overview of popular kernels and their properties
∗ Mercer’s theorem
∗ Learning on unconventional data types
Kernelising the SVM
Feature transformation by basis expansion; sped up by direct evaluation of kernels – the ‘kernel trick’
Handling non-linear data with the SVM
• Method 1: Soft-margin SVM (deck #10)
• Method 2: Feature space transformation (deck #4)
∗ Map data into a new feature space
∗ Run hard-margin or soft-margin SVM in new space
∗ Decision boundary is non-linear in original space
Feature transformation (Basis expansion)
• Consider a binary classification problem
• Each example has features $[x_1, x_2]$
• Not linearly separable
• Now ‘add’ a feature $x_3 = x_1^2 + x_2^2$
• Each point is now $[x_1, x_2, x_1^2 + x_2^2]$
• Linearly separable!
[Figure: the same data plotted in the original 2D space and in the transformed 3D space]
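A minimal sketch of this expansion (my own numpy illustration; the circular toy dataset and all names here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: label +1 outside the unit circle, -1 inside it.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 > 1.0, 1, -1)

# Basis expansion: append the extra feature x3 = x1^2 + x2^2.
X_expanded = np.column_stack([X, X[:, 0]**2 + X[:, 1]**2])

# In the expanded 3D space the plane x3 = 1 separates the classes perfectly,
# even though no straight line separates them in the original 2D space.
print(np.all((X_expanded[:, 2] > 1.0) == (y == 1)))  # True
```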
Naïve workflow
• Choose/design a linear model
• Choose/design a high-dimensional transformation $\varphi(\boldsymbol{x})$
∗ Hoping that after adding a lot of various features some of them will make the data linearly separable
• For each training example, and for each new instance, compute $\varphi(\boldsymbol{x})$
• Train classifier / do predictions
• Problem: impractical/impossible to compute $\varphi(\boldsymbol{x})$ for high/infinite-dimensional $\varphi(\boldsymbol{x})$
Hard-margin SVM’s dual formulation
• Training: finding $\boldsymbol{\lambda}$ that solve
$$\underset{\boldsymbol{\lambda}}{\mathrm{argmax}} \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j \boldsymbol{x}_i' \boldsymbol{x}_j$$
s.t. $\lambda_i \geq 0$ and $\sum_{i=1}^{n} \lambda_i y_i = 0$    (note the dot product $\boldsymbol{x}_i' \boldsymbol{x}_j$)
• Making predictions: classify instance $\boldsymbol{x}$ as sign of
$$s = b^* + \sum_{i=1}^{n} \lambda_i^* y_i \boldsymbol{x}_i' \boldsymbol{x}$$
(again a dot product)
Note: $b^*$ found by solving for it in $y_j \left( b^* + \sum_{i=1}^{n} \lambda_i^* y_i \boldsymbol{x}_i' \boldsymbol{x}_j \right) = 1$ for any support vector $j$
Hard-margin SVM in feature space
• Training: finding $\boldsymbol{\lambda}$ that solve
$$\underset{\boldsymbol{\lambda}}{\mathrm{argmax}} \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j \varphi(\boldsymbol{x}_i)' \varphi(\boldsymbol{x}_j)$$
s.t. $\lambda_i \geq 0$ and $\sum_{i=1}^{n} \lambda_i y_i = 0$
• Making predictions: classify new instance $\boldsymbol{x}$ as sign of
$$s = b^* + \sum_{i=1}^{n} \lambda_i^* y_i \varphi(\boldsymbol{x}_i)' \varphi(\boldsymbol{x})$$
Note: $b^*$ found by solving for it in $y_j \left( b^* + \sum_{i=1}^{n} \lambda_i^* y_i \varphi(\boldsymbol{x}_i)' \varphi(\boldsymbol{x}_j) \right) = 1$ for support vector $j$
Observation: Kernel representation
• Both parameter estimation and computing predictions depend on the data only in the form of dot products
∗ In original space: $\boldsymbol{u}'\boldsymbol{v} = \sum_{i=1}^{m} u_i v_i$
∗ In transformed space: $\varphi(\boldsymbol{u})'\varphi(\boldsymbol{v}) = \sum_{i=1}^{l} \varphi_i(\boldsymbol{u}) \varphi_i(\boldsymbol{v})$
• A kernel is a function that can be expressed as a dot product in some feature space: $K(\boldsymbol{u}, \boldsymbol{v}) = \varphi(\boldsymbol{u})'\varphi(\boldsymbol{v})$
Kernel as shortcut: Example
• For some $\varphi(\boldsymbol{x})$’s, the kernel is faster to compute directly than by first mapping to feature space and then taking the dot product.
• For example, consider two vectors $\boldsymbol{u} = [u_1]$ and $\boldsymbol{v} = [v_1]$ and transformation $\varphi(\boldsymbol{x}) = [x_1^2, \sqrt{2c}\,x_1, c]$, for some $c$
∗ So $\varphi(\boldsymbol{u}) = [u_1^2, \sqrt{2c}\,u_1, c]'$ and $\varphi(\boldsymbol{v}) = [v_1^2, \sqrt{2c}\,v_1, c]'$    (2 + 2 operations)
∗ Then $\varphi(\boldsymbol{u})'\varphi(\boldsymbol{v}) = u_1^2 v_1^2 + 2c\,u_1 v_1 + c^2$    (+5 operations = 9 ops)
• This can alternatively be computed directly as $\varphi(\boldsymbol{u})'\varphi(\boldsymbol{v}) = (u_1 v_1 + c)^2$    (3 operations)
∗ Here $K(\boldsymbol{u}, \boldsymbol{v}) = (u_1 v_1 + c)^2$ is the corresponding kernel
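A quick numerical check of this example (a sketch in numpy, not part of the slides): the explicit map and the kernel shortcut return the same number:

```python
import numpy as np

def phi(x1, c):
    # Explicit feature map from the example: [x1^2, sqrt(2c)*x1, c]
    return np.array([x1**2, np.sqrt(2 * c) * x1, c])

def K(u1, v1, c):
    # Kernel shortcut, computed directly in the original 1-D space
    return (u1 * v1 + c) ** 2

u1, v1, c = 3.0, -2.0, 1.5
print(phi(u1, c) @ phi(v1, c))   # 20.25
print(K(u1, v1, c))              # 20.25 -- same value, fewer operations
```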
More generally: The “kernel trick”
• Consider two training points $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ and their dot product in the transformed space.
• The kernel matrix entry $k_{ij} \equiv \varphi(\boldsymbol{x}_i)'\varphi(\boldsymbol{x}_j)$ can be computed as:
1. Compute $\varphi(\boldsymbol{x}_i)$
2. Compute $\varphi(\boldsymbol{x}_j)$
3. Compute $k_{ij} = \varphi(\boldsymbol{x}_i)'\varphi(\boldsymbol{x}_j)$
• However, for some transformations $\varphi$, there’s a “shortcut” function that gives exactly the same answer: $K(\boldsymbol{x}_i, \boldsymbol{x}_j) = k_{ij}$
∗ Doesn’t involve steps 1–3 and no computation of $\varphi(\boldsymbol{x}_i)$ and $\varphi(\boldsymbol{x}_j)$
∗ Usually $k_{ij}$ is computable in $O(m)$, but computing $\varphi(\boldsymbol{x})$ requires $O(l)$, where $l \gg m$ (impractical) or even $l = \infty$ (infeasible)
Kernel hard-margin SVM
• Training: finding $\boldsymbol{\lambda}$ that solve    (the feature mapping is implied by the kernel)
$$\underset{\boldsymbol{\lambda}}{\mathrm{argmax}} \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j K(\boldsymbol{x}_i, \boldsymbol{x}_j)$$
s.t. $\lambda_i \geq 0$ and $\sum_{i=1}^{n} \lambda_i y_i = 0$
• Making predictions: classify new instance $\boldsymbol{x}$ based on the sign of
$$s = b^* + \sum_{i=1}^{n} \lambda_i^* y_i K(\boldsymbol{x}_i, \boldsymbol{x})$$
• Here $b^*$ can be found by noting that for support vector $j$ we have $y_j \left( b^* + \sum_{i=1}^{n} \lambda_i^* y_i K(\boldsymbol{x}_i, \boldsymbol{x}_j) \right) = 1$
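The prediction rule above is easy to sketch in code. This is only an illustration assuming the dual variables have already been found by some QP solver; the function names and the choice of RBF kernel here are mine, not the course’s reference code:

```python
import numpy as np

def rbf_kernel(u, v, gamma=0.5):
    # One possible choice of K; any valid kernel could be plugged in here.
    return np.exp(-gamma * np.sum((u - v) ** 2))

def svm_decision(x, X_train, y_train, lam, b, kernel=rbf_kernel):
    # s = b* + sum_i lambda_i* y_i K(x_i, x); classify by the sign of s.
    s = b + sum(l * y * kernel(xi, x)
                for xi, y, l in zip(X_train, y_train, lam))
    return np.sign(s)
```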
Approaches to non-linearity
ANNs:
• Elements of $\boldsymbol{u} = \varphi(\boldsymbol{x})$ are transformed input $\boldsymbol{x}$
• This $\varphi$ has weights learned from data
SVMs:
• Choice of kernel $K$ determines features $\varphi$
• Don’t learn $\varphi$ weights
• But, don’t even need to compute $\varphi$, so can support very high-dimensional $\varphi$
• Also support arbitrary data types
[Figure: feed-forward network diagram mapping inputs $x_1, \ldots, x_m$ through hidden units $u_1, \ldots, u_p$ to outputs $z_1, \ldots, z_q$]
Modular Learning
Kernelisation beyond SVMs; separating the “learning module” from feature space transformation
Modular learning
• All information about the feature mapping is concentrated within the kernel
• In order to use a different feature mapping, simply change the kernel function
• Algorithm design decouples into choosing a “learning method” (e.g., SVM vs logistic regression) and choosing the feature space mapping, i.e., the kernel (see the sketch below)
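The practical upshot is that the kernel is a pluggable component. As a sketch (using scikit-learn’s SVC, which follows this modular design; the toy data is made up), the same learning module is rerun with different implied feature mappings by changing only the kernel argument:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1.0).astype(int)

# Same learning module (SVM); only the kernel, i.e. the feature mapping, changes.
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))
```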
Kernelised perceptron (1/3)
When classified correctly, weights are unchanged
When misclassified: $\boldsymbol{w}^{(k+1)} = \boldsymbol{w}^{(k)} - \eta(\pm\boldsymbol{x})$    ($\eta > 0$ is called the learning rate)
If $y = 1$ but $s < 0$:   $w_i \leftarrow w_i + \eta x_i$,   $w_0 \leftarrow w_0 + \eta$
If $y = -1$ but $s \geq 0$:   $w_i \leftarrow w_i - \eta x_i$,   $w_0 \leftarrow w_0 - \eta$

Suppose weights are initially set to 0
First update: $\boldsymbol{w} = \eta y_{i_1} \boldsymbol{x}_{i_1}$
Second update: $\boldsymbol{w} = \eta y_{i_1} \boldsymbol{x}_{i_1} + \eta y_{i_2} \boldsymbol{x}_{i_2}$
Third update: $\boldsymbol{w} = \eta y_{i_1} \boldsymbol{x}_{i_1} + \eta y_{i_2} \boldsymbol{x}_{i_2} + \eta y_{i_3} \boldsymbol{x}_{i_3}$
etc.    (here $i_1, i_2, i_3, \ldots$ index the misclassified examples)
Kernelised perceptron (2/3)
• Weights always take the form $\boldsymbol{w} = \sum_{i=1}^{n} \alpha_i y_i \boldsymbol{x}_i$, where $\boldsymbol{\alpha}$ are some coefficients
• Perceptron weights are always a linear combination of the data!
• Recall that the prediction for a new point $\boldsymbol{x}$ is based on the sign of $w_0 + \boldsymbol{w}'\boldsymbol{x}$
• Substituting $\boldsymbol{w}$ we get $w_0 + \sum_{i=1}^{n} \alpha_i y_i \boldsymbol{x}_i' \boldsymbol{x}$
• The dot product $\boldsymbol{x}_i' \boldsymbol{x}$ can be replaced with a kernel
Kernelised perceptron (3/3)
Choose initial guess $\boldsymbol{w}^{(0)}$, $k = 0$
Set $\boldsymbol{\alpha} = \boldsymbol{0}$
For $t$ from 1 to $T$ (epochs)
  For each training example $(\boldsymbol{x}_i, y_i)$
    Predict based on $w_0 + \sum_{j=1}^{n} \alpha_j y_j \boldsymbol{x}_i' \boldsymbol{x}_j$    (the dot product becomes kernel matrix entry $k_{ij}$)
    If misclassified, update $\alpha_i \leftarrow \alpha_i + 1$
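Below is a minimal numpy sketch of this procedure (my own illustration, not the course’s reference implementation); the bias update mirrors the primal $w_0$ step and is an assumption on my part:

```python
import numpy as np

def kernelised_perceptron(X, y, kernel, epochs=10):
    """X: (n, m) training data, y: labels in {-1, +1}, kernel: function K(u, v)."""
    n = len(X)
    # Precompute the kernel matrix k_ij = K(x_i, x_j).
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    alpha = np.zeros(n)
    w0 = 0.0
    for _ in range(epochs):
        for i in range(n):
            # Predict from w0 + sum_j alpha_j y_j K(x_i, x_j)
            s = w0 + np.sum(alpha * y * K[i])
            if y[i] * s <= 0:        # misclassified (or undecided)
                alpha[i] += 1
                w0 += y[i]           # assumed bias step, analogous to w0 <- w0 +/- eta
    return alpha, w0
```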
Representer theorem
• Theorem: For any training set $\{(\boldsymbol{x}_i, y_i)\}_{i=1}^{n}$, any empirical risk function $E$, and any monotonic increasing function $g$, any solution
$$f^* \in \underset{f}{\mathrm{argmin}}\; E\big(\boldsymbol{x}_1, y_1, f(\boldsymbol{x}_1), \ldots, \boldsymbol{x}_n, y_n, f(\boldsymbol{x}_n)\big) + g\big(\|f\|\big)$$
has a representation of the form, for some coefficients $\alpha_i$,
$$f^*(\boldsymbol{x}) = \sum_{i=1}^{n} \alpha_i k(\boldsymbol{x}, \boldsymbol{x}_i)$$
Aside: $f$ sits in a reproducing kernel Hilbert space (RKHS)
• Tells us when a (decision-theoretic) learner is kernelisable
• The dual tells us the form this linear kernel representation takes
• SVM is but one example:
∗ Ridge regression
∗ Logistic regression
∗ Principal component analysis (PCA)
∗ Canonical correlation analysis (CCA)
∗ Linear discriminant analysis (LDA)
∗ and many more ...
Constructing Kernels
An overview of popular kernels and kernel properties
Polynomial kernel
• The function $K(\boldsymbol{u}, \boldsymbol{v}) = (\boldsymbol{u}'\boldsymbol{v} + c)^d$ is called the polynomial kernel
∗ Here $\boldsymbol{u}$ and $\boldsymbol{v}$ are vectors with $m$ components
∗ $d \geq 0$ is an integer and $c \geq 0$ is a constant
• Without loss of generality, assume $c = 0$
∗ If it’s not, add $\sqrt{c}$ as a dummy feature to $\boldsymbol{u}$ and $\boldsymbol{v}$
• $(\boldsymbol{u}'\boldsymbol{v})^d = (u_1 v_1 + \cdots + u_m v_m)(u_1 v_1 + \cdots + u_m v_m) \cdots (u_1 v_1 + \cdots + u_m v_m)$
• $= \sum_{i=1}^{l} (u_1 v_1)^{a_{i1}} \cdots (u_m v_m)^{a_{im}}$
• Here $0 \leq a_{ij} \leq d$ and $l$ are integers
• $= \sum_{i=1}^{l} \left(u_1^{a_{i1}} \cdots u_m^{a_{im}}\right)\left(v_1^{a_{i1}} \cdots v_m^{a_{im}}\right)$
• $= \sum_{i=1}^{l} \varphi_i(\boldsymbol{u}) \varphi_i(\boldsymbol{v})$
• Feature map $\varphi: \mathbb{R}^m \to \mathbb{R}^l$, where $\varphi_i(\boldsymbol{x}) = x_1^{a_{i1}} \cdots x_m^{a_{im}}$
Identifying new kernels
• Method 1: Let $K_1(\boldsymbol{u}, \boldsymbol{v})$, $K_2(\boldsymbol{u}, \boldsymbol{v})$ be kernels, $c > 0$ be a constant, and $f(\boldsymbol{x})$ be a real-valued function. Then each of the following is also a kernel:
∗ $K(\boldsymbol{u}, \boldsymbol{v}) = K_1(\boldsymbol{u}, \boldsymbol{v}) + K_2(\boldsymbol{u}, \boldsymbol{v})$
∗ $K(\boldsymbol{u}, \boldsymbol{v}) = c K_1(\boldsymbol{u}, \boldsymbol{v})$
∗ $K(\boldsymbol{u}, \boldsymbol{v}) = f(\boldsymbol{u}) K_1(\boldsymbol{u}, \boldsymbol{v}) f(\boldsymbol{v})$
∗ See Bishop for more identities
(Exercise: prove these!)
• Method 2: Using Mercer’s theorem (coming up!)
Radial basis function kernel
• The function $K(\boldsymbol{u}, \boldsymbol{v}) = \exp(-\gamma \|\boldsymbol{u} - \boldsymbol{v}\|^2)$ is the radial basis function kernel (aka Gaussian kernel)
∗ Here $\gamma > 0$ is the spread parameter
• $\exp(-\gamma \|\boldsymbol{u} - \boldsymbol{v}\|^2) = \exp\big(-\gamma (\boldsymbol{u} - \boldsymbol{v})'(\boldsymbol{u} - \boldsymbol{v})\big)$
• $= \exp\big(-\gamma(\boldsymbol{u}'\boldsymbol{u} - 2\boldsymbol{u}'\boldsymbol{v} + \boldsymbol{v}'\boldsymbol{v})\big)$
• $= \exp(-\gamma\boldsymbol{u}'\boldsymbol{u})\exp(2\gamma\boldsymbol{u}'\boldsymbol{v})\exp(-\gamma\boldsymbol{v}'\boldsymbol{v})$
• $= f(\boldsymbol{u})\exp(2\gamma\boldsymbol{u}'\boldsymbol{v})f(\boldsymbol{v})$
• $= f(\boldsymbol{u})\left(\sum_{d=0}^{\infty} r_d (\boldsymbol{u}'\boldsymbol{v})^d\right)f(\boldsymbol{v})$    (power series expansion)
• Here each $(\boldsymbol{u}'\boldsymbol{v})^d$ is a polynomial kernel. Using the kernel identities, we conclude that the middle term is a kernel, and hence the whole expression is a kernel
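A small sketch of the RBF kernel as a function (my own illustration): the similarity is 1 for identical points and decays with squared distance, at a rate controlled by the spread parameter $\gamma$:

```python
import numpy as np

def rbf_kernel(u, v, gamma):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    diff = np.asarray(u) - np.asarray(v)
    return np.exp(-gamma * diff @ diff)

u = np.zeros(2)
for dist in [0.0, 0.5, 1.0, 2.0]:
    v = np.array([dist, 0.0])
    print(dist, rbf_kernel(u, v, gamma=1.0))
# 1.0, ~0.78, ~0.37, ~0.018: larger gamma makes the decay sharper.
```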
Mercer’s Theorem
• Question: given a mapping $\varphi(\boldsymbol{u})$, is there a good kernel to use?
• Inverse question: given some function $K(\boldsymbol{u}, \boldsymbol{v})$, is this a valid kernel? In other words, is there a mapping $\varphi(\boldsymbol{u})$ implied by the kernel?
• Mercer’s theorem:
∗ Consider a finite sequence of objects $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$
∗ Construct the $n \times n$ matrix of pairwise values $K(\boldsymbol{x}_i, \boldsymbol{x}_j)$
∗ $K(\boldsymbol{x}_i, \boldsymbol{x}_j)$ is a valid kernel if this matrix is positive semidefinite, and this holds for all possible sequences $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$
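Mercer’s condition can be probed numerically: for a candidate function, build the pairwise matrix on a sample and check that its eigenvalues are non-negative. A sketch (passing the test on one sample is necessary but, of course, not a proof over all possible sequences):

```python
import numpy as np

def is_psd_on_sample(kernel_fn, X, tol=1e-9):
    """Build the n x n matrix K_ij = kernel_fn(x_i, x_j) and test positive semidefiniteness."""
    n = len(X)
    K = np.array([[kernel_fn(X[i], X[j]) for j in range(n)] for i in range(n)])
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))  # symmetric => real eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

print(is_psd_on_sample(lambda u, v: (u @ v + 1.0) ** 2, X))     # polynomial kernel: True
print(is_psd_on_sample(lambda u, v: -np.sum((u - v) ** 2), X))  # not a kernel: False
```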
Data comes in a variety of shapes
• So far in COMP90051 data has been vectors of numbers
• But what if we wanted to do machine learning on …
• Graphs
∗ Facebook, Twitter, …
• Sequences of variable lengths
∗ “science is organized knowledge”, “wisdom is organized life”*, …
∗ “CATTC”, “AAAGAGA”
• Songs, movies, etc.
(* Immanuel Kant)
Handling arbitrary data structures
• Kernels are a powerful approach to deal with many data types
• We could define a similarity function on variable-length strings, e.g. K(“science is organized knowledge”, “wisdom is organized life”)
• Remember that we need the function $K(\boldsymbol{u}, \boldsymbol{v})$ to imply a dot product in some feature space
• However, not every function on two objects is a valid kernel
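As one hedged illustration of a kernel on strings (this specific construction, counting shared length-$k$ substrings, is my own simple example rather than one from the lecture): it is a valid kernel because it is an explicit dot product between substring-count feature vectors:

```python
from collections import Counter

def substring_counts(s, k=3):
    # phi(s): count of every length-k substring occurring in s
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def substring_kernel(s, t, k=3):
    # K(s, t) = phi(s) . phi(t): sum of count products over shared substrings
    cs, ct = substring_counts(s, k), substring_counts(t, k)
    return sum(cs[sub] * ct[sub] for sub in cs if sub in ct)

print(substring_kernel("science is organized knowledge",
                       "wisdom is organized life"))
```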
A large variety of kernels
This lecture
• Kernels
∗ Non-linearity by basis expansion
∗ Kernel trick to speed up computation
• Modular learning
∗ Separating “learning module” from feature transformation
∗ Representer theorem
• Constructing kernels
∗ An overview of popular kernels and their properties
∗ Mercer’s theorem
∗ Extending machine learning beyond conventional data structures
Next lecture: Ensemble methods