Semi-Supervised Learning
Slides credit: Jerry Zhu, Aarti Singh
Supervised Learning
Feature space $\mathcal{X}$, label space $\mathcal{Y}$. Goal: learn a prediction rule $f: \mathcal{X} \to \mathcal{Y}$.
The optimal predictor (Bayes rule) depends on the unknown joint distribution $P_{XY}$, so instead we learn a good prediction rule from labeled training data via a learning algorithm.
Labeled and Unlabeled data
Labels come from a human expert, special equipment, or an experiment, e.g. “Crystal” / “Needle” / “Empty”, “0” / “1” / “2” / … (handwritten digits), “Sports” / “News” / “Science” (document topics).
Labeled data is expensive and scarce; unlabeled data is cheap and abundant!
Free-of-cost labels?
Luis von Ahn: Games with a purpose (ReCaptcha)
A word that is challenging for OCR (Optical Character Recognition) is shown to the user, who transcribes it and thereby provides a free label!
Semi-Supervised learning
Supervised learning (SL): the learning algorithm receives only labeled examples (e.g. an image labeled “Crystal”).
Semi-Supervised learning (SSL): the learning algorithm receives labeled and unlabeled examples.
Goal: learn a better prediction rule than is possible from the labeled data alone.
Semi-Supervised learning in Humans
Can unlabeled data help?
[Figure: positive labeled data, negative labeled data, and unlabeled data, with the supervised and semi-supervised decision boundaries.]
Assume each class is a coherent group (e.g. Gaussian)
Then unlabeled data can help identify the boundary more accurately.
Can unlabeled data help?
[Figure: handwritten digit images “0”, “1”, “2”, … embedded in two dimensions, with images of the same digit forming clusters. This embedding can be done by manifold learning algorithms, e.g. t-SNE.]
“Similar” data points have “similar” labels
Algorithms
Some SSL Algorithms
Self-Training
Generative methods, mixture models
Graph-based methods
Co-Training
Semi-supervised SVMs
Many others
Notation
Labeled data: $(x_1, y_1), \ldots, (x_l, y_l)$. Unlabeled data: $x_{l+1}, \ldots, x_{l+u}$, typically with $u \gg l$. Goal: learn a predictor using both.
Self-training
Train a classifier on the labeled data, use it to predict labels for the unlabeled data, add the most confident predictions as pseudo-labeled examples to the labeled set, and repeat (see the sketch below).
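A minimal Python sketch of this loop (the base classifier, the confidence threshold, and the use of scikit-learn are illustrative assumptions, not part of the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # base learner (illustrative choice)

def self_training(X_l, y_l, X_u, threshold=0.95, max_iter=20):
    """Self-training sketch: repeatedly move the most confident
    pseudo-labeled points from the unlabeled pool to the labeled set."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf = LogisticRegression().fit(X_l, y_l)
        proba = clf.predict_proba(X_u)        # predicted class probabilities
        conf = proba.max(axis=1)              # confidence of each prediction
        pseudo = proba.argmax(axis=1)
        keep = conf >= threshold              # only trust confident predictions
        if not keep.any():
            break
        # add the confident points, with their pseudo-labels, to the labeled set
        X_l = np.vstack([X_l, X_u[keep]])
        y_l = np.concatenate([y_l, clf.classes_[pseudo[keep]]])
        X_u = X_u[~keep]
    return LogisticRegression().fit(X_l, y_l)
```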
Self-training Example
Propagating 1-NN: at each step, the unlabeled point closest to a labeled point receives that neighbor's label and joins the labeled set; repeating this propagates labels outward through the data.
Mixture Models for Labeled Data
Estimate the mixture parameters $\hat\theta$ (class priors and class-conditional densities) from the labeled data.
Decision for any test point $x$ not in the labeled dataset: predict class 1 if $p(y=1 \mid x, \hat\theta) > 1/2$, and class 0 otherwise.
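As a worked form of this rule (assuming, for illustration, two classes with Gaussian class-conditionals; the symbols are mine, not from the slides):

$$
p(y=1 \mid x, \hat\theta)
= \frac{\hat\pi_1\,\mathcal{N}(x \mid \hat\mu_1, \hat\Sigma_1)}
       {\hat\pi_0\,\mathcal{N}(x \mid \hat\mu_0, \hat\Sigma_0) + \hat\pi_1\,\mathcal{N}(x \mid \hat\mu_1, \hat\Sigma_1)}
\;\gtrless\; \tfrac{1}{2},
$$

where $\hat\pi_y$ is the fraction of labeled points in class $y$, and $\hat\mu_y$, $\hat\Sigma_y$ are the sample mean and covariance of those points.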
Mixture Models for SSL Data
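With SSL data, labeled points contribute through the joint $p(x, y \mid \theta)$ and unlabeled points through the marginal $p(x \mid \theta)$. A standard way to write the objective (notation assumed, matching the labeled/unlabeled setup above) is

$$
\log p(\mathcal{D} \mid \theta)
= \sum_{i=1}^{l} \log \big( p(y_i \mid \theta)\, p(x_i \mid y_i, \theta) \big)
+ \sum_{j=l+1}^{l+u} \log \sum_{y} p(y \mid \theta)\, p(x_j \mid y, \theta).
$$

The second (unlabeled) term makes the objective non-concave, which is why EM is used below.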
Mixture Models SL vs SSL
Gaussian Mixture Models
EM for Gaussian Mixture Models
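EM finds a local maximum of the non-concave SSL likelihood by alternating between soft-labeling the unlabeled points (E-step) and re-estimating the mixture parameters (M-step), while labeled points keep their known class fixed. A minimal Python sketch under these assumptions (full-covariance Gaussians, uniform initialization, and a small ridge term are illustrative choices of mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ssl_gmm_em(X_l, y_l, X_u, n_classes, n_iter=50):
    """EM for a Gaussian mixture using labeled + unlabeled data.
    Labeled responsibilities are clamped to the true class."""
    X = np.vstack([X_l, X_u])
    n_l, d = len(X_l), X.shape[1]
    # responsibilities: labeled rows are one-hot and stay fixed
    R = np.zeros((len(X), n_classes))
    R[np.arange(n_l), y_l] = 1.0
    R[n_l:] = 1.0 / n_classes                       # initialize unlabeled uniformly
    for _ in range(n_iter):
        # M-step: weighted MLE of priors, means, covariances
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        cov = []
        for k in range(n_classes):
            diff = X - mu[k]
            cov.append((R[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d))
        # E-step: update responsibilities of the *unlabeled* points only
        like = np.column_stack([
            pi[k] * multivariate_normal.pdf(X[n_l:], mu[k], cov[k])
            for k in range(n_classes)
        ])
        R[n_l:] = like / like.sum(axis=1, keepdims=True)
    return pi, mu, cov
```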
Assumption for GMMs
The assumed mixture model must be (approximately) correct, i.e. each class really forms a coherent group such as a single Gaussian; if the model is misspecified, the unlabeled data can bias the estimates and hurt performance.
Related: Cluster and Label
Cluster all of the data (labeled and unlabeled, ignoring the labels), then assign each cluster the majority label of its labeled members.
Graph Based Methods
Assumption: Similar unlabeled data have similar labels.
Graph Regularization
Similarity Graphs: Model local neighborhood relations between data points
Assumption: nodes connected by heavy edges tend to have similar labels.
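One common construction (an illustrative choice, not specified on the slides) is a k-nearest-neighbour graph with Gaussian edge weights:

$$
w_{ij} = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)
$$

if $x_j$ is among the $k$ nearest neighbours of $x_i$ (or vice versa), and $w_{ij} = 0$ otherwise.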
Graph Regularization
If data points $i$ and $j$ are similar (i.e. the weight $w_{ij}$ is large), then their labels should be similar: $f_i \approx f_j$.
This gives an objective with a loss on the labeled data (e.g. mean-square or 0-1 loss) plus a graph-based smoothness prior over both labeled and unlabeled data:

$$
\min_f \;\sum_{i=1}^{l} \ell(f_i, y_i) \;+\; \lambda \sum_{i,j} w_{ij}\,(f_i - f_j)^2.
$$
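A minimal sketch of graph-based label propagation for this objective (squared loss with hard clamping of the labeled nodes; the Gaussian kernel bandwidth and the simple iterative scheme are my illustrative assumptions):

```python
import numpy as np

def label_propagation(X_l, y_l, X_u, sigma=1.0, n_iter=200):
    """Propagate labels over a similarity graph by iterative averaging.
    Labeled nodes are clamped to their known labels."""
    X = np.vstack([X_l, X_u])
    n_l, n = len(X_l), len(X)
    classes = np.unique(y_l)
    # Gaussian similarity weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # soft label matrix F; unlabeled rows start uniform, labeled rows are one-hot
    F = np.full((n, len(classes)), 1.0 / len(classes))
    F[:n_l] = (y_l[:, None] == classes[None, :]).astype(float)
    for _ in range(n_iter):
        # each node takes the weighted average of its neighbours' labels
        F = W @ F / W.sum(axis=1, keepdims=True)
        F[:n_l] = (y_l[:, None] == classes[None, :]).astype(float)  # clamp labeled
    return classes[F[n_l:].argmax(axis=1)]        # predicted labels for X_u
```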
Co-training
Co-training Algorithm
Co-training (Blum & Mitchell, 1998; Mitchell, 1999) assumes that
(i) the features can be split into two sets;
(ii) each sub-feature set is sufficient to train a good classifier.
• Initially, two separate classifiers are trained on the labeled data, one on each sub-feature set.
• Each classifier then classifies the unlabeled data and ‘teaches’ the other classifier with the few unlabeled examples (and their predicted labels) it is most confident about.
• Each classifier is retrained with the additional training examples provided by the other classifier, and the process repeats (a code sketch follows below).
Co-training Algorithm
Blum & Mitchell’98
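A minimal Python sketch of the co-training loop described above (the choice of base classifiers, the confidence measure, and the number of examples added per round are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # base learners (illustrative choice)

def co_training(X1_l, X2_l, y_l, X1_u, X2_u, n_rounds=10, per_round=5):
    """Co-training sketch over two feature views X1 and X2 of the same examples.
    Each classifier pseudo-labels the unlabeled examples it is most confident
    about; those examples are added to the shared labeled set."""
    X1_l, X2_l, y_l = X1_l.copy(), X2_l.copy(), np.asarray(y_l).copy()
    for _ in range(n_rounds):
        if len(X1_u) == 0:
            break
        h1 = GaussianNB().fit(X1_l, y_l)
        h2 = GaussianNB().fit(X2_l, y_l)
        picked = {}
        for h, X_view in ((h1, X1_u), (h2, X2_u)):
            proba = h.predict_proba(X_view)
            conf = proba.max(axis=1)
            # the few most confident unlabeled examples for this view
            for i in np.argsort(-conf)[:per_round]:
                picked.setdefault(int(i), h.classes_[proba[i].argmax()])
        idx = np.array(sorted(picked))
        labels = np.array([picked[i] for i in idx])
        # add the newly pseudo-labeled examples to both views' labeled data
        X1_l = np.vstack([X1_l, X1_u[idx]])
        X2_l = np.vstack([X2_l, X2_u[idx]])
        y_l = np.concatenate([y_l, labels])
        keep = np.ones(len(X1_u), dtype=bool)
        keep[idx] = False
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return GaussianNB().fit(X1_l, y_l), GaussianNB().fit(X2_l, y_l)
```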
Semi-Supervised SVMs
Idea: treat the unknown labels of the unlabeled points as extra variables and jointly find a labeling and a max-margin decision boundary, so that the boundary passes through a low-density region of the data.
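One standard way to write the S3VM objective (a sketch; the hinge-loss form and the weights $C_l$, $C_u$ are assumptions consistent with the usual formulation):

$$
\min_{w,\, b,\, \{\hat y_j\} \in \{\pm 1\}}
\; \tfrac{1}{2}\|w\|^2
+ C_l \sum_{i=1}^{l} \max\big(0,\, 1 - y_i(w^\top x_i + b)\big)
+ C_u \sum_{j=l+1}^{l+u} \max\big(0,\, 1 - \hat y_j(w^\top x_j + b)\big),
$$

minimized jointly over the separator $(w, b)$ and the putative labels $\hat y_j$ of the unlabeled points.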
Semi-Supervised Learning
Generative methods
Graph-based methods
Co-Training
Semi-Supervised SVMs
Many other methods
SSL algorithms can use unlabeled data to help improve prediction accuracy if the data satisfies appropriate assumptions.