
COMP9313:
Big Data Management
Classification and PySpark MLlib

PySpark MLlib
•MLlib is Spark’s scalable machine learning library consisting of common learning algorithms and utilities
• Basic Statistics
• Classification
• Regression
• Clustering
• Recommendation Systems
• Dimensionality Reduction
• Feature Extraction
• Optimization
•It is more or less a Spark version of scikit-learn
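As a quick orientation (a hedged sketch, not from the slides), these areas map onto modules of the DataFrame-based pyspark.ml package:

from pyspark.ml.classification import NaiveBayes, LogisticRegression  # classification
from pyspark.ml.regression import LinearRegression                    # regression
from pyspark.ml.clustering import KMeans                              # clustering
from pyspark.ml.recommendation import ALS                             # recommendation
from pyspark.ml.feature import CountVectorizer, PCA                   # feature extraction / dimensionality reduction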

Classification
• Classification
• predicts categorical class labels
• constructs a model based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data
•Prediction (aka. Regression)
• models continuous-valued functions, i.e., predicts unknown or missing values
• Applications
• medical diagnosis
• credit approval
• natural language processing

Classification and Regression
•Given a new object $o$, map it to a feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_d)$
•Predict the output (class label) $y \in \mathcal{Y}$
• Binary classification: $\mathcal{Y} = \{0, 1\}$ (sometimes $\{-1, 1\}$)
• Multi-class classification: $\mathcal{Y} = \{1, 2, \ldots, C\}$
•Learn a classification function $f(\mathbf{x}): \mathbb{R}^d \mapsto \mathcal{Y}$
•Regression: $f(\mathbf{x}): \mathbb{R}^d \mapsto \mathbb{R}$

Example of Classification – Text Categorization
•Given: document or sentence
• E.g., A statement released by Scott Morrison said he has received advice … advising the upcoming sitting be cancelled.
•Predict: Topic
• Pre-defined labels: Politics or not?
•How to learn the classification function $f(\mathbf{x}): \mathbb{R}^d \mapsto \mathcal{Y}$?
• How to convert a document to $\mathbf{x} \in \mathbb{R}^d$ (e.g., a feature vector)?
• How to convert pre-defined labels to $\mathcal{Y} = \{0, 1\}$?

Example of Classification – Text Categorization
•Input object: a sequence of words
•Input features 𝐱
• Bag of Words representation
• Freq(Morrison) = 2, freq(Trump) = 0, …

•$\mathbf{x} = (2, 1, 0, \ldots)$
•Class labels: $\mathcal{Y}$
• Politics: 1
• Not politics: -1
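A minimal sketch of building such a bag-of-words vector in plain Python (the toy vocabulary below is an assumption for illustration, not the course dataset):

from collections import Counter

vocab = ["morrison", "trump", "sitting", "cancelled"]   # toy vocabulary (assumed)
doc = "A statement released by Scott Morrison said Morrison received advice"
counts = Counter(doc.lower().split())
x = [counts[w] for w in vocab]   # Counter returns 0 for unseen words
print(x)                         # [2, 0, 0, 0]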

Convert a Problem into a Classification Problem
•Input
• How to generate input feature vectors
•Output
• Class labels
•Another example: image classification
• Input: A matrix of RGB values
• Input features: color histogram
• E.g., pixel_count(red) = ?, pixel_count(blue) = ?
• Output: class labels
• Building: 1
• Not building: -1

Supervised Learning
•How to get $f(\mathbf{x})$?
•In supervised learning, we are given a set of training examples:
•$\mathcal{D} = \{(\mathbf{x}_i, y_i)\}, i = 1, \ldots, n$
•Independent and identically distributed (i.i.d.) assumption
• A critical assumption for machine learning theory

Machine Learning Terminologies
•Supervised learning has input labelled data
• #instances x #attributes matrix/table
• #attributes = #features + 1
• 1 (usu. the last attribute) is for the class label
•Labelled data split into 2 or 3 disjoint subsets
• Training data (used to build a classifier)
• Development data (used to select a classifier)
• Testing data (used to evaluate the classifier)
•Output of the classifier
• Binary classification: #labels = 2
• Multi-class classification: #labels > 2

Machine Learning Terminologies
•Evaluate the classifier
• False positive:
• not politics but classified as politics
• False negative
• Politics but classified as not politics
• True positive
• Politics and classified as politics
•Precision = $\frac{TP}{TP + FP}$
•Recall = $\frac{TP}{TP + FN}$
•F1 score = $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
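A minimal sketch computing these metrics from raw counts (the numbers are made up for illustration):

tp, fp, fn = 8, 2, 4                                  # toy counts (assumed)
precision = tp / (tp + fp)                            # 0.8
recall = tp / (tp + fn)                               # ~0.667
f1 = 2 * precision * recall / (precision + recall)    # ~0.727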

Classification—A Two-Step Process
• Classifier construction
• Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
• The set of tuples used for classifier construction is training set
• The classifier is represented as classification rules, decision trees, or mathematical formulae
• Classifier usage: classifying future or unknown objects
• Estimate accuracy of the classifier
• The known label of test sample is compared with the classified result from the classifier
• Accuracy rate is the percentage of test set samples that are correctly classified by the classifier
• Test set is independent of training set, otherwise over-fitting will occur
• If the accuracy is acceptable, use the classifier to classify data tuples whose class labels are not known

Classification Process 1: Preprocessing and Feature Engineering
[Diagram: raw data is preprocessed and feature-engineered into training data]

Classification Process 2: Train a classifier
[Diagram: the training data is fed to a classification algorithm to learn a classifier $f(\mathbf{x})$; its predictions (e.g., 1, 0, 1, 1, 0) on the training data give Precision = 0.66, Recall = 0.66, F1 = 0.66]

Classification Process 3: Evaluate the Classifier
[Diagram: the classifier predicts labels (e.g., 1, 1, 1, 1, 0) on held-out test data, giving Precision = 75%, Recall = 100%, F1 = 0.86]

How to Judge a Model?
•Based on training error or testing error?
• Testing error
• Otherwise, this is a kind of data snooping => overfitting
•What if there are multiple models to choose from?
• Further split a “development set” from the training set
•Can we trust the error values on the development set?
• Need a “large” dev set => less data for training
• k-fold cross-validation

k-fold cross-validation
[Diagram: k-fold cross-validation — the data is split into k folds; each fold serves once as the validation set while the remaining k−1 folds are used for training, and the k error estimates are averaged]
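In MLlib, k-fold cross-validation is provided by CrossValidator. A minimal sketch, assuming the nb_pipeline and nb_model stages defined in the PySpark example later in these slides, and a hypothetical grid over the NaiveBayes smoothing parameter:

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

grid = ParamGridBuilder().addGrid(nb_model.smoothing, [0.5, 1.0, 2.0]).build()
cv = CrossValidator(estimator=nb_pipeline,
                    estimatorParamMaps=grid,
                    evaluator=MulticlassClassificationEvaluator(
                        predictionCol="nb_prediction", metricName="f1"),
                    numFolds=5)             # k = 5 folds
cv_model = cv.fit(train_data)               # fits k models per grid point, keeps the best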

Text Classification
•Assigning subject categories, topics, or genres
•Spam detection
•Authorship identification
•Age/gender identification
•Language identification
•Sentiment analysis
•…
•We will do text classification in Project 2

Text Classification: Problem Definition
•Input
• Document or sentence $d$
•Output
• Class label $c \in \{c_1, c_2, \ldots\}$
•Classification methods:
• Naïve Bayes
• Logistic regression
• Support-vector machines
•…

Naïve Bayes: Intuition
•Simple (“naïve”) classification method based on Bayes rule
•Relies on very simple representation of document
• Bag of words
•Example document:
“I love this movie! It’s sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun... It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I’ve seen it several times, and I’m always happy to see it again whenever I have a friend who hasn’t seen it yet!”
•Its bag-of-words representation keeps only the word counts:
it 6, I 5, the 4, to 3, and 3, seen 2, yet 1, would 1, whimsical 1, times 1, sweet 1, satirical 1, adventure 1, genre 1, fairy 1, humor 1, have 1, great 1, …

Naïve Bayes Classifier
•Bayes’ Rule:
• For a document $d$ and a class $c$:
$P(c|d) = \frac{P(d|c)P(c)}{P(d)}$
•We want to find which class is most likely:
$c_{MAP} = \arg\max_{c \in C} P(c|d)$

Naïve Bayes Classifier
$c_{MAP} = \arg\max_{c \in C} P(c|d)$   (MAP is “maximum a posteriori”, i.e., the most likely class)
$= \arg\max_{c \in C} \frac{P(d|c)P(c)}{P(d)}$   (Bayes rule)
$= \arg\max_{c \in C} P(d|c)P(c)$   (dropping the denominator)
$= \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n|c)P(c)$   (document $d$ represented as features $x_1 \ldots x_n$)
This has $O(|X|^n \cdot |C|)$ parameters, which could only be estimated if a very, very large number of training examples was available.

Multinomial Naïve Bayes Independence Assumptions
$P(x_1, x_2, \ldots, x_n|c)P(c)$
•Bag of Words assumption: assume position doesn’t matter
•Conditional independence: assume the feature probabilities $P(x_i|c_j)$ are independent given the class $c$:
$P(x_1, \ldots, x_n|c) = P(x_1|c) \cdot P(x_2|c) \cdot \ldots \cdot P(x_n|c)$

Multinomial Naïve Bayes Classifier
$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n|c)P(c)$
$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{x \in X} P(x|c_j)$
positions ← all word positions in the test document
$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i|c_j)$

Learning the Multinomial Naïve Bayes Model
•First attempt: maximum likelihood estimates
• Simply use the frequencies in the data
$\hat{P}(c_j) = \frac{doccount(C = c_j)}{N_{doc}}$
$\hat{P}(w_i|c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$   (fraction of times word $w_i$ appears among all words in documents of topic $c_j$)
•Create a mega-document for topic $j$ by concatenating all docs in this topic
• Use the frequency of $w$ in the mega-document

Problem with Maximum Likelihood
• What if we have seen no training documents with the word fantastic and classified in the topic positive?
$\hat{P}(\text{“fantastic”}|\text{positive}) = \frac{count(\text{“fantastic”}, positive)}{\sum_{w \in V} count(w, positive)} = 0$
• Zero probabilities cannot be conditioned away, no matter the other evidence!
$c_{MAP} = \arg\max_c \hat{P}(c) \prod_i \hat{P}(x_i|c)$

Laplace (add-1) smoothing for Naïve Bayes
•Reserve a small amount of probability mass for unseen events
• The (conditional) probabilities of observed events have to be adjusted so that the total probability equals 1.0
$\hat{P}(w_i|c) = \frac{count(w_i, c)}{\sum_{w \in V} count(w, c)} \Rightarrow \frac{count(w_i, c) + 1}{(\sum_{w \in V} count(w, c)) + |V|}$

Multinomial Naïve Bayes: Learning
•From training corpus, extract Vocabulary
•Calculate $P(c_j)$ terms
• For each $c_j$ in $C$ do
  docs_j ← all docs with class = $c_j$
  $P(c_j) \leftarrow \frac{|docs_j|}{|\text{total # documents}|}$
•Calculate $P(w_k|c_j)$ terms
• Text_j ← single doc containing all docs_j
• For each word $w_k$ in Vocabulary
  $n_k$ ← # of occurrences of $w_k$ in Text_j
  $P(w_k|c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha|\text{Vocabulary}|}$
•Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
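A minimal sketch of this log-space scoring in plain Python (priors and cond are assumed toy dictionaries holding the smoothed estimates, e.g., priors['c'] = P(c) and cond['c']['w'] = P(w|c)):

from math import log

def nb_classify(doc_words, priors, cond):
    best_class, best_logp = None, float("-inf")
    for c in priors:
        # sum of log probabilities; out-of-vocabulary words are ignored
        logp = log(priors[c]) + sum(log(cond[c][w])
                                    for w in doc_words if w in cond[c])
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class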

Example of Naïve Bayes Classifier
Training and test data:

Document  Words                                       Class
Training
1         Chinese Beijing Chinese                     c
2         Chinese Chinese Nanjing                     c
3         Chinese Macao                               c
4         Australia Sydney Chinese                    o
Test
5         Chinese Chinese Chinese Australia Sydney    ?

Using $P(c_j) \leftarrow \frac{|docs_j|}{|\text{total # documents}|}$ and $P(w_k|c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha|\text{Vocabulary}|}$ with $\alpha = 1$ and $|V| = 6$:

Priors: $P(c) = \frac{3}{4}$,  $P(o) = \frac{1}{4}$

$P(Chinese|c) = \frac{5+1}{8+6} = \frac{3}{7}$      $P(Chinese|o) = \frac{1+1}{3+6} = \frac{2}{9}$
$P(Australia|c) = \frac{0+1}{8+6} = \frac{1}{14}$   $P(Australia|o) = \frac{1+1}{3+6} = \frac{2}{9}$
$P(Sydney|c) = \frac{0+1}{8+6} = \frac{1}{14}$      $P(Sydney|o) = \frac{1+1}{3+6} = \frac{2}{9}$

Scoring the test document $d_5$:
$P(c|d_5) \propto \frac{3}{4} \cdot (\frac{3}{7})^3 \cdot \frac{1}{14} \cdot \frac{1}{14} \approx 0.0003$
$P(o|d_5) \propto \frac{1}{4} \cdot (\frac{2}{9})^3 \cdot \frac{2}{9} \cdot \frac{2}{9} \approx 0.0001$
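The numbers above can be reproduced with a short from-scratch sketch (plain Python, not MLlib; Python ≥ 3.8 for math.prod):

from collections import Counter
from math import prod

train = [("Chinese Beijing Chinese", "c"), ("Chinese Chinese Nanjing", "c"),
         ("Chinese Macao", "c"), ("Australia Sydney Chinese", "o")]
test = "Chinese Chinese Chinese Australia Sydney"

docs = [(d.split(), y) for d, y in train]
vocab = {w for ws, _ in docs for w in ws}                 # |V| = 6
classes = {y for _, y in docs}
prior = {y: sum(1 for _, c in docs if c == y) / len(docs) for y in classes}
counts = {y: Counter(w for ws, c in docs if c == y for w in ws) for y in classes}

def p(w, y):                                              # add-1 smoothed P(w|y)
    return (counts[y][w] + 1) / (sum(counts[y].values()) + len(vocab))

for y in sorted(classes):
    print(y, round(prior[y] * prod(p(w, y) for w in test.split()), 4))
# c 0.0003
# o 0.0001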

Summary: Naïve Bayes
•Very fast, low storage requirements
•Robust to irrelevant features
• Irrelevant features cancel each other out without affecting results
•Very good in domains with many equally important features
•Optimal if the independence assumption holds
• If the assumed independence is correct, then it is the Bayes optimal classifier for the problem
•A good, dependable baseline for text classification

PySpark MLlib – Example of NB
•Create a SparkSession and read data

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setMaster("local[*]").setAppName("lab3")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

train_data = spark.read.load("lab3train.csv", format="csv", sep="\t",
                             inferSchema="true", header="true")
dev_data = spark.read.load("lab3dev.csv", format="csv", sep="\t",
                           inferSchema="true", header="true")

train_data.show(5)
# +--------+--------------------+
# |category|            descript|
# +--------+--------------------+
# |    MISC|I've been there t...|
# |    REST|Stay away from th...|
# |    REST|Wow over 100 beer...|
# |    MISC|Having been a lon...|
# |    MISC|This is a consist...|
# +--------+--------------------+

Sample rows in full:
category  descript
MISC      I’ve been there three times and have always had wonderful experiences.
REST      Stay away from the two specialty rolls on the menu, though- too much avocado and rice will fill you up right quick.
REST      Wow over 100 beers to choose from.
MISC      Having been a long time Ess-a-Bagel fan, I was surpised to find myself return time and time again to Murray’s.

PySpark MLlib – Example of NB
•Build the pipeline

from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, CountVectorizer, StringIndexer
from pyspark.ml.classification import NaiveBayes

# whitespace tokenizer
WordTokenizer = Tokenizer(inputCol="descript", outputCol="words")
# bag-of-words counts
countVectors = CountVectorizer(inputCol="words", outputCol="features")
# label indexer
label_stringIdx = StringIndexer(inputCol="category", outputCol="label")
# model
nb_model = NaiveBayes(featuresCol="features", labelCol="label",
                      predictionCol="nb_prediction")
# build the pipeline
nb_pipeline = Pipeline(stages=[WordTokenizer, countVectors,
                               label_stringIdx, nb_model])

Pipeline
• In machine learning, it is common to run a sequence of algorithms to process and learn from data
• A Pipeline is specified as a sequence of stages
• each stage is either a Transformer or an Estimator
• stages are run in order
• the input DataFrame is transformed as it passes through each stage
• Transformer stages
• the transform() method is called on the DataFrame
• Estimator stages
• the fit() method is called to produce a Transformer (which becomes part of the PipelineModel, or fitted Pipeline)
• then the Transformer’s transform() method is called on the DataFrame

Pipeline
[Diagram: Pipeline.fit() on a Pipeline (an Estimator) — raw text flows through the Transformer stages Word Tokenizer (→ words), Count Vectorizer (→ feature vectors), and String Indexer (→ numeric labels); the Naïve Bayes Estimator stage is then fit, producing a Naïve Bayes Model]

Pipeline
[Diagram: PipelineModel.transform() on the fitted PipelineModel (a Transformer) — raw text flows through Word Tokenizer, Count Vectorizer, and String Indexer, and the Naïve Bayes Model then outputs predictions]

More on Pipeline
•A Transformer takes a dataframe as input and produces an augmented dataframe as output
• Tokenizer
• CountVectorizer
•An Estimator must first be fit on the input dataframe to produce a model
• After fit(), we get a Transformer
• NaiveBayes
•Pipelines and PipelineModels help to ensure that training and test data go through identical feature processing steps
• E.g., when the test data contains words that are not in the training data

PySpark MLlib – Example of NB
•Train and evaluate the classifier

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# train the classifier
model = nb_pipeline.fit(train_data)
# get predictions on the development set
dev_res = model.transform(dev_data)
# init the evaluator
evaluator = MulticlassClassificationEvaluator(predictionCol="nb_prediction",
                                              metricName="f1")
# evaluate the result
print("F1 of NB classifier: ", evaluator.evaluate(dev_res))
# F1 of NB classifier: 0.82472850

Ensemble Learning
•Ensemble learning improves machine learning results by combining several models
• ensemble methods placed first in many machine learning competitions
•Three major types
• Decrease variance (bagging)
• Decrease bias (boosting)
• Improve predictions (stacking)
•Two groups based on how the base learners are generated
• Sequential: e.g., AdaBoost
• Parallel: e.g., Random Forest

Stacking
•Stacking is an ensemble learning technique that
• combines multiple classification or regression models (base models)
• via a meta-classifier or a meta-regressor (meta-model)
•Base models
• trained on the complete training set
• often consist of different learning algorithms
•Meta model
• trained on the outputs of the base-level models as features
• sometimes additional meta-features are used to further improve the performance

Stacking Example
[Diagram: stacking example — base models’ predictions are combined as features for a meta model (source figure omitted)]

Stacking Framework
• Training
• Step 1: prepare training data for base models
• Step 2: learn base classifiers
• Step 3: generate features for meta model
• Step 4: learn meta classifier
• Testing
• Step 5: generate features for meta classifier based on base classifiers
• Step 6: prediction using meta classifier
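A minimal sketch of the whole workflow in plain Python (the algo.fit() / model.predict() calls are hypothetical stand-ins for the MLlib APIs; this illustrates the six steps, it is not the Project 2 implementation):

def stacking(train_X, train_y, test_X, base_algos, meta_algo, k=5):
    fold = [i % k for i in range(len(train_X))]              # Step 1: assign k groups
    meta_X = [[] for _ in train_X]
    for algo in base_algos:
        for f in range(k):                                   # Step 2: Type 1 base classifiers
            tr_X = [x for x, g in zip(train_X, fold) if g != f]
            tr_y = [y for y, g in zip(train_y, fold) if g != f]
            model = algo.fit(tr_X, tr_y)                     # hypothetical fit API
            for i, g in enumerate(fold):                     # Step 3: out-of-fold predictions
                if g == f:                                   #         become meta-features
                    meta_X[i].append(model.predict(train_X[i]))
    meta_model = meta_algo.fit(meta_X, train_y)              # Step 4: learn meta classifier
    full = [a.fit(train_X, train_y) for a in base_algos]     # Type 2: trained on all data
    test_meta = [[m.predict(x) for m in full] for x in test_X]   # Step 5
    return [meta_model.predict(x) for x in test_meta]        # Step 6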

Step 1: prepare training data for base models
•We need two types of base classifiers
• Type 1: offer the meta features when training
• Type 2: offer the meta features when testing
• Why different?
•Type 1: k-fold cross-validation
• Split into k groups of data
• k−1 groups are used to train the base classifier
• 1 group is used to obtain the predictions and generate the meta features
• What if we train on the data as a whole? Overfitting!
•Type 2: train on the data as a whole

Step 1: prepare training data for base models
Original data:

Text                 Category
I’ve been there t…   MISC
Stay away from th…   PAS
I ate this a week…   FOOD
…                    …

Randomly partition into k groups:

Group  Text                 Category
1      I’ve been there t…   MISC
4      Stay away from th…   PAS
2      I ate here a week…   FOOD
…      …                    …

Generate features and labels:

Group  features             label
1      (5421,[1,18,31,39…   0.0
4      (5421,[0,1,15,20,…   1.0
2      (5421,[3,109,556,…   2.0
…      …                    …

Step 1: prepare training data for base models
From the (Group, features, label) table, the multi-class label is additionally binarized into one column per class:

Group  features             label_0  label_1  label_2
1      (5421,[1,18,31,39…   1        0        0
4      (5421,[0,1,15,20,…   0        1        0
2      (5421,[3,109,556,…   0        0        1
…      …                    …        …        …

Step 1: prepare training data for base models
The three binary-label columns are derived from the original label by testing “label = 0?”, “label = 1?”, and “label = 2?”:

Group  features             label   label_0  label_1  label_2
1      (5421,[1,18,31,39…   0.0     1.0      0.0      0.0
4      (5421,[0,1,15,20,…   1.0     0.0      1.0      0.0
2      (5421,[3,109,556,…   2.0     0.0      0.0      1.0
…      …                    …       …        …        …

Step 1: prepare training data for base models
The binarized data is then rotated over the k groups (here k = 5):

Train on              Predict on
Groups 2, 3, 4, 5     Group 1
Groups 1, 3, 4, 5     Group 2
Groups 1, 2, 4, 5     Group 3
Groups 1, 2, 3, 5     Group 4
Groups 1, 2, 3, 4     Group 5

Step 2: learn base classifiers
•Any model can be used as a base model
• Some models can only handle binary classification problems, e.g., SVM
• Build $|C|$ one-vs-rest classifiers
• Each classifier predicts whether the sample is of class $c_i$ or not
•In Project 2, we use Naïve Bayes and SVM as base models, and $|C| = 3$
• How many classifiers do we need to train?
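A minimal sketch of this one-vs-rest binarization in PySpark, assuming a DataFrame train with a numeric label column in {0.0, 1.0, 2.0} (the column names are assumptions):

from pyspark.sql import functions as F

for i in range(3):   # one binary column per class
    train = train.withColumn("label_%d" % i,
                             F.when(F.col("label") == i, 1.0).otherwise(0.0))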

Step 2: learn base classifiers
[Diagram: for each binary label system (label_0, label_1, label_2), an NB and an SVM classifier are trained on the training data from Groups 1, 4, 2, 5 (all groups except Group 3), yielding Classifier_0_3, Classifier_1_3, and Classifier_2_3 (one NB and one SVM each)]
The above procedure will be repeated k times

Step 2: learn base classifiers
[Diagram: the Type 2 base classifiers — using the whole training data (features with label_0, label_1, label_2), an NB and an SVM classifier are trained per label system, yielding Classifier_0, Classifier_1, and Classifier_2 (one NB and one SVM each)]

Step 3: generate features for meta model
•We consider two types of meta-features
• The prediction result from each base classifier
• The joint prediction result from the base classifiers
•Single prediction results from each base classifier
• Generate $|C| \cdot 2$ features
•Joint prediction results from classifiers within the same label system
• Generate $|C|$ features
•All of them are one-hot encoded

Step 3: generate features for meta model
[Diagram: Group 3’s prediction data for each label system is fed to the classifiers trained without Group 3 (NB and SVM Classifier_0_3, Classifier_1_3, Classifier_2_3); the six single predictions 0 1 1 1 1 0 are one-hot encoded as [0,1], [1,0], [1,0], [1,0], [1,0], [0,1]]

Step 3: generate features for meta model
[Diagram: the same Group 3 prediction data is fed to the six classifiers; per label system, the (NB, SVM) prediction pairs 01, 11, 10 are one-hot encoded as joint predictions [0,0,1,0], [1,0,0,0], [0,1,0,0]]

Meta-feature: [0,1,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0]
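A minimal sketch of this encoding in plain Python; the index conventions (1 → [1,0], 0 → [0,1]; pairs 11 → [1,0,0,0], 10 → [0,1,0,0], 01 → [0,0,1,0], 00 → [0,0,0,1]) are inferred from the example above:

def onehot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

preds = [(0, 1), (1, 1), (1, 0)]   # (NB, SVM) predictions per label system, from the slide

single, joint = [], []
for nb, svm in preds:
    single += onehot(1 - nb, 2) + onehot(1 - svm, 2)   # 1 -> [1,0], 0 -> [0,1]
    joint += onehot(3 - (2 * nb + svm), 4)             # 11 -> [1,0,0,0], ..., 00 -> [0,0,0,1]

meta_features = single + joint   # matches the 24-dimensional meta-feature above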

Step 3: generate features for meta model
[Diagram: every group’s meta-features come from the NB and SVM classifiers trained without that group — Group 1 from Classifier_0_1, Classifier_1_1, Classifier_2_1; Group 3 from Classifier_0_3, Classifier_1_3, Classifier_2_3; and so on — yielding:]

Group  meta_features              label
1      ([0, 1, 1, 0, 1, 0, 1, …   0.0
4      ([0, 1, 0, 0, 1, 1, 1, …   1.0
2      ([0, 0, 1, 0, 1, 0, 1, …   2.0
…      …                          …

Step 4: Learn Meta Classifier
•Use meta features as features, learn meta classifier on the whole dataset
•In Project 2, we use logistic regression as the meta model

Group  meta_features              label
1      ([0, 1, 1, 0, 1, 0, 1, …   0.0
4      ([0, 1, 0, 0, 1, 1, 1, …   1.0
2      ([0, 0, 1, 0, 1, 0, 1, …   2.0
…      …                          …

This table is used to train the Meta Classifier, as sketched below.
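A minimal sketch of Step 4 with MLlib, assuming meta_train is a DataFrame with a vector column meta_features and a numeric label column (the names are assumptions):

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(featuresCol="meta_features", labelCol="label", maxIter=100)
meta_model = lr.fit(meta_train)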

Step 5: generate meta features for prediction
•In Step 2, we have learnt base classifiers for meta-feature generation in the testing phase
•Before using the meta classifier to predict, we need to generate the meta features in a similar way as in Step 3

Step 5: generate meta features for prediction
features
(5421,[1,18,21,39…
(5421,[0,1,5,13,…
(5421,[3,10,56,…
…

[Diagram: the test features are fed to the Type 2 base classifiers (NB and SVM Classifier_0, Classifier_1, Classifier_2); the single predictions 1 1 0 0 0 1 and the joint predictions 11, 00, 01 are one-hot encoded]

Meta-feature: [1,0,1,0,0,1,0,1,0,1,1,0,1,0,0,0,0,0,0,1,0,0,1,0]

Step 6: prediction using meta classifier
•Use the meta classifier trained in step 4 to predict labels for the test data with meta features generated in step 5.
meta_features              pred.
([0, 1, 1, 0, 1, 0, 1, …   1.0
([0, 1, 0, 0, 1, 1, 1, …   2.0
([0, 0, 1, 0, 1, 0, 1, …   0.0
…                          …

Project 2
•To be released in week 8
•Implement a stacking model
• 3 labels (FOOD, PAS, MISC)
• SVM and Naïve Bayes as base models
• Logistic Regression as meta model
•Use DataFrames and pipelines to help you with the implementation
•No running time requirements (of course, it shouldn’t be too slow)
•Deadline: the end of week 10
• There is enough time if you don’t rush it in the last 3 days…

More on Project 2
•Deadline: 9th Aug
•3 tasks (3×30 pts) and a report (10 pts)
• Tasks will be tested independently
• They have different difficulties…
• Show your efforts in the report
•Running time threshold
• Very loose
• Won’t be a problem if you use DataFrames and MLlib methods
• No need to use RDD operations