
408216 Data Mining and Knowledge Engineering

Data Mining and Machine Learning
Ensemble Learning

Learning Outcomes
To understand the fundamental trade-off between bias and variance
To understand how generic Ensemble methods such as Bagging and Boosting can help to improve the accuracy of classification or numeric prediction

Bias vs. Variance trade-off in Machine Learning
Bias refers to the systematic error a learner makes because of the assumptions built into its model
A low bias implies that the patterns in the training data are well captured
On data that contains a high degree of inter-dependency between attributes, Naïve Bayes will display high bias
Similarly, a decision stump will tend to have higher bias than a deeper tree (see the sketch below)
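A quick illustrative check of the stump-versus-tree contrast (a hedged sketch, not from the slides; the iris data is used purely as an example):

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> cross_val_score(DecisionTreeClassifier(max_depth=1), X, y, cv=5).mean()  # decision stump: underfits (high bias)
>>> cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean()             # deeper tree: lower bias, higher variance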

Variance
Variance refers to the sensitivity of the learned model to the particular training sample used: a high-variance learner produces very different models when the training data changes slightly, and is therefore prone to overfitting

Bias Variance Trade-off
The trade-off is commonly illustrated by plotting prediction error against model complexity: bias falls while variance rises as models become more complex

Ensemble Learning: How does it help?
A major motivation for ensemble learning is to optimize this trade-off
It does this by lowering the variance while ensuring that bias does not rise disproportionately
Variance is reduced by combining the predictions of several different models


Bagging
A supervised learning approach that allows several models to have an equal vote in classification
The same mining algorithm (e.g. C4.5) creates multiple models
The models vary because different subsets of training instances (from the overall training pool) are selected and used to build each model

Ensemble Methods
Construct a set of classifiers from the training data

Predict class label of previously unseen records by aggregating predictions made by multiple classifiers
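For instance, scikit-learn's VotingClassifier aggregates the votes of several different base classifiers; a hedged sketch (the choice of base classifiers here is illustrative only):

>>> from sklearn.ensemble import VotingClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.tree import DecisionTreeClassifier
>>> ensemble = VotingClassifier(
...     estimators=[('lr', LogisticRegression()),
...                 ('nb', GaussianNB()),
...                 ('dt', DecisionTreeClassifier())],
...     voting='hard')   # 'hard' = simple majority vote over the predicted labels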

General Idea
Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt by sampling from the original training data D
Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct, one from each data set
Step 3: Combine the classifiers into a single ensemble classifier C*

Why does it work?
Suppose there are 25 base classifiers
Each classifier has error rate ε = 0.35
Assume the classifiers are independent
The majority vote is wrong only when at least 13 of the 25 base classifiers are wrong, so the probability that the ensemble classifier makes a wrong prediction is:
$\sum_{i=13}^{25}\binom{25}{i}\varepsilon^{i}(1-\varepsilon)^{25-i} \approx 0.06$
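A quick numerical check of this sum (a hedged snippet, not part of the original slides):

>>> from math import comb
>>> eps = 0.35
>>> p_wrong = sum(comb(25, i) * eps**i * (1 - eps)**(25 - i) for i in range(13, 26))
>>> round(p_wrong, 2)
0.06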

Examples of Ensemble Methods
How to generate an ensemble of classifiers?
Bagging

Boosting

Bagging
Sampling with replacement: each bootstrap sample is drawn from the original training data with replacement, for example:

Original Data     |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10
Bagging (Round 1) |  7 |  8 | 10 |  8 |  2 |  5 | 10 | 10 |  5 |  9
Bagging (Round 2) |  1 |  4 |  9 |  1 |  2 |  3 |  2 |  7 |  3 |  2
Bagging (Round 3) |  1 |  8 |  5 | 10 |  5 |  5 |  9 |  6 |  3 |  7

Build a classifier on each bootstrap sample
Each training instance has probability 1 − (1 − 1/n)^n of appearing in a given bootstrap sample, where n is the number of instances (≈ 0.632 for large n)
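As a quick sanity check of that probability (a hedged snippet, not from the slides), the expression approaches 1 − 1/e ≈ 0.632 as n grows:

>>> n = 1000
>>> round(1 - (1 - 1/n)**n, 3)
0.632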

Bagging – Algorithm
Repeat t times: draw a bootstrap sample from the training data and build a base classifier on it; classify a new instance by the majority vote of the t classifiers (a minimal sketch follows below)
When the random subsets are instead drawn without replacement, the procedure is known as pasting
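A minimal illustrative sketch of this procedure (not the slides' own code), using decision trees as the base learner; X_train, y_train and X_test are assumed to be NumPy arrays with integer class labels:

>>> import numpy as np
>>> from sklearn.tree import DecisionTreeClassifier
>>> rng = np.random.RandomState(0)
>>> models = []
>>> for _ in range(10):
...     idx = rng.randint(0, len(X_train), size=len(X_train))   # bootstrap sample (with replacement)
...     models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
>>> votes = np.array([m.predict(X_test) for m in models])       # one row of predictions per model
>>> y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)   # majority vote per test instance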

Bagging in Python
Bagging is supported in scikit-learn and can be used with a base classifier of your choice

>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> bagging = BaggingClassifier(KNeighborsClassifier(),
...                             max_samples=0.5, max_features=0.5)
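As a usage sketch (the iris data is assumed purely for illustration), the bagged model can be evaluated with cross-validation:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> X, y = load_iris(return_X_y=True)
>>> cross_val_score(bagging, X, y, cv=5).mean()   # mean accuracy over 5 folds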

Boosting

Boosting
An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records
Initially, all N records are assigned equal weights
Unlike bagging, the weights may change at the end of each boosting round

Boosting
Records that are wrongly classified will have their weights increased
Records that are classified correctly will have their weights decreased

Example of how the sampled records change across boosting rounds:

Original Data      |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10
Boosting (Round 1) |  7 |  3 |  2 |  8 |  7 |  9 |  4 | 10 |  6 |  3
Boosting (Round 2) |  5 |  4 |  9 |  4 |  2 |  5 |  1 |  7 |  4 |  2
Boosting (Round 3) |  4 |  4 |  8 | 10 |  4 |  5 |  4 |  6 |  3 |  4

Example 4 is hard to classify
Its weight is increased, so it is more likely to be chosen again in subsequent rounds (it is sampled once in round 1, three times in round 2 and five times in round 3)
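The weighted resampling in each round can be sketched as follows (a hedged illustration; weights is an assumed array of the current instance weights, summing to 1):

>>> import numpy as np
>>> idx = np.random.choice(len(weights), size=len(weights), p=weights)   # instances with larger weights are drawn more often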

Example: AdaBoost
Base classifiers: C1, C2, …, CT

Error rate of base classifier Ci on the weighted training instances:
$\varepsilon_i = \frac{1}{N}\sum_{j=1}^{N} w_j\,\delta\big(C_i(x_j) \neq y_j\big)$

Importance of a classifier:
$\alpha_i = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$

Example: AdaBoost
Weight update after round j, where Zj is a normalization factor chosen so that the updated weights sum to 1:
$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} e^{-\alpha_j} & \text{if } C_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } C_j(x_i) \neq y_i \end{cases}$

Final classification is a weighted vote of the base classifiers:
$C^{*}(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j\,\delta\big(C_j(x) = y\big)$
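As a worked example of these formulas (a hedged illustration, not from the slides): a base classifier with weighted error ε = 0.2 receives importance α = ½ ln(0.8/0.2) ≈ 0.693, so before normalization the weight of each correctly classified instance is multiplied by e^(−0.693) ≈ 0.5 and the weight of each misclassified instance by e^(0.693) ≈ 2:

>>> import numpy as np
>>> eps = 0.2
>>> alpha = 0.5 * np.log((1 - eps) / eps)
>>> alpha, np.exp(-alpha), np.exp(alpha)   # importance, factor for correct, factor for misclassified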

Illustrating AdaBoost

Data points for training: ten one-dimensional instances, five labelled + and five labelled −
Initial weights for each data point: 0.1

After Round 1, base classifier B1 (importance α = 1.9459) misclassifies two of the + points; their weights rise to 0.4623 while the weights of the correctly classified points fall to 0.0094
Rounds 2 and 3 produce base classifiers B2 and B3 with importances α = 2.9323 and α = 3.8744; the weighted vote of B1, B2 and B3 classifies all ten training points correctly

Boosting in Python
Boosting is also supported in scikit-learn and can be used together with a base classifier of your choice.

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import AdaBoostClassifier
>>> X, y = load_iris(return_X_y=True)
>>> clf = AdaBoostClassifier(n_estimators=100)
>>> scores = cross_val_score(clf, X, y, cv=5)   # evaluate the boosted ensemble with 5-fold cross-validation
>>> scores.mean()
By default a decision stump is chosen as the base classifier; this can be changed through the base_estimator parameter
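For example, to boost deeper trees instead of stumps (a hedged sketch; the base estimator is passed positionally here because newer scikit-learn versions name the parameter estimator rather than base_estimator):

>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=100)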
