
Data Reshaping


Faculty of Information Technology, Monash University, Australia

FIT5196 week 09

(Monash) FIT5196 1 / 43

Outline

1 Data Transformation
Data Normalisation/Scaling
Transformation by generating new features
Nominal to Numeric Transformation

2 Data Discretisation

3 Feature Engineering & Data Sampling

4 Summary

(Monash) FIT5196 2 / 43

Data Wrangling Process

(Monash) FIT5196 3 / 43

Data Transformation

Outline

1 Data Transformation
Data Normalisation/Scaling
Transformation by generating new features
Nominal to Numeric Transformation

2 Data Discretisation

3 Feature Engineering & Data Sampling

4 Summary

(Monash) FIT5196 4 / 43

Data Transformation

Data Transformation



[Figure: scatter plots of a right-skewed dataset on four axis combinations, (x, y), (log_x, y), (x, log_y), and (log_x, log_y), showing the effect of the log transformation.]

(Monash) FIT5196 5 / 43

Data Transformation

Data Transformation

Why: raw attributes are usually not good enough to obtain an accurate
predictive model.
▸ k-nearest neighbours (KNN) with a Euclidean distance measure, if you want all
features to contribute equally:

d(p, q) = √[(p1 − q1)² + (p2 − q2)² + · · · + (pn − qn)²] = √[Σ_i (pi − qi)²]

▸ logistic regression, SVMs, perceptrons, neural networks, etc., if you are using
gradient descent/ascent-based optimisation; otherwise some weights will
update much faster than others:

Δw_j = −η ∂J/∂w_j = η Σ_i (t^(i) − o^(i)) x_j^(i)

so that w_j := w_j + Δw_j
▸ linear discriminant analysis, principal component analysis, kernel principal
component analysis, since you want to find directions of maximum
variance (under the constraint that those directions/eigenvectors/principal
components are orthogonal); you want to have features on the same scale,
since you'd otherwise emphasise variables on "larger measurement scales" more.
(Monash) FIT5196 6 / 43
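The scale-sensitivity of Euclidean distance above can be seen with a minimal sketch. The feature names, values, and min/max bounds here are made up for illustration:

```python
import math

# Hypothetical 2-feature points: income (large scale) and age (small scale).
p = (50_000.0, 25.0)
q = (51_000.0, 60.0)

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Raw distance is dominated almost entirely by the income difference ...
raw = euclidean(p, q)

# ... while after min-max scaling each feature to [0, 1] (assumed known
# bounds), the large relative age gap contributes meaningfully.
bounds = [(20_000.0, 120_000.0), (18.0, 90.0)]
scale = lambda v: tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(v, bounds))
scaled = euclidean(scale(p), scale(q))

print(f"raw distance:    {raw:.2f}")    # ~1000.6, the age gap barely matters
print(f"scaled distance: {scaled:.3f}")
```

On the raw scale a 35-year age gap contributes almost nothing next to a $1000 income gap; after scaling, both features get a comparable say.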

Data Transformation

Data Transformation

Data transformation
▸ A series of manipulation steps to transform the original attributes, or to
generate new attributes with better properties that improve the predictive
power of the model.
▸ To achieve properties that enhance modelling and analysis (linearity,
statistical or visual interpretability).
▸ Methods
− Normalisation/scaling methods
− Transformation by generating new features (i.e., variables or attributes)

(Monash) FIT5196 6 / 43

Data Transformation Data Normalisation/Scaling

Outline

1 Data Transformation
Data Normalisation/Scaling
Transformation by generating new features
Nominal to Numeric Transformation

2 Data Discretisation

3 Feature Engineering & Data Sampling

4 Summary

(Monash) FIT5196 7 / 43

Data Transformation Data Normalisation/Scaling

Data Transformation — Normalisation

There are two types of data normalisation:

Standardisation (z-score normalisation): where the focus is on shifting the
distribution of the data to have a mean of 0 and a standard deviation of 1.
Scaling: where the focus is on rescaling the data value range to a specific
interval.
▸ Min-max normalisation
▸ Decimal scaling

(Monash) FIT5196 8 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Standardisation

Z-score Normalisation

Rescale the features (or variables) so that they will have the properties of a
standard normal distribution with

µ = 0 and σ = 1.0

How?

x′ = (x − µ) / σ

where

µ = (1/n) Σ_i x_i    and    σ = √[(1/n) Σ_i (x_i − µ)²]

(Monash) FIT5196 9 / 43
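The z-score formulas above can be sketched directly; the sample values are made up (wine-style alcohol readings, not the real dataset):

```python
import math

# Hypothetical alcohol measurements, invented for illustration.
x = [11.2, 12.8, 13.5, 14.1, 12.0, 13.9]

n = len(x)
mu = sum(x) / n                                        # µ = (1/n) Σ x_i
sigma = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)  # σ = √[(1/n) Σ (x_i − µ)²]

z = [(xi - mu) / sigma for xi in x]                     # x′ = (x − µ) / σ

# After standardisation the sample has mean 0 and standard deviation 1.
z_mu = sum(z) / n
z_sigma = math.sqrt(sum((zi - z_mu) ** 2 for zi in z) / n)
print(f"mean = {z_mu:.6f}, sd = {z_sigma:.6f}")
```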

Data Transformation Data Normalisation/Scaling

Data Normalisation — Standardisation

Z-score Normalisation

[Figure: Alcohol and Malic Acid content of the wine dataset: input scale vs standardised [u=0, s=1].]

(Monash) FIT5196 9 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Standardisation

Z-score Normalisation

[Figure: wine dataset, Alcohol vs Malic Acid by class (Class 1–3): input scale vs standardised [u=0, s=1].]

(Monash) FIT5196 9 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Min-Max Scaling

Min-Max Scaling

Rescale the features (or variables) so that their values lie in a specific range
[X′_min, X′_max].

How?

X_scaled = (X − X_min) / (X_max − X_min) × (X′_max − X′_min) + X′_min

If the fixed range is [0, 1]:

X_scaled = (X − X_min) / (X_max − X_min)

(Monash) FIT5196 10 / 43
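A minimal sketch of the general min-max formula above, on made-up values (note the general form reduces to the [0, 1] case when new_min=0 and new_max=1):

```python
# Hypothetical values, rescaled to an arbitrary target range [new_min, new_max].
x = [-500.0, -20.0, 0.0, 45.0]

def min_max_scale(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    # X_scaled = (X - X_min) / (X_max - X_min) * (X'_max - X'_min) + X'_min
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

print(min_max_scale(x))              # maps -500 -> 0.0 and 45 -> 1.0
print(min_max_scale(x, -1.0, 1.0))   # same relative spacing, range [-1, 1]
```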

Data Transformation Data Normalisation/Scaling

Data Normalisation — Min-Max Scaling

Min-Max Scaling

[Figure: Alcohol and Malic Acid content of the wine dataset: input scale vs min-max scaled [min=0, max=1].]

We will end up with smaller standard deviations, which can suppress the effect of outliers

(Monash) FIT5196 10 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Min-Max Scaling

Min-Max Scaling

[Figure: wine dataset, Alcohol vs Malic Acid by class (Class 1–3): input scale vs min-max scaled [min=0, max=1].]

(Monash) FIT5196 10 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Standardisation vs Min-Max

“Standardisation or Min-Max scaling?”: depends on the application

[Figure: Alcohol and Malic Acid content of the wine dataset: input scale, standardised [u=0, s=1], and min-max scaled [min=0, max=1].]

▸ PCA: standardisation
▸ Image processing: pixel intensities have to be normalised to fit within a
certain range (i.e., 0 to 255 for the RGB colour range)
▸ ANN: data on a 0–1 scale

(Monash) FIT5196 11 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Standardisation vs Min-Max

[Figure: transformed training dataset after PCA (1st vs 2nd principal component), three classes: non-standardised vs standardised.]

(Monash) FIT5196 12 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Standardisation vs Min-Max

[Figure: transformed training dataset after PCA (1st vs 2nd principal component), three classes: non-standardised vs min-max scaled.]

(Monash) FIT5196 12 / 43

Data Transformation Data Normalisation/Scaling

Data Normalisation — Decimal Scaling

Shift the decimal place of a numeric value such that the maximum absolute
value will always be less than 1.

How:

x′ = x / 10^c

where c is the smallest integer such that max(|x′|) < 1.

Example:
▸ −500 ≤ x ≤ 45 ⇒ −0.500 ≤ x′ ≤ 0.045
▸ How to convert?
− x_max = max(abs(x)) = 500
− c = ceil(log10(x_max)) = 3.0
− x′ = x / 10^3.0 = x / 1000.0

(Monash) FIT5196 13 / 43

Data Transformation Transformation by generating new features

Outline

1 Data Transformation
Data Normalisation/Scaling
Transformation by generating new features
Nominal to Numeric Transformation

2 Data Discretisation

3 Feature Engineering & Data Sampling

4 Summary

(Monash) FIT5196 14 / 43

Data Transformation Transformation by generating new features

Data Transformation

Data transformation is a process of re-expressing data in a form that is more
suitable for analysis.

Reasons for data transformation
▸ Fix skewness in data
▸ Enhance data visualisation
▸ Better interpretability
▸ Improve the compatibility of data with assumptions underlying a modelling
process

Methods: different mathematical formulas from statistical analysis
▸ Linear transformation
▸ Log transformation
▸ Power transformation
▸ Box-Cox transformation
▸ Others: quadratic transformation, (non-)polynomial approximation of
transformation, rank transformation

(Monash) FIT5196 15 / 43

Data Transformation Transformation by generating new features

Data Transformation

Linear Transformation

Linear transformation preserves the linear relationship between the features.
Aggregate the information contained in various features.
Linear transformation function: given a subset of the complete set of
attributes X1, X2, . . . , Xm,

X_agg = w0 + Σ_{i=1..m} w_i X_i

Examples:
▸ Celsius to Fahrenheit
▸ Miles to kilometres
▸ Inches to centimetres

(Monash) FIT5196 16 / 43

Data Transformation Transformation by generating new features

Data Transformation

Log transformation makes highly skewed distributions less skewed.

[Figure: scatter plots of a right-skewed dataset on four axis combinations, (x, y), (log_x, y), (x, log_y), and (log_x, log_y), showing the effect of the log transformation.]
[Figure: scatter plot of log_y versus log_x]

Data Transformation — Transformation by generating new features

Log transformation makes highly skewed distributions less skewed.
[Figure: Normal Q−Q plot of the standardized residuals for lm(y ~ .)]
[Figure: Normal Q−Q plot of the standardized residuals for lm(log_y ~ .)]

Data Transformation — Power Transformation

Tukey and Mosteller's Bulging Rule: the idea is that it might be interesting to transform X and Y at the same time, using power functions:

Y_i^q = β_0 + β_1 X_i^p + η_i

[Figure: fitted power curves for (p=1/2, q=2), (p=3, q=−5), (p=1/2, q=−1) and (p=3, q=5)]

More information can be found at
https://www.r-bloggers.com/tukey-and-mostellers-bulging-rule-and-ladder-of-powers/

Seek optimal transformations: learn p and q, e.g. with L-BFGS.

Data Transformation — The Box-Cox Transformation: transforms a continuous variable into an almost normal distribution.

y = (x^λ − 1) / λ   if λ ≠ 0
y = log(x)          if λ = 0

Figure: Examples of the Box-Cox transformation x′_λ versus x for λ = −1, 0, 1. In the second row, x′_λ is plotted against log(x). The red point is at (1, 0).
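A minimal sketch of the piecewise Box-Cox definition above, in plain Python (the λ values passed in are only illustrative; this does not do scipy's likelihood search for λ):

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x for a given lambda."""
    if lam == 0:
        return math.log(x)          # the lambda = 0 branch
    return (x ** lam - 1) / lam     # the lambda != 0 branch

print(box_cox(math.e, 0))   # log branch: 1.0
print(box_cox(4, 1))        # lambda = 1 is just a shift: 3.0
print(box_cox(4, 0.5))      # (sqrt(4) - 1) / 0.5 = 2.0
# as lambda -> 0, (x^lam - 1)/lam approaches log(x),
# so the definition is continuous in lambda
```

In practice `scipy.stats.boxcox` both applies the transform and searches for the λ that brings the result closest to a normal distribution.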
Data Transformation — The Box-Cox Transformation with negative values in the attributes:

y = ((x + c)^λ − 1) / (g·λ)   if λ ≠ 0
y = log(x + c) / g            if λ = 0

where
• c: a parameter that offsets the negative values
• g: scales the resulting values; often taken as the geometric mean of the data
• λ: greedily searched for so that the resulting attribute is as close as possible to a normal distribution

Nominal to Numeric Transformation

Why?
• Many machine learning algorithms accept only numeric values, while in many applications we have nominal attributes.

How?
• Integer substitution: map each nominal value in the domain to a numeric value
• Example: assume a colour attribute with the values Red, Green, Blue and Yellow
  − Red ⇒ 1
  − Green ⇒ 2
  − Blue ⇒ 3
  − Yellow ⇒ 4
• What's the problem?
  − It implies a ranking that does not actually exist in the original data.
  − The outcome of the mining algorithms would be sensitive to the numeric values we choose.
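One-hot encoding for the colour example can be sketched with pandas (`get_dummies`); the sample values below are made up:

```python
import pandas as pd

colours = pd.Series(["Red", "Green", "Blue", "Yellow", "Red"])

# one binary column per category, so no artificial ordering is introduced
onehot = pd.get_dummies(colours, dtype=int)
print(list(onehot.columns))    # ['Blue', 'Green', 'Red', 'Yellow']
print(onehot.loc[0].tolist())  # Red -> [0, 0, 1, 0]
```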
• Integer substitution: map each nominal value in the domain to a numeric value
• One-hot encoding: create one binary attribute per nominal value

Data Discretisation

The process of converting or partitioning continuous variables into discretised or nominal variables.
• Find concise data representations as categories that are adequate for the learning task, retaining as much of the information in the original continuous attribute as possible
• Effects of discretisation
  − Smooth data
  − Reduce noise
  − Reduce data size
  − Enable specific methods that require nominal data

Methods
• Binning
• Entropy discretisation
• Concept hierarchy

Data Discretisation — Binning

An unsupervised algorithm (it does not use the dependent variable) that splits ordered data into a predefined number of bins. Two approaches:
• Equal-width binning
  − Given a range of values [x_min, x_max], divide the range into n intervals of approximately equal width

      w = (x_max − x_min) / n

    Alternatively, specify the value of w directly.
• Equal-depth binning
  − Divide the range into n intervals, each containing approximately the same number of samples.
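Both schemes can be sketched with pandas (`cut` for equal-width, `qcut` for equal-depth), here on the twelve values used in the worked example:

```python
import pandas as pd

x = pd.Series([34, 64, 88, 55, 94, 59, 10, 25, 44, 48, 69, 15])

equal_width = pd.cut(x, bins=4)   # intervals of (roughly) equal width
equal_depth = pd.qcut(x, q=4)     # intervals with ~equal numbers of samples

# bin sizes in bin order
print(equal_width.value_counts(sort=False).tolist())  # [3, 3, 4, 2]
print(equal_depth.value_counts(sort=False).tolist())  # [3, 3, 3, 3]

# smoothing: replace each value by its bin mean
smoothed = x.groupby(equal_width, observed=True).transform("mean")
print(smoothed.iloc[0])  # 34 falls in {34, 44, 48}, so 42.0
```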
Data Discretisation — Binning

Binning with
• mean value
• median value
• bin boundaries

Task: discretise {34, 64, 88, 55, 94, 59, 10, 25, 44, 48, 69, 15}
• Sort the values in ascending order: {10, 15, 25, 34, 44, 48, 55, 59, 64, 69, 88, 94}
• Equal-width binning with n = 4: {10, 15, 25}, {34, 44, 48}, {55, 59, 64, 69}, {88, 94}
  − mean values: {16.67, 16.67, 16.67}, {42, 42, 42}, {61.75, 61.75, 61.75, 61.75}, {91, 91}
  − median values: {15, 15, 15}, {44, 44, 44}, {61.5, 61.5, 61.5, 61.5}, {91, 91}
  − boundaries: {10, 10, 25}, {34, 48, 48}, {55, 55, 69, 69}, {88, 94}
• Equal-depth binning with n = 4: {10, 15, 25}, {34, 44, 48}, {55, 59, 64}, {69, 88, 94}
  − mean values: {16.67, 16.67, 16.67}, {42, 42, 42}, {59.33, 59.33, 59.33}, {83.67, 83.67, 83.67}
  − median values: {15, 15, 15}, {44, 44, 44}, {59, 59, 59}, {88, 88, 88}
  − boundaries: {10, 10, 25}, {34, 48, 48}, {55, 55, 64}, {69, 94, 94}

Advantages/disadvantages of each method:
• Equal-width binning
  − Simple, but sensitive to outliers
  − Does not handle skewed data well
• Equal-depth binning
  − Scales well and preserves the distribution of the data

Data Discretisation — Entropy Discretisation

Entropy:

H(X) = − Σ_{i=1}^{n} p(x_i) log_b p(x_i)

• Fair coin toss: p(head) = p(tail) = 1/2
  H(X) = −p(head) log2 p(head) − p(tail) log2 p(tail) = −2 × (1/2) log2(1/2) = 1
• Biased coin toss: p(head) = 0.7, p(tail) = 0.3
  H(X) = −0.7 log2(0.7) − 0.3 log2(0.3) = 0.881 < 1

Entropy discretisation: a method that takes the class labels into account when discretising.
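The entropy values above can be checked with a small helper in plain Python:

```python
import math

def entropy(probs):
    # 0 * log(0) is taken as 0 by convention, hence the p > 0 filter
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # fair coin: 1.0
print(round(entropy([0.7, 0.3]), 3))  # biased coin: 0.881
print(round(entropy([3/5, 2/5]), 3))  # a 3/2 class split: 0.971
```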
Data Discretisation — Entropy Discretisation

Entropy discretisation: a method that takes the class labels into account when discretising.
• Ideas
  − Data should be split into intervals that maximise the information, measured by entropy.
  − Partitioning should not be too fine-grained, to avoid over-fitting.
• Algorithm: evaluate each candidate boundary (the midpoint between adjacent sorted values) and keep the one with the highest information gain; recurse on the resulting intervals.

Figure is adapted from http://kevinmeurer.com/a-simple-guide-to-entropy-based-discretization/

Entropy of the data (five samples, three of one class and two of the other; 0·log2(0) is taken as 0 throughout):

H(X) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.442 + 0.529 = 0.971

Split at 4.5:
H(X ≤ 4.5) = −(1/1) log2(1/1) − (0/1) log2(0/1) = 0 + 0 = 0
H(X > 4.5) = −(3/4) log2(3/4) − (1/4) log2(1/4) = 0.311 + 0.5 = 0.811
H(X_new) = (1/5) H(X ≤ 4.5) + (4/5) H(X > 4.5) = (1/5)·0 + (4/5)·0.811 = 0.6488
G(X_new) = 0.971 − 0.6488 = 0.322

Split at 6.5:
H(X ≤ 6.5) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1
H(X > 6.5) = −(2/3) log2(2/3) − (1/3) log2(1/3) = 0.918
H(X_new) = (2/5) H(X ≤ 6.5) + (3/5) H(X > 6.5) = (2/5)·1 + (3/5)·0.918 = 0.951
G(X_new) = 0.971 − 0.951 = 0.02

Split at 10:
H(X ≤ 10) = −(1/3) log2(1/3) − (2/3) log2(2/3) = 0.918
H(X > 10) = −(2/2) log2(2/2) − (0/2) log2(0/2) = 0
H(X_new) = (3/5) H(X ≤ 10) + (2/5) H(X > 10) = (3/5)·0.918 + (2/5)·0 = 0.551
G(X_new) = 0.971 − 0.551 = 0.42

Split at 13.5:
H(X ≤ 13.5) = −(2/4) log2(2/4) − (2/4) log2(2/4) = 1.0
H(X > 13.5) = −(1/1) log2(1/1) − (0/1) log2(0/1) = 0
H(X_new) = (4/5) H(X ≤ 13.5) + (1/5) H(X > 13.5) = (4/5)·1.0 + (1/5)·0 = 0.8
G(X_new) = 0.971 − 0.8 = 0.171

Comparing the candidate boundaries:
• Split at 4.5: G(X_new) = 0.322
• Split at 6.5: G(X_new) = 0.02
• Split at 10: G(X_new) = 0.42
• Split at 13.5: G(X_new) = 0.171
The split at 10 yields the highest information gain, so it is chosen first.

When to stop the algorithm:
• Terminate when a specified number of bins has been reached.
• Terminate when the information gain falls below a certain threshold.

(Monash) FIT5196 33 / 43
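The whole walkthrough can be reproduced in a few lines. The five labelled samples below are hypothetical values chosen to match the boundaries (4.5, 6.5, 10, 13.5) and class counts above, since the original figure is not shown:

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_split(xs, ys):
    """Try every midpoint between adjacent sorted values and return
    the cut with the highest information gain."""
    pairs = sorted(zip(xs, ys))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    base = entropy(ys)
    best_cut, best_gain = None, -1.0
    for i in range(1, len(xs)):
        cut = (xs[i - 1] + xs[i]) / 2
        left, right = ys[:i], ys[i:]
        # weighted entropy of the two resulting intervals
        h_new = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        gain = base - h_new
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain

# hypothetical x values / class labels consistent with the worked example
cut, gain = best_split([4, 5, 8, 12, 15], ["N", "Y", "N", "Y", "Y"])
print(cut, round(gain, 2))  # 10.0 0.42
```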

Data Discretisation

Concept Hierarchy for numerical data

A simple 3-4-5 rule can be used to segment numeric data (attribute values) into
relatively uniform, "natural" intervals.

If an interval covers 3, 6, 7, or 9 distinct values at the most significant digit,
partition the range into 3 equi-width intervals.

If it covers 2, 4, or 8 distinct values at the most significant digit, partition
the range into 4 intervals.

If it covers 1, 5, or 10 distinct values at the most significant digit, partition
the range into 5 intervals.
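The rule can be sketched as follows. The function name and the fallback branch are my own; the full 3-4-5 procedure also rounds interval boundaries to "natural" numbers and recurses within each interval, which this sketch omits.

```python
import math

def partition_345(low, high):
    """Partition [low, high] by the 3-4-5 rule: count the distinct values at
    the most significant digit of the range width and split into 3, 4, or 5
    equi-width intervals accordingly. Assumes low < high."""
    width = high - low
    msd_unit = 10 ** int(math.floor(math.log10(width)))  # magnitude of the range
    distinct = round(width / msd_unit)                   # distinct values at the MSD
    if distinct in (3, 6, 7, 9):
        k = 3
    elif distinct in (2, 4, 8):
        k = 4
    elif distinct in (1, 5, 10):
        k = 5
    else:
        k = 4  # assumption: the rule does not cover this case directly
    step = width / k
    return [(low + i * step, low + (i + 1) * step) for i in range(k)]
```

For example, a range of 0–900 (9 distinct values at the most significant digit) is cut into 3 intervals of width 300, while 0–50 (5 distinct values) is cut into 5 intervals of width 10.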

(Monash) FIT5196 34 / 43

Data Discretisation

Segmentation by natural partitioning

Figure is from “Data mining: know it all”

(Monash) FIT5196 35 / 43

Feature Engineering & Data Sampling

Outline

1 Data Transformation

2 Data Discretisation

3 Feature Engineering & Data Sampling

4 Summary

(Monash) FIT5196 36 / 43

Feature Engineering & Data Sampling

Feature Engineering

Feature extraction (or generation)
▶ Generate new features from raw data or other features
▶ Goals
  − Produce more meaningful/descriptive/discriminant features

Feature selection
▶ Select a subset of available features based on some criteria
▶ Goals
  − Remove irrelevant data
  − Increase predictive accuracy of learned models
  − Improve learning efficiency
  − Reduce the model complexity and increase its interpretability
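As an illustration, feature extraction from a raw timestamp might look like the following sketch; the specific derived features are illustrative choices, not prescribed by the slides.

```python
from datetime import datetime

def extract_datetime_features(timestamp):
    """Generate new, more descriptive features from a raw timestamp string —
    a common feature-extraction step."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),          # 0 = Monday
        "is_weekend": int(dt.weekday() >= 5), # Saturday or Sunday
        "month": dt.month,
    }
```

A raw timestamp is rarely useful to a model directly, but the derived hour, weekday, and weekend indicator are often far more discriminant.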

(Monash) FIT5196 37 / 43

Feature Engineering & Data Sampling

Feature Subset Selection

Feature subset selection reduces the data set size by removing irrelevant or
redundant features.

Goal: find a minimum set of attributes such that the resulting probability
distribution of the data classes is as close as possible to the original
distribution obtained using all attributes.

Methods
▶ Stepwise forward selection
▶ Stepwise backward elimination
▶ Combination of forward selection and backward elimination
▶ Decision tree induction
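Stepwise forward selection can be sketched as a greedy loop. The `score` argument here is an assumption of this sketch: any subset-evaluation function, e.g. the cross-validated accuracy of a model trained on that subset.

```python
def forward_selection(features, score, max_features=None):
    """Greedy stepwise forward selection: start from the empty set and, at
    each step, add the feature that most improves score(subset); stop when
    no candidate improves the score (or the size limit is reached)."""
    selected = []
    best_score = score(selected)
    while max_features is None or len(selected) < max_features:
        candidates = [(score(selected + [f]), f)
                      for f in features if f not in selected]
        if not candidates:
            break
        cand_score, cand_feat = max(candidates)
        if cand_score <= best_score:
            break  # no remaining feature improves the criterion
        selected.append(cand_feat)
        best_score = cand_score
    return selected
```

Stepwise backward elimination is the mirror image: start from the full feature set and repeatedly drop the feature whose removal hurts the score least.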

(Monash) FIT5196 38 / 43

Feature Engineering & Data Sampling

Feature Subset Selection

Figure is from “Data mining: know it all”

(Monash) FIT5196 39 / 43

Feature Engineering & Data Sampling

Data Sampling Methods

Sampling methods are used to choose a representative subset of the data
Why?
▶ Reduce the volume of data
▶ Fix imbalanced class distributions
▶ Create training, validation, and testing sets

(Monash) FIT5196 40 / 43
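The classical sampling schemes used for these purposes can be sketched with Python's random module. This is a minimal sketch; the per-stratum fraction parameter in the stratified version is an assumption (other designs use a fixed size per stratum).

```python
import random

def srswor(data, s):
    """Simple random sample without replacement: each of the N tuples is
    equally likely to be drawn, and can appear at most once (s < N)."""
    return random.sample(data, s)

def srswr(data, s):
    """Simple random sample with replacement: a drawn tuple is placed back
    so that it may be drawn again."""
    return [random.choice(data) for _ in range(s)]

def stratified_sample(data, strata_key, frac):
    """Stratified sample: partition the data into disjoint strata by
    strata_key, then take an SRS of the given fraction within each stratum."""
    strata = {}
    for row in data:
        strata.setdefault(strata_key(row), []).append(row)
    sample = []
    for rows in strata.values():
        k = max(1, round(frac * len(rows)))
        sample.extend(random.sample(rows, k))
    return sample
```

Stratified sampling is the usual fix for imbalanced class distributions: by sampling within each class (stratum) separately, rare classes are guaranteed representation that a plain SRS might miss.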

Feature Engineering & Data Sampling

Data Sampling Methods

Methods: Suppose a large dataset D contains N tuples. Ways to reduce the data:
▶ Simple random sample without replacement (SRSWOR) of size s
  − Draw s of the N tuples from D (s < N), where the probability of drawing any tuple in D is 1/N
▶ Simple random sample with replacement (SRSWR) of size s
  − Similar to SRSWOR, except that after a tuple is drawn, it is placed back in D so that it may be drawn again

Figure is from "Data mining: know it all"

(Monash) FIT5196 41 / 43

Feature Engineering & Data Sampling

Data Sampling Methods

Methods: Suppose a large dataset D contains N tuples. Ways to reduce the data:
▶ Stratified sample
  − If D is divided into mutually disjoint parts called strata, a stratified sample of D is generated by obtaining an SRS within each stratum

Figure is from "Data mining: know it all"

(Monash) FIT5196 42 / 43

Summary

Summary

Data transformation:
▶ Normalisation/Scaling
▶ Transformation by generating new features
▶ Nominal to numeric transformation

Data discretisation

Feature selection and data sampling

(Monash) FIT5196 43 / 43