
Feature Engineering
Modify some measurements to make them more useful for classification.
● e.g. translate date to day-of-week 09/08/2019 → Friday
– useful to make predictions about future sales based on past sales


– not useful to make predictions about chances of death based on date-of-birth
Feature Engineering
Modify some measurements to make them more appropriate for classification.
● e.g. translate categorical features to numbers:
“Monday” → 1, “Tuesday” → 2, …
● or to binary indicator vectors (one-hot encoding; a code sketch follows below):
“Monday” → [1,0,0,0,0,0,0], “Tuesday” → [0,1,0,0,0,0,0], …
Feature engineering can result in dimensionality increase: length(x) ≥ length(m)
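A minimal sketch of one-hot encoding in Python/numpy; the helper name one_hot and the category list are illustrative assumptions, not from the slides:

```python
# A minimal sketch of one-hot encoding, assuming numpy; the helper name
# `one_hot` and the category list are illustrative, not from the slides.
import numpy as np

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def one_hot(value, categories):
    # Vector with a 1 at the position of `value` and 0 everywhere else.
    vec = np.zeros(len(categories), dtype=int)
    vec[categories.index(value)] = 1
    return vec

print(one_hot("Monday", days))   # [1 0 0 0 0 0 0]
print(one_hot("Tuesday", days))  # [0 1 0 0 0 0 0]
```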
● Feature Engineering
● Feature Selection
● Feature Extraction
– Principal Component Analysis (PCA)
– Whitening
– Linear Discriminant Analysis (LDA)
– Independent Component Analysis (ICA)
– Random Projections
– Sparse Coding

Feature Selection
● Reduces risk of over-fitting
– e.g. if the training set contained more male than female salmon, this might be exploited by the classifier
● Reduces training time
– by reducing the complexity of the classifier
Feature Selection
Some measurements may not be selected because they are unobtainable or too costly to obtain:
● e.g. the DNA of every fish
Some measurements may not be selected because they are uninformative:
● e.g. gender for classifying fish species
Feature Selection
Some measurements may be more useful for classification than others.
● e.g. for classifying fish species
– useful (discriminative) features:
● size, colour, DNA
– non-useful features:
● gender, eye shape
Feature selection is the process of choosing the most discriminative subset of the available measurements on which to perform classification.
Feature selection results in dimensionality reduction: length(m) ≥ length(x)

Feature Selection
Feature selection may not be obvious.
● because we do not know if a measurement is useful
– e.g. is “fish scale pointiness” discriminative for classifying fish species?
● because we just have data and don’t know what it means
If domain knowledge cannot help, it may be necessary to search for the best features:
● perform classification with different possible subsets of features to find the subset that produces the least error.
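A rough sketch of such a wrapper search, assuming scikit-learn for the classifier and cross-validation; the nearest-neighbour classifier, cv=5 and the function name best_feature_subset are illustrative choices, not prescribed by the slides:

```python
# A rough sketch of a wrapper search over feature subsets, assuming scikit-learn.
# The nearest-neighbour classifier, cv=5 and the function name are illustrative choices.
from itertools import combinations
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def best_feature_subset(X, y, max_size=3):
    # Try every subset of up to `max_size` measurement columns and keep the one
    # with the highest cross-validated accuracy (i.e. the least error).
    best_score, best_subset = -np.inf, None
    for k in range(1, max_size + 1):
        for subset in combinations(range(X.shape[1]), k):
            score = cross_val_score(KNeighborsClassifier(), X[:, list(subset)], y, cv=5).mean()
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score
```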
Feature Extraction
Find features that are functions of the raw data (or of selected or engineered features).
Feature extraction usually results in dimensionality reduction: length(m) ≥ length(x)
Feature Selection
Some measurements may not be selected because they are redundant:
● e.g. size and weight might be highly correlated
● including both may provide no extra information, but it complicates the classification process unnecessarily
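A tiny numpy sketch of spotting such redundancy via correlation; the size/weight numbers below are made up purely for illustration:

```python
# A tiny sketch of checking for redundancy, assuming numpy; the size/weight
# numbers are made up purely for illustration.
import numpy as np

size   = np.array([10.0, 12.0, 15.0, 20.0, 22.0])
weight = np.array([1.1, 1.3, 1.7, 2.3, 2.5])   # roughly proportional to size

r = np.corrcoef(size, weight)[0, 1]
print(r)  # close to 1, so keeping both measurements adds little information
```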

Feature Extraction
Projects the original data points into a new feature space.
Ideally, the new feature space is one in which it is easier to classify the data.

Feature Extraction: Example
Suppose we want to project 2D data to a 1D feature space. Let w define the projection from the old to the new feature space, e.g. w = [5 −3].
Each point xᵢ in the original feature space gets a coordinate yᵢ = w xᵢ in the new feature space:
● y1 = w x1 = [5 −3] [0; 0] = 0
● y2 = w x2 = [5 −3] [1; 0] = 5
● y3 = w x3 = [5 −3] [2; 1] = 7
● y4 = w x4 = [5 −3] [0; 1] = −3
● y5 = w x5 = [5 −3] [1; 2] = −1

Feature Extraction: Example
● we can apply the projection to the whole dataset in one go: Y = w X
[0 5 7 −3 −1] = [5 −3] [0 1 2 0 1; 0 0 1 1 2]

Feature Extraction: Example
● the scale of the new coordinates depends on ||w||
e.g. w = [10 −6]: Y = w X
[0 10 14 −6 −2] = [10 −6] [0 1 2 0 1; 0 0 1 1 2]
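A minimal numpy sketch of the matrix form Y = wX above; the arrays reproduce the slide's numbers, and doubling w illustrates the scale point:

```python
# A minimal numpy sketch of the matrix form Y = wX above; the arrays reproduce
# the slide's numbers.
import numpy as np

w = np.array([[5, -3]])              # 1x2: the new axis expressed in the old space
X = np.array([[0, 1, 2, 0, 1],       # each column is one 2-D data point
              [0, 0, 1, 1, 2]])

print(w @ X)        # [[ 0  5  7 -3 -1]]
print((2 * w) @ X)  # [[ 0 10 14 -6 -2]] -- same axis, larger ||w||, so scaled coordinates
```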

Feature Extraction: Example
● scale of new coordinates depends on ||w||
● sign of new coordinates depends on sign of w
● a new axis with the same slope but a different origin would produce the same new coordinate values with a fixed shift in value – so would be redundant
Feature Extraction: Example
● w defines the axis of the new space in the original feature space
● each new coordinate is a linear combination of the original coordinates
● same projection applied to all classes
Feature Extraction: Example
● scale of new coordinates depends on ||w||
● sign of new coordinates depends on sign of w
e.g. w = [−10 6]: Y = w X
[0 −10 −14 6 2] = [−10 6] [0 1 2 0 1; 0 0 1 1 2]

Feature Extraction: Example
● a different w will define a different axis for the new space
e.g. w = [3 2]: Y = w X
[0 3 8 2 7] = [3 2] [0 1 2 0 1; 0 0 1 1 2]

Feature Extraction: Example
● we can project onto multiple new axes simultaneously, by stacking the projection vectors as the rows of W:
[0 3 8 2 7; 0 −10 −14 6 2] = [3 2; −10 6] [0 1 2 0 1; 0 0 1 1 2]
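The same projection with two new axes at once, sketched in numpy; W stacks the two projection vectors used above as its rows:

```python
# The same projection with two new axes at once, assuming numpy; W stacks the
# two projection vectors used above as its rows.
import numpy as np

W = np.array([[  3,  2],
              [-10,  6]])
X = np.array([[0, 1, 2, 0, 1],
              [0, 0, 1, 1, 2]])

print(W @ X)
# [[  0   3   8   2   7]
#  [  0 -10 -14   6   2]]
```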

Principal Components Analysis (PCA)
Takes N-dimensional data and finds the M (M≤N) orthogonal directions in which the data has the most variance.
Direction in which variance is greatest: 1st principal component
Orthogonal direction in which variance is next greatest: 2nd principal component
Principal Components Analysis (PCA)
These M principal directions form a subspace.
We can represent an N-dimensional datapoint by its projection onto the M principal directions
● Feature Engineering
● Feature Selection
● Feature Extraction
– Principal Component Analysis (PCA)
– Whitening
– Linear Discriminant Analysis (LDA)
– Independent Component Analysis (ICA)
– Random Projections
– Sparse Coding

Principal Components Analysis (PCA)
e.g. 3-d data projected onto the 1st two principal components
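For instance, a minimal sketch of such a projection using scikit-learn's PCA; the random 3-D toy data is an illustrative assumption, not the data from the slide's figure:

```python
# A minimal sketch of this projection using scikit-learn's PCA; the random 3-D
# toy data is purely illustrative. Note scikit-learn treats rows as samples,
# whereas the slides treat columns as data vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # 100 samples, 3 measurements each

Y = PCA(n_components=2).fit_transform(X)  # 100 x 2: coordinates on the first two PCs
print(Y.shape)
```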
Karhunen-Loève Transform (KLT)
One method of applying PCA:
● Calculate the mean of all data vectors: μ = (1/N) ∑ᵢ xᵢ
● Calculate the covariance matrix of the zero-mean data: C = (1/N) ∑ᵢ (xᵢ − μ)(xᵢ − μ)ᵀ
● Find the eigenvalues E and eigenvectors V of C, so that C = V E Vᵀ
● Order the eigenvalues from large to small, and discard the small eigenvalues and their corresponding eigenvectors
● Form the matrix V̂ of the remaining eigenvectors and apply it to the mean-subtracted data: yᵢ = V̂ᵀ (xᵢ − μ)
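Below is a minimal numpy sketch of these steps, keeping the slide notation (μ, C, V̂); the 2-D toy data and the choice M = 1 are assumptions for illustration, not part of the slides:

```python
# A minimal numpy sketch of the KLT steps above, keeping the slide notation
# (mu, C, V_hat); the 2-D toy data and the choice M = 1 are illustrative assumptions.
import numpy as np

X = np.array([[0.0, 1, 2, 0, 1],   # columns are the data vectors x_i
              [0.0, 0, 1, 1, 2]])
N = X.shape[1]
M = 1                              # number of principal components to keep

mu = X.mean(axis=1, keepdims=True)      # mean of all data vectors
Xc = X - mu                             # zero-mean data
C = (Xc @ Xc.T) / N                     # covariance matrix

E, V = np.linalg.eigh(C)                # eigenvalues (ascending) and eigenvectors of C
order = np.argsort(E)[::-1]             # reorder eigenvalues from large to small
V_hat = V[:, order[:M]]                 # keep eigenvectors of the M largest eigenvalues

Y = V_hat.T @ Xc                        # y_i = V_hat^T (x_i - mu)
print(Y)
```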
Principal Components Analysis (PCA)
Typically M ≪ N, so representing the data by its first M principal components results in a substantial dimensionality reduction.