
Statistical Machine Learning
Kamiar Rahnama Rad

Figures are from ISLR and ESL.


• statistical learning:
– supervised: the presence of the outcome variable guides the learning process, such as regression and classification.
– unsupervised: we observe only the features and have no measurements of the outcome, such as clustering and dimensionality reduction.

Example (ESL): prostate specific antigen (PSA) and a number of clinical measures, in 97 men who were about to receive a radical prostatectomy; the goal is to predict the log of PSA (lpsa) from a number of measurements.

FIGURE 1.2 (ESL). Examples of handwritten digits from U.S. postal envelopes.

• supervised learning/classification: identify the numbers in a handwritten zip code, from a digitized image (a minimal sketch follows below).
• each image is a segment from a five digit zip code.
• images are 16 × 16 eight-bit greyscale maps, with each pixel's intensity ranging from 0 to 255.
• predict, from the 16 × 16 matrix of pixel intensities, the identity of each image (0, . . . , 9).
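A minimal sketch of this task, assuming scikit-learn is available. Note that sklearn's bundled digits dataset is 8 × 8 rather than the 16 × 16 ZIP code images in the figure, so it stands in for the same kind of problem:

```python
# Digit classification sketch: KNN on scikit-learn's 8x8 digits dataset,
# a stand-in for the 16x16 ZIP code data described above.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)      # each row: flattened 8x8 pixel intensities
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))  # fraction identified correctly
```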

FIGURE 1.3 (ESL). DNA microarray data: expression matrix of 6830 genes (rows) and 64 samples (columns), for the human tumor data. Only a random sample of 100 rows is shown. The display is a heat map, ranging from bright green (negative, under-expressed) to bright red (positive, over-expressed). Missing values are gray. The rows and columns are displayed in a randomly chosen order. Row labels are gene identifiers; column labels are tumor types (BREAST, RENAL, MELANOMA, COLON, LEUKEMIA, NSCLC, CNS, OVARIAN, PROSTATE, UNKNOWN, and repro cell lines).

• 6830 genes (rows).
• 64 samples (columns), corresponding to cancer tumors from different patients.
• which samples are most similar to each other, in terms of their expression profiles across genes? think of samples as points in 6830-dimensional space, which we want to cluster in some way.
• which genes are most similar to each other, in terms of their expression profiles across samples? (a clustering sketch follows below)
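To make the two clustering questions concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available; the expression matrix is random stand-in data, not the real tumor data:

```python
# Clustering sketch: rows = genes, columns = samples, as in the figure above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
expr = rng.normal(size=(6830, 64))       # placeholder expression matrix

# cluster samples: each sample is a point in 6830-dimensional space
sample_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(expr.T)

# cluster genes: each gene is a point in 64-dimensional space
gene_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(expr)
print(sample_labels[:10], gene_labels[:10])
```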

TABLE 1.1 (ESL). Average percentage of words or characters in an email message equal to the indicated word or character. We have chosen the words and characters showing the largest difference between spam and email.

        george   you   your    hp   free   hpl     !    our    re    edu  remove
spam      0.00  2.26   1.38  0.02   0.52  0.01  0.51   0.51  0.13   0.01    0.28
email     1.27  1.27   0.44  0.90   0.07  0.43  0.11   0.18  0.42   0.29    0.01
We have a training set of data, in which we observe the outcome and feature measurements for a set of objects (such as people). Using these data we build a prediction model, or learner, which will enable us to predict the outcome for new unseen objects. A good learner is one that accurately predicts such an outcome.
The examples above describe what is called the supervised learning prob- lem. It is called “supervised” because of the presence of the outcome vari- able to guide the learning process. In the unsupervised learning problem, we observe only the features and have no measurements of the outcome.

• supervised learning/classification: predict whether an email is spam based on the words and punctuation marks in the email message.
• not all errors are equal: we want to avoid filtering out good email, while letting spam get through is undesirable but less serious in its consequences (a cost-sensitive sketch follows below).
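A minimal sketch of the "not all errors are equal" point, assuming scikit-learn; the two word-frequency features and the 10:1 cost ratio are invented for illustration. The class_weight argument makes the fit pay ten times more for misclassifying good email than for letting spam through:

```python
# Cost-sensitive spam filtering sketch on synthetic word-frequency features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, size=n)                    # 1 = spam, 0 = good email
# toy features: frequency of "george" (higher in good email) and "remove" (higher in spam)
X = np.column_stack([
    rng.exponential(0.1 + 1.2 * (y == 0)),        # "george" frequency
    rng.exponential(0.1 + 1.2 * (y == 1)),        # "remove" frequency
])

# weight class 0 heavily: filtering out good email is the costlier mistake
clf = LogisticRegression(class_weight={0: 10.0, 1: 1.0}).fit(X, y)
print("P(spam):", clf.predict_proba(X[:5])[:, 1])
```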

FIGURE 2.1 (ISLR). The Advertising data set. The plot displays sales, in thousands of units, as a function of TV, radio, and newspaper budgets, in thousands of dollars, for 200 different markets. In each plot we show the simple least squares fit of sales to that variable, as described in Chapter 3. In other words, each blue line represents a simple model that can be used to predict sales using TV, radio, and newspaper, respectively.
More generally, suppose that we observe a quantitative response Y and p different predictors, X1, X2, . . . , Xp. We assume that there is some relationship between Y and X = (X1, X2, . . . , Xp), which can be written in the very general form Y = f(X) + ε.

FIGURE 2.2 (ISLR). The Income data set. Left: The red dots are the observed values of income (in tens of thousands of dollars) and years of education for 30 individuals. Right: The blue curve represents the true underlying relationship between income and years of education, which is generally unknown (but is known in this case because the data were simulated). The black lines represent the error associated with each observation. Note that some errors are positive (if an observation lies above the blue curve) and some are negative (if an observation lies below the curve). Overall, these errors have approximately mean zero.
In essence, statistical learning refers to a set of approaches for estimating f.

FIGURE 2.3 (ISLR). The plot displays income as a function of years of education and seniority in the Income data set. The blue surface represents the true underlying relationship between income and years of education and seniority, which is known since the data are simulated. The red dots indicate the observed values of these quantities for 30 individuals.

FIGURE 2.4 (ISLR). A linear model fit by least squares to the Income data from Figure 2.3. The observations are shown in red, and the yellow plane indicates the least squares fit to the data.

Assuming a parametric form for f simplifies the problem of estimating f, because it is generally much easier to estimate a set of parameters than it is to fit an entirely arbitrary function f.

• there is some relationship between the p features (predictors, independent variables, input variables) X = (X1, . . . , Xp) and the response (dependent variable, output) Y:

Y = f(X) + ε.

• ε is a random error term, independent of X, with mean zero.
• f(·) represents the systematic information that X provides about Y.
• f(·) is unknown, but we want to learn/estimate it from data.

• Why estimate f?
– prediction: $\hat{Y} = \hat{f}(X)$.
– accuracy of $\hat{Y}$ as a prediction for Y depends on:
∗ reducible error: $\hat{f}$ is not a perfect estimate of f, due to a limited amount of data or an inaccurate statistical model, e.g. linear regression when the relationship is nonlinear.
∗ irreducible error: even if we had a perfect estimate, that is $\hat{Y} = f(X)$, our prediction would still have some error, because Y is also a function of ε, which cannot be predicted from X.

• $E(Y - \hat{Y})^2$: average squared difference between the predicted and the actual value of Y:

$E(Y - \hat{Y})^2 = [f(X) - \hat{f}(X)]^2 + \mathrm{var}(\varepsilon)$

• reducible error: $[f(X) - \hat{f}(X)]^2$
• irreducible error: $\mathrm{var}(\varepsilon)$
• take home message: "We must not expect more precision than the subject-matter admits." (Aristotle, Nicomachean Ethics.) A numerical check follows below.
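A quick Monte Carlo check of the decomposition above, with an invented f, a fixed imperfect estimate $\hat{f}$, and Gaussian noise:

```python
# Verify E(Y - Yhat)^2 = [f(x0) - fhat(x0)]^2 + var(eps) at a fixed x0.
import numpy as np

f    = lambda x: np.sin(x)        # true regression function (assumed)
fhat = lambda x: 0.9 * np.sin(x)  # an imperfect, fixed estimate
x0, sigma = 1.0, 0.5

rng = np.random.default_rng(0)
y0 = f(x0) + rng.normal(0.0, sigma, size=1_000_000)   # draws of Y at X = x0

lhs = np.mean((y0 - fhat(x0)) ** 2)                   # E(Y - Yhat)^2
reducible   = (f(x0) - fhat(x0)) ** 2
irreducible = sigma ** 2
print(lhs, reducible + irreducible)                   # the two should agree closely
```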

FIGURE 2.5 (ISLR). A smooth thin-plate spline fit to the Income data from Figure 2.3 is shown in yellow; the observations are displayed in red. Splines are discussed in Chapter 7.

The fit suggests a positive relationship between years of education and income, and a slightly less positive relationship between seniority and income. It may be that with such a small number of observations, this is the best we can do.

FIGURE 2.6 (ISLR). A rough thin-plate spline fit to the Income data from Figure 2.3. This fit makes zero errors on the training data.

In this case, the non-parametric fit has produced a remarkably accurate estimate of the true f shown in Figure 2.3. In order to fit a thin-plate spline, the data analyst must select a level of smoothness.

• why estimate f?
– inference:
∗ which predictors are associated with the response?
∗ what is the relationship between the response and each predictor?
∗ can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated? (see the sketch below)
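A minimal inference sketch on synthetic data, assuming statsmodels is available; the coefficient table (estimates, standard errors, p-values) answers "which predictors are associated with the response?":

```python
# Inference sketch: fit a linear model and inspect the coefficient table.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                      # three candidate predictors
y = 2.0 * X[:, 0] + 0.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.summary())                             # x1 and x3 significant; x2 not
```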

FIGURE 2.7 (ISLR). A representation of the tradeoff between flexibility and interpretability, using different statistical learning methods. In general, as the flexibility of a method increases, its interpretability decreases. (Axes: flexibility on the horizontal axis, interpretability on the vertical. Methods shown, from most interpretable to most flexible: Subset Selection and Lasso; Least Squares; Generalized Additive Models and Trees; Bagging and Boosting; Support Vector Machines.)

Other methods, such as the thin plate splines shown in Figures 2.5 and 2.6, are considerably more flexible because they can generate a much wider range of possible shapes to estimate f.

FIGURE 2.8 (ISLR). A clustering data set involving three groups. Each group is shown using a different colored symbol. Left: The three groups are well-separated. In this setting, a clustering approach should successfully identify the three groups. Right: There is some overlap among the groups. Now the clustering task is more challenging.

We can seek to understand the relationships between the variables or between the observations. One statistical learning tool that we may use for this purpose is cluster analysis, or clustering.

• training data is {(x1, y1), (x2, y2), . . . , (xn, yn)}, where xi = (xi1, xi2, . . . , xip)^T.
• or equivalently, in vector and matrix format, the training data is X, y, where
– X is an n × p matrix with xi = (xi1, xi2, . . . , xip)^T as its ith row,
– y = (y1, y2, . . . , yn)^T.
• example of supervised learning: given data X, y, find f(·) such that f(xi) is close to yi.

• matrix algebra and notation…
• y = Xβ + ε
• residual sum of squares = $\sum_{i=1}^{n} (y_i - x_i^T \hat{\beta})^2 = \|y - X\hat{\beta}\|_2^2$
• what is n? what is p? (a least squares sketch follows below)
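A minimal sketch of minimizing the residual sum of squares in matrix form, assuming NumPy; the data are synthetic:

```python
# Least squares in matrix form: minimize ||y - X beta||^2 over beta.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                                   # n observations, p predictors
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # stable least squares solve
rss = np.sum((y - X @ beta_hat) ** 2)             # residual sum of squares
print(beta_hat, rss)
```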

• how to assess model accuracy? train vs. test.
• training MSE: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2$
• accuracy of predictions when $\hat{f}$ is applied to previously unseen data.
• test MSE: $\mathrm{Ave}\,(y_0 - \hat{f}(x_0))^2$, where (x0, y0) is a previously unseen test observation not used to train the statistical learning method. (see the sketch below)
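A minimal train-vs-test sketch, assuming scikit-learn; the sinusoidal data and the linear model are invented for illustration:

```python
# Fit on training data only, then compare MSE on training vs. held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

model = LinearRegression().fit(x_tr[:, None], y_tr)
mse = lambda x_, y_: np.mean((y_ - model.predict(x_[:, None])) ** 2)
print("train MSE:", mse(x_tr, y_tr), "test MSE:", mse(x_te, y_te))
```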

• typically, in practice, we use all our data to train/learn/estimate the statistical model… cross-validation is then used to estimate the test MSE (ISLR chapter 5).
• train MSE ≠ test MSE
• train MSE vs. degrees of freedom?
• test MSE vs. degrees of freedom?
• degrees of freedom ≃ a quantity that summarizes the flexibility ≃ a measure of model complexity ≃ the number of parameters in the model
• overfitting: a less flexible model would have yielded a smaller test MSE (see the sweep below)
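A sketch of the overfitting story, assuming NumPy: polynomial degree stands in for degrees of freedom; training MSE falls as the degree grows, while test MSE is U-shaped. Data and degrees are invented for illustration:

```python
# Train vs. test MSE as flexibility (polynomial degree) increases.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=30)                       # small training set
y = np.sin(x) + rng.normal(scale=0.3, size=30)
x_test = rng.uniform(0, 6, size=1000)
y_test = np.sin(x_test) + rng.normal(scale=0.3, size=1000)

for degree in (1, 3, 10):
    coefs = np.polyfit(x, y, deg=degree)             # fit on training data only
    train_mse = np.mean((y - np.polyval(coefs, x)) ** 2)
    test_mse  = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree {degree:2d}: train {train_mse:.3f}  test {test_mse:.3f}")
```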

FIGURE 2.9 (ISLR). Left: Data simulated from f, shown in black. Three estimates of f are shown: the linear regression line (orange curve), and two smoothing spline fits (blue and green curves). Right: Training MSE (grey curve), test MSE (red curve), and minimum possible test MSE over all methods (dashed line). Squares represent the training and test MSEs for the three fits shown in the left-hand panel. (Right panel axes: flexibility vs. mean squared error.)

• U-shape of test MSE: the bias-variance trade-off,

$E(y_0 - \hat{f}(x_0))^2 = \mathrm{var}(\hat{f}(x_0)) + [\mathrm{bias}(\hat{f}(x_0))]^2 + \mathrm{var}(\varepsilon)$

• expected test error: repeatedly estimate f using a large number of training sets, and test each estimate at x0.
• ideal: low bias and low variance.
• variance: the amount by which $\hat{f}$ would change if we estimated it using a different training data set.
• high variance: small changes in the training data result in large changes in $\hat{f}$.
• bias: error that is introduced by approximating an extremely complicated real-life problem by a much simpler model.

• more flexible methods → small bias + high variance.
• low bias + high variance = a curve that passes through every single training observation.
• low variance + high bias = fitting a horizontal line to the data. (a Monte Carlo illustration follows below)
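A Monte Carlo sketch of these two extremes, assuming NumPy: estimate the bias and variance of $\hat{f}(x_0)$ over many training sets, for a rigid (degree 1) and a flexible (degree 9) polynomial fit. The function, noise level, and degrees are invented for illustration:

```python
# Estimate bias^2 and variance of fhat(x0) over many independent training sets.
import numpy as np

rng = np.random.default_rng(0)
f = np.sin
x0, sigma, n = 2.0, 0.3, 30

for degree in (1, 9):
    preds = []
    for _ in range(2000):                           # 2000 independent training sets
        x = rng.uniform(0, 6, size=n)
        y = f(x) + rng.normal(scale=sigma, size=n)
        preds.append(np.polyval(np.polyfit(x, y, deg=degree), x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2             # squared bias at x0
    var = preds.var()                               # variance of fhat(x0)
    print(f"degree {degree}: bias^2 {bias2:.4f}  var {var:.4f}")
```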

• classification: how to assess model accuracy?
• basic concepts: error rate, indicator variable, training error, test error
• conditional probability: Pr(Y = j | X = x0)
• Bayes classifier = the unattainable gold standard; its error rate plays the role of the irreducible error
• Bayes decision boundary
• overall Bayes error rate: $1 - E\big[\max_j \Pr(Y = j \mid X)\big]$ (a numerical sketch follows below)
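A numerical sketch of the Bayes error rate, assuming NumPy and SciPy are available, for an invented two-class Gaussian model where Pr(Y = j | X) is known exactly:

```python
# Monte Carlo estimate of 1 - E[max_j Pr(Y = j | X)] when the model is known.
import numpy as np
from scipy.stats import norm

# class 0 ~ N(-1, 1), class 1 ~ N(+1, 1), equal priors (illustrative)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1_000_000)
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)

# exact posterior Pr(Y = 1 | X = x) by Bayes' rule
p1 = norm.pdf(x, 1, 1) / (norm.pdf(x, 1, 1) + norm.pdf(x, -1, 1))
bayes_error = 1.0 - np.mean(np.maximum(p1, 1.0 - p1))
print(bayes_error)            # about 0.159 = Phi(-1) for this model
```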

• K-nearest neighbors (KNN):

$\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j)$,

where $\mathcal{N}_0$ is the set of the K training observations closest to x0.
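A from-scratch sketch of this estimate, assuming NumPy; the training data are synthetic:

```python
# KNN conditional probability estimate: Pr(Y = j | X = x0) is the fraction of
# the K nearest training points whose label equals j.
import numpy as np

def knn_probs(X_train, y_train, x0, K, n_classes):
    dists = np.linalg.norm(X_train - x0, axis=1)      # distances to x0
    nearest = np.argsort(dists)[:K]                   # indices of the K nearest: N_0
    return np.bincount(y_train[nearest], minlength=n_classes) / K

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
print(knn_probs(X_train, y_train, np.array([0.5, 0.5]), K=5, n_classes=2))
```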

FIGURE 2.1 (ESL). Linear Regression of 0/1 Response: a two-class classification example in two dimensions. The class labels are coded as 0/1 and fit by linear regression; the line is the decision boundary where the fitted value equals 0.5.
