MAST90138 Week 5 Lab
Problems:
The iris data contain various measurements (sepal length, sepal width, petal length and petal width) of 50 flowers from each of 3 species of iris flowers. Type help(iris) to learn about the format of these data.
1. Load the iris data in R (they are already in R).
Solution:
data(iris)
2. Do a PC analysis of these data using only the numerical variables, this time using the prcomp command. Using the output of this function, store the eigenvectors of the covariance matrix S in a matrix and the eigenvalues in a vector. Also store the Yik’s in a matrix Y, again using the output of prcomp.
Solution:
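# PCA on the four numerical variables (centred, not rescaled, so based on S)
PCX=prcomp(iris[,1:4],retx=TRUE)
vec=PCX$rotation    # eigenvectors of S, one per column
lambda=PCX$sdev^2   # eigenvalues of S
Y=PCX$x             # matrix of the Yik's (PC scores)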
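As a sanity check, the quantities extracted from prcomp can be compared with a direct eigendecomposition of the sample covariance matrix. The following is only a sketch: the names S, eig, Xc, sign_fix and the max_diff_* variables are introduced here for illustration and are not part of the lab. Eigenvectors are only defined up to sign, so the columns are sign-aligned before being compared.

```r
data(iris)
X = as.matrix(iris[, 1:4])
PCX = prcomp(X, retx = TRUE)
vec = PCX$rotation
lambda = PCX$sdev^2
Y = PCX$x

# Direct eigendecomposition of the sample covariance matrix S
S = cov(X)
eig = eigen(S)

# The eigenvalues of S should match the squared standard deviations
max_diff_lambda = max(abs(eig$values - lambda))

# The eigenvectors should match up to sign: flip columns to agree with prcomp
sign_fix = sign(colSums(eig$vectors * vec))
max_diff_vec = max(abs(eig$vectors %*% diag(sign_fix) - vec))

# The PC scores are the centred data projected onto the eigenvectors
Xc = scale(X, center = TRUE, scale = FALSE)
max_diff_Y = max(abs(Xc %*% vec - Y))

c(max_diff_lambda, max_diff_vec, max_diff_Y)
```

All three differences should be zero up to rounding error.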
3. Draw a screeplot for these data and recall the ψj’s (cumulative proportion of variance explained by the first j components) from last week. How many components does this suggest you should keep?
Solution:
screeplot(PCX)
cumsum(lambda)/sum(lambda)
The first two PCs together explain 98% of the variability of the data, and the screeplot confirms a quick and sharp decrease of the λk’s. This suggests that the first two PCs capture a large fraction of the variability of the original data and that with just these two we may be able to uncover interesting features of the data.
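The proportions behind this statement can be put in a small table and cross-checked against the summary method for prcomp objects. This is a sketch only; prop, psi and tab are names chosen here for illustration.

```r
data(iris)
PCX = prcomp(iris[, 1:4], retx = TRUE)
lambda = PCX$sdev^2

# Proportion of variance explained by each PC, and the cumulative psi_j's
prop = lambda / sum(lambda)
psi = cumsum(prop)
tab = round(rbind(proportion = prop, cumulative = psi), 4)
colnames(tab) = colnames(PCX$rotation)
tab

# summary() reports the same quantities (rounded) in its importance matrix
summary(PCX)$importance
```

The second entry of psi is the 98% quoted above for the first two PCs.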
4. What is the weight of each original variable in the linear combination used to create PC1 and PC2? Which variables are the most correlated with each PC (describe PC by PC and support your answer by some calculations)?
Solution:
vec
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

PC1 puts weights 0.3613866, -0.08452251, 0.85667061, 0.35828920 on, respectively, the sepal length, the sepal width, the petal length and the petal width. PC2 puts weights -0.6565888, -0.73016143, 0.17337266, 0.07548102 on, respectively, the sepal length, the sepal width, the petal length and the petal width. PC3 puts weights 0.5820299, -0.59791083, -0.07623608, -0.54583143 on, respectively, the sepal length, the sepal width, the petal length and the petal width. PC4 puts weights 0.3154872, -0.31972310, -0.47983899, 0.75365743 on, respectively, the sepal length, the sepal width, the petal length and the petal width.
PC1 puts the most weight on the petal length and also some weight on the sepal length and the petal width; all three contribute positively to PC1. PC2 puts the most weight on the sepal length and the sepal width, both of which have a negative effect on PC2. PC3 spreads most of its weight over all variables except the petal length, and PC4 puts most of its weight on the petal width.
5. The correlation graph showing the correlation between each of the original variables and the first two PCs is given below. We also provide a table with the values of the correlations between each original variable and each PC. Use this graph and this table to provide more insight into the results of the analysis.
Table 1: Correlations between original variables and the principal components:
     Sepal length   Sepal width  Petal length   Petal width
PC1     0.8974018    -0.3987485     0.9978739     0.9665475
PC2    -0.3906044    -0.8252287     0.0483806     0.0487816
PC3    0.19656672   -0.38363030   -0.01207737   -0.20026170
PC4    0.05882002   -0.11324764   -0.04196487    0.15264831
[Figure: correlations between the Xj’s and PC1 and PC2. Arrows for Sepal.Length, Sepal.Width, Petal.Length and Petal.Width; horizontal axis PC1, vertical axis PC2.]
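The correlations in Table 1 and a correlation graph of this kind can be reproduced along the following lines. This is a hedged sketch rather than the code used to produce the lab's table and figure: corr_direct, corr_formula, s, radius and theta are names chosen here. It relies on the standard identity that the correlation between Xj and the k-th PC equals the j-th entry of the k-th eigenvector times sqrt(λk), divided by the standard deviation of Xj. The signs of individual PCs may differ between platforms.

```r
data(iris)
X = iris[, 1:4]
PCX = prcomp(X, retx = TRUE)
vec = PCX$rotation
lambda = PCX$sdev^2
Y = PCX$x

# Correlations computed directly: rows are PCs, columns are variables
corr_direct = t(cor(X, Y))

# Same correlations from the identity gamma_jk * sqrt(lambda_k) / s_j
s = sapply(X, sd)
corr_formula = t((vec %*% diag(sqrt(lambda))) / s)

# Distance of each variable from the origin in the (PC1, PC2) plane
radius = sqrt(colSums(corr_direct[c("PC1", "PC2"), ]^2))

# Correlation circle: unit circle plus one arrow per original variable
theta = seq(0, 2 * pi, length.out = 200)
plot(0, 0, type = "n", xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = "PC1", ylab = "PC2")
lines(cos(theta), sin(theta))
arrows(0, 0, corr_direct["PC1", ], corr_direct["PC2", ], length = 0.1)
text(corr_direct["PC1", ], corr_direct["PC2", ],
     labels = colnames(corr_direct), pos = 3)
```

A variable whose radius is close to 1 is almost entirely explained by the first two PCs, which is what "close to the circle" means in the graph.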

Solution:
[Figure: scatterplot of the first two PCs, with PC1 on the horizontal axis and PC2 on the vertical axis; points are marked by species (setosa, versicolor, virginica).]
All four variables are close to the circle of radius 1, which indicates that they are strongly correlated with the first two PCs. We also know that together the first two PCs explain a large fraction of the variability of the data, so the direction of the arrows can be used in conjunction with the scatterplot of the first two PCs to learn the effect of the original variables on the individuals. In particular, it appears that the setosa tend to be very different from the versicolor and the virginica: they tend to have a larger sepal width than these two. The versicolor and the virginica tend to have larger values of petal width and length and of sepal length. The virginica tend to have larger values of petal width and length than the versicolor.
Going back to the original data using pairs(X,col=c(2,3,4)[class]), we can see that indeed this is the case.
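A self-contained version of this check might look as follows. It is a sketch under the assumption that X and class in the call above stand for the four numerical columns of iris and the species labels; species, cols and group_means are names introduced here. The per-species means give a numerical counterpart to what the plots show.

```r
data(iris)
X = iris[, 1:4]
species = iris$Species
PCX = prcomp(X, retx = TRUE)
Y = PCX$x

# One colour per species, indexed by the factor's integer codes
cols = c(2, 3, 4)[species]

# Scatterplot of the first two PCs, coloured by species
plot(Y[, 1], Y[, 2], col = cols, pch = 8, xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(species), col = c(2, 3, 4), pch = 8)

# Pairwise scatterplots of the original variables, same colours
pairs(X, col = cols)

# Mean of each original variable within each species
group_means = aggregate(X, by = list(Species = species), FUN = mean)
group_means
```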