
MAST90083 Computational Statistics & Data Mining Regression Splines

Figure 1: Solution of Question 1

rm(list = ls()) # clear all the variables in the console
library(splines)
library(gam)
library(pracma)
################################################################################
# Question 1:
n <- 50
set.seed(5) # sets the seed for random number generation, making the draws reproducible
e <- rnorm(n, 0, 0.2)
x <- sort(runif(n, 0, 1))
a <- seq(0, 1, length.out = n)
y <- cos(2 * pi * x) - 0.2 * x + e
b <- cos(2 * pi * a) - 0.2 * a
plot(x, y)
lines(a, b)
################################################################################
# Question 2:
myknots <- quantile(x, probs = c(0.2, 0.4, 0.6, 0.8))
# ns generates a B-spline basis matrix for natural cubic splines; the intercept is the first (constant) term
xns <- ns(x, knots = myknots, intercept = TRUE, Boundary.knots = range(c(0, 1)))
# y.fit <- lm(y ~ -1 + xns) # lm is the command used to fit linear models
y.fit <- xns %*% pinv(xns) %*% y
plot(x, y)
lines(a, b, col = "dodgerblue", lty = 1)
lines(x, y.fit, col = "forestgreen", lty = 2)

myknots <- quantile(x, probs = seq(0.05, 0.95, length.out = 8))
# ns generates a B-spline basis matrix for natural cubic splines; the intercept is the first (constant) term
xns <- ns(x, knots = myknots, intercept = TRUE, Boundary.knots = range(c(0, 1)))
# y.fit <- lm(y ~ -1 + xns) # lm is the command used to fit linear models
y.fit <- xns %*% pinv(xns) %*% y
plot(x, y)
lines(a, b, col = "dodgerblue", lty = 1)
lines(x, y.fit, col = "forestgreen", lty = 2)
# overfitting starts at around 8 knots
################################################################################

Figure 2: Solution of Question 2

# Question 3:
xss <- gam(y ~ s(x, df = 6))
yfit <- predict(xss)
plot(x, y)
lines(a, b, type = "l", col = "dodgerblue3", lty = 1)
lines(x, yfit, type = "l", col = "forestgreen", lty = 2)
################################################################################
# Question 4:
results <- numeric(15)
for (i in 1:15) {
  xss <- gam(y ~ s(x, df = i))
  yfit <- predict(xss)
  results[i] <- sum((yfit - b)^2) / length(yfit)
}
plot(2:15, results[2:15], type = "b", col = "dodgerblue2", xlab = "DoF", ylab = "MSE", pch = 19, lwd = 3)
df <- which.min(results) # the minimum is found at index 7, so df = 7 is optimal
################################################################################
# Question 5:
data <- read.table("D:/R/data.txt") # change the path according to your file location
x <- as.numeric(data[2:222, 1])
y <- as.numeric(data[2:222, 2])
xps <- smooth.spline(x, y, spar = 0.9, all.knots = FALSE)
yfit <- predict(xps, x)$y
plot(x, y)
lines(x, yfit, type = "l", col = "dodgerblue3", lty = 2)
# spar has to be checked manually: overfitting starts at around 0.5 and underfitting at 1

Figure 3: Solution of Question 2

Figure 4: Solution of Question 3

Figure 5: Solution of Question 4

Figure 6: Solution of Question 5
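
The Question 2 listing fits the spline by projecting y onto the basis with pinv(xns) and leaves the equivalent lm call commented out. The following is a minimal sketch, not part of the original solution, suggesting how the two routes could be checked against each other. It assumes the same simulated data as Question 1 and swaps in base R's qr.fitted for pracma's pinv so that only the splines package is needed; the names fit_lm, fit_qr and max_diff are illustrative.

```r
library(splines)

# Same simulated data as in Question 1
set.seed(5)
n <- 50
e <- rnorm(n, 0, 0.2)
x <- sort(runif(n, 0, 1))
y <- cos(2 * pi * x) - 0.2 * x + e

# Natural cubic spline basis with 4 interior knots, as in Question 2
myknots <- quantile(x, probs = c(0.2, 0.4, 0.6, 0.8))
xns <- ns(x, knots = myknots, intercept = TRUE, Boundary.knots = c(0, 1))

# Route 1: lm without its own intercept (the basis already contains one)
fit_lm <- as.numeric(fitted(lm(y ~ -1 + xns)))

# Route 2: least-squares projection of y onto the column space of xns.
# Assumption: qr.fitted stands in for xns %*% pinv(xns) %*% y here;
# the two agree when the basis has full column rank.
fit_qr <- as.numeric(qr.fitted(qr(xns), y))

# Largest absolute discrepancy between the two sets of fitted values
max_diff <- max(abs(fit_lm - fit_qr))
print(max_diff)
```

If the basis has full column rank, max_diff should sit at the level of floating-point rounding, which is why the listing can use either formulation interchangeably.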