MAST90083 Computational Statistics & Data Mining Regression Splines
Figure 1: Solution of Question 1
rm(list = ls())  # clear all the variables in the workspace
library(splines)
library(gam)
library(pracma)
################################################################################
# Question 1:
n <- 50
set.seed(5)  # fix the seed of the random number generator so the results are reproducible
e <- rnorm(n, 0, 0.2)
x <- sort(runif(n, 0, 1))
a <- seq(0, 1, length.out = n)
y <- cos(2 * pi * x) - 0.2 * x + e  # noisy observations
b <- cos(2 * pi * a) - 0.2 * a      # true curve on a regular grid
plot(x, y)
lines(a, b)
################################################################################
# Question 2:
myknots <- quantile(x, probs = c(0.2, 0.4, 0.6, 0.8))
# ns() generates a B-spline basis matrix for natural cubic splines;
# intercept = TRUE keeps the constant term in the basis
xns <- ns(x, knots = myknots, intercept = TRUE, Boundary.knots = range(c(0, 1)))
# y.fit <- lm(y ~ -1 + xns)  # lm() is used to fit linear models
y.fit <- xns %*% pinv(xns) %*% y  # least-squares fit via the pseudoinverse
plot(x, y)
lines(a, b, col = "dodgerblue", lty = 1)
lines(x, y.fit, col = "forestgreen", lty = 2)
myknots <- quantile(x, probs = seq(0.05, 0.95, length.out = 8))
# same natural cubic spline basis as above, now with 8 interior knots
xns <- ns(x, knots = myknots, intercept = TRUE, Boundary.knots = range(c(0, 1)))
# y.fit <- lm(y ~ -1 + xns)  # lm() is used to fit linear models
y.fit <- xns %*% pinv(xns) %*% y
plot(x, y)
lines(a, b, col = "dodgerblue", lty = 1)
lines(x, y.fit, col = "forestgreen", lty = 2)
# overfitting starts at around 8 knots
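# The commented-out lm() call and the pseudoinverse projection should produce
# the same fitted values. A minimal sketch of that check follows. It
# re-simulates the Question 1 data so that it runs on its own; every name
# ending in .chk is illustrative and not part of the original solution.
library(splines)
library(pracma)
set.seed(5)
n.chk <- 50
e.chk <- rnorm(n.chk, 0, 0.2)
x.chk <- sort(runif(n.chk, 0, 1))
y.chk <- cos(2 * pi * x.chk) - 0.2 * x.chk + e.chk
knots.chk <- quantile(x.chk, probs = c(0.2, 0.4, 0.6, 0.8))
B.chk <- ns(x.chk, knots = knots.chk, intercept = TRUE, Boundary.knots = c(0, 1))
fit.pinv.chk <- as.numeric(B.chk %*% pinv(B.chk) %*% y.chk)   # projection via the pseudoinverse
fit.lm.chk <- as.numeric(fitted(lm(y.chk ~ -1 + B.chk)))      # least squares without an extra intercept
agree.chk <- all.equal(fit.pinv.chk, fit.lm.chk)              # compares up to numerical tolerance
print(agree.chk)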
################################################################################
Figure 2: Solution of Question 2
# Question 3:
xss <- gam(y ~ s(x, df = 6))
yfit <- predict(xss)
plot(x, y)
lines(a, b, type = "l", col = "dodgerblue3", lty = 1)
lines(x, yfit, type = "l", col = "forestgreen", lty = 2)
################################################################################
# Question 4:
results <- numeric(15)
for (i in 1:15) {
  xss <- gam(y ~ s(x, df = i))
  yfit <- predict(xss)
  # mean squared error between the fit (at x) and the true curve b (on the grid a)
  results[i] <- sum((yfit - b)^2) / length(yfit)
}
plot(2:15, results[2:15], type = "b", col = "dodgerblue2",
     xlab = "DoF", ylab = "MSE", pch = 19, lwd = 3)
df <- which.min(results)
# the minimum is found at index 7, so df = 7 is optimal
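# The loop above measures the fit at the sampled x against b, the true curve on
# the regular grid a. A hedged variant, assuming the intended target is the true
# curve at the sampled points themselves, is sketched below. It re-simulates the
# Question 1 data so that it runs on its own; every name ending in .alt is
# illustrative and not part of the original solution.
library(gam)
set.seed(5)
n.alt <- 50
e.alt <- rnorm(n.alt, 0, 0.2)
x.alt <- sort(runif(n.alt, 0, 1))
f.alt <- cos(2 * pi * x.alt) - 0.2 * x.alt  # true curve at the sampled points
y.alt <- f.alt + e.alt
mse.alt <- numeric(15)
for (i in 1:15) {
  fit.alt <- gam(y.alt ~ s(x.alt, df = i))
  mse.alt[i] <- mean((predict(fit.alt) - f.alt)^2)
}
which.min(mse.alt)  # may differ from the df selected above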
################################################################################
# Question 5:
data <- read.table("D:/R/data.txt")  # change the path according to your file location
x <- as.numeric(data[2:222, 1])
y <- as.numeric(data[2:222, 2])
xps <- smooth.spline(x, y, spar = 0.9, all.knots = FALSE)
yfit <- predict(xps, x)$y
plot(x, y)
lines(x, yfit, type = "l", col = "dodgerblue3", lty = 2)
# this has to be checked manually by varying spar: overfitting starts at
# around 0.5 and underfitting at around 1
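# The manual search over spar can be complemented by letting smooth.spline()
# choose the smoothing parameter itself: when spar, lambda and df are all
# omitted, it selects the value by generalized cross-validation (cv = FALSE is
# the default). The sketch below uses simulated stand-in data, not data.txt;
# the data and every name ending in .sim or .gcv are assumptions made for
# illustration only.
set.seed(1)
x.sim <- sort(runif(200, 0, 1))
y.sim <- sin(2 * pi * x.sim) + rnorm(200, 0, 0.3)
fit.gcv <- smooth.spline(x.sim, y.sim)  # smoothing parameter chosen by GCV
fit.gcv$spar                            # selected value of spar
fit.gcv$df                              # equivalent degrees of freedom of the fit
plot(x.sim, y.sim)
lines(x.sim, predict(fit.gcv, x.sim)$y, col = "dodgerblue3", lty = 2)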
Figure 3: Solution of Question 2
Figure 4: Solution of Question 3
Figure 5: Solution of Question 4
Figure 6: Solution of Question 5