10/8/2019
Mohammed J. Zaki | Dmcourse / Assign2
Assignment 2: Linear Regression
Due Date: Sat 12th Oct, Before Midnight
In this assignment you will implement the linear regression algorithm via QR factorization, namely Algorithm 23.1 on page 602 in Chapter 23 (see the chapters posted on Campuswire, post #70).
You must implement QR factorization on your own, as described in Section 23.3.1 (you cannot use numpy.linalg.qr or a similar function).
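As a rough illustration of what a hand-written factorization involves, the sketch below uses modified Gram-Schmidt on plain Python lists. It is not a transcription of Algorithm 23.1: it normalizes the columns of Q, which is the common convention but may differ from the formulation in Section 23.3.1. The function name `gram_schmidt_qr` is hypothetical, and the columns of the input are assumed to be linearly independent.

```python
import math

def gram_schmidt_qr(A):
    """Factor A (a list of n rows with d columns) as A = Q R.

    Q has orthonormal columns and R is d x d upper triangular.
    Assumption: the columns of A are linearly independent.
    """
    n, d = len(A), len(A[0])
    # Work column by column: cols[j] is the j-th column of A.
    cols = [[A[i][j] for i in range(n)] for j in range(d)]
    q_cols = []
    R = [[0.0] * d for _ in range(d)]
    for j in range(d):
        v = cols[j][:]
        for k in range(j):
            # Remove the component of v along the k-th orthonormal direction.
            R[k][j] = sum(q * x for q, x in zip(q_cols[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, q_cols[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        if R[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([x / R[j][j] for x in v])
    # Convert Q back to row-major layout.
    Q = [[q_cols[j][i] for j in range(d)] for i in range(n)]
    return Q, R
```

Plain lists keep the sketch dependency-free; the same loop structure carries over to NumPy arrays without calling any factorization routine.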
Next, using the $\mathbf{Q}$ and the $\mathbf{R}$ matrices, you must solve for the augmented weight vector $\mathbf{w}$ using backsubstitution (again, you cannot use any numpy builtin functions related to backsolve; you must write your own function). See Example 23.4 for how backsolve works on the Attach:iris-plwsl.dat dataset.
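Backsubstitution solves an upper-triangular system of the form $\mathbf{R}\mathbf{w} = \mathbf{b}$ from the last row upward. The following is a minimal sketch with a hypothetical `back_substitute` helper; the symbol `b` is notation assumed here, and how it is formed from Q and the response depends on the convention used for the factorization (for example, $\mathbf{b} = \mathbf{Q}^T Y$ when Q has orthonormal columns).

```python
def back_substitute(R, b):
    """Solve R w = b for w, where R is square, upper triangular,
    and has nonzero diagonal entries."""
    d = len(b)
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        # Entries of w to the right of the diagonal are already known.
        known = sum(R[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (b[i] - known) / R[i][i]
    return w
```

For instance, `back_substitute([[2.0, 1.0], [0.0, 4.0]], [5.0, 8.0])` first solves the last row for `w[1]` and then uses it in the first row.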
After you have computed the weight vector $\mathbf{w}$, you should compute the SSE value for the predictions, and also the $R^2$ statistic, defined as:
$$R^2 = \frac{TSS - SSE}{TSS}$$
where TSS is the total scatter of the response variable:
$$TSS = \sum_{i=1}^{n} (y_i - \mu_Y)^2$$
The $R^2$ statistic measures the amount of variation explained by the model, so the higher the value the better (largest value being 1).
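A small sketch of these two quantities, computing SSE as the sum of squared residuals and $R^2$ as (TSS − SSE)/TSS. The helper name `sse_and_r2` is hypothetical. It assumes the response is not constant (so TSS is nonzero) and takes the mean from whichever response values are passed in, which is a choice the assignment text does not spell out for the test set.

```python
def sse_and_r2(y_true, y_pred):
    """Return (SSE, R^2) for observed responses and model predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # SSE: squared error between observations and predictions.
    sse = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    # TSS: total scatter of the response around its mean.
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return sse, (tss - sse) / tss
```

Predicting the mean for every point gives SSE equal to TSS and hence an $R^2$ of 0, while perfect predictions give an $R^2$ of 1.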
(CSCI4390)
Implement algorithm 23.1 as described.
(CSCI6390)
First implement Algorithm 23.1; you will then extend it to do QR factorization with regularization, i.e., you will perform ridge regression via the QR factorization. See Attach:qr-ridge.pdf for instructions on how to extend Algorithm 23.1 to handle $L_2$ or ridge regression.
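One standard way to see why ridge regression fits into a QR-based solver is that the ridge objective equals an ordinary least-squares objective on an augmented dataset. This is a general identity and not necessarily the route taken in qr-ridge.pdf; the symbols $D$ (data matrix), $\tilde{D}$, $\tilde{Y}$, and $\alpha$ (the regularization constant) are notation assumed here.

```latex
\[
\|Y - Dw\|^2 + \alpha \|w\|^2
  = \left\| \begin{pmatrix} Y \\ \mathbf{0} \end{pmatrix}
          - \begin{pmatrix} D \\ \sqrt{\alpha}\, I \end{pmatrix} w \right\|^2
  = \|\tilde{Y} - \tilde{D} w\|^2 ,
\qquad \alpha \ge 0 .
\]
```

Under this view, factoring $\tilde{D}$ in place of $D$ yields the ridge solution. In this form every component of $w$, including any bias term, is penalized.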
See below for the output format.
What to submit
Write a Python script named assign2.py.
CSCI4390: The script will be run as assign2.py TRAIN TEST
CSCI6390: The script will be run as assign2.py TRAIN TEST RIDGE
where TRAIN and TEST are the training and testing file names (assume they are in the local directory), and RIDGE is the value for the regularization constant.
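The command line described above could be handled roughly as follows. This is a sketch only: the helper name `parse_args` is hypothetical, and treating a missing RIDGE argument as 0.0 (no regularization) is an assumption rather than something the assignment states.

```python
import sys

def parse_args(argv):
    """Return (train_path, test_path, ridge) from SCRIPT TRAIN TEST [RIDGE]."""
    prog = argv[0] if argv else "assign2.py"
    if len(argv) not in (3, 4):
        raise SystemExit(f"usage: {prog} TRAIN TEST [RIDGE]")
    train_path, test_path = argv[1], argv[2]
    # Assumption: an omitted RIDGE value means no regularization.
    ridge = float(argv[3]) if len(argv) == 4 else 0.0
    return train_path, test_path, ridge

# Typical use inside the script:
#     train_path, test_path, ridge = parse_args(sys.argv)
```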
Your script MUST print the following:
1. the weight vector $\mathbf{w}$
2. the $L_2$ norm of the weight vector
3. the SSE and $R^2$ values on the training data
4. the SSE and $R^2$ values on the testing data
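The four required items might be assembled as in the sketch below. The helper names `l2_norm` and `format_report`, the labels, and the number formatting are all illustrative assumptions; the assignment does not prescribe an exact layout here.

```python
import math

def l2_norm(w):
    """Euclidean (L2) norm of a weight vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in w))

def format_report(w, train_sse, train_r2, test_sse, test_r2):
    """Build the four required items as text (layout is illustrative only)."""
    return "\n".join([
        "weights: " + " ".join(f"{x:.6f}" for x in w),
        f"L2 norm: {l2_norm(w):.6f}",
        f"train SSE: {train_sse:.6f}  train R2: {train_r2:.6f}",
        f"test SSE: {test_sse:.6f}  test R2: {test_r2:.6f}",
    ])
```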
CSCI6390: In addition to the above, you must try different values of the RIDGE constant and try to determine which variables are the most important.
Show your output on the following dataset: Train and Test
This dataset is from the UCI machine learning repository on Beijing Air Quality. I have already extracted the relevant features in the files above. The dataset has the following fields: "PM10", "SO2", "NO2", "CO", "O3", "TEMP", "PRES", "DEWP", "RAIN", "WSPM", "hour", "PM2.5". So here "PM2.5" is the response variable.
The last field is the dependent/response variable, and the other fields are independent variables.
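Splitting each row into independent variables and a response could look like the sketch below. The file layout is an assumption: comma-separated values with an optional header row of field names. The helper name `load_dataset` is hypothetical, and prepending a constant 1.0 to each row (to pair with the bias entry of the augmented weight vector) is one common convention, not something the assignment fixes.

```python
import csv

def load_dataset(lines):
    """Parse comma-separated rows into (X, y).

    `lines` is any iterable of text lines, such as an open file.
    The last field of each row is the response; the rest are predictors.
    """
    X, y = [], []
    for row in csv.reader(lines):
        if not row:
            continue
        try:
            values = [float(v) for v in row]
        except ValueError:
            if X:
                raise  # non-numeric data after the first data row is an error
            continue   # assumed header row, skipped
        X.append([1.0] + values[:-1])  # leading 1.0 for the bias term
        y.append(values[-1])
    return X, y
```

With a real file this would be called as `with open(path, newline="") as f: X, y = load_dataset(f)`.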
You may also test your code on the iris dataset. The results should match those shown in Example 23.4. You can use the same file for both train and test in this case. Again assume that the last column is the response variable.
Submit both the script assign2.py and the output assign2.txt via submitty:
https://submitty.cs.rpi.edu/f19/csci4390
Page last modified on October 05, 2019, at 11:06 AM
https://www.cs.rpi.edu/~zaki/www-new/pmwiki.php/Dmcourse/Assign2