Mariia Okuneva, M.Sc.
Data Mining Home Assignment 1
Part 1
In the first part of this home assignment you will use data on house sales in King County. You will write your own function designed to fit a linear model and compare its results with those of the lm() function.
Description of the variables:
• price: price of the house (prediction target)
• bedrooms: number of bedrooms
• sqft_living: square footage of the house
• floors: total floors in house
• grade: overall grade given to the housing unit, based on King County grading system
• condition: how good the condition is (overall)
Your task is to write an R script that contains the following parts. But first, download the script template HA1_yournames.R and the CSV file house_data.csv from OLAT.
1. Import the data from house_data.csv and save it in a data frame called house.
2. Fit a linear model of the form
price = β0 + β1 · bedrooms + β2 · sqft_living + β3 · floors + β4 · grade + β5 · condition + ε
3. Write a function that computes the least squares estimator. You need to write your own function; you are not allowed to use any existing function (e.g. lm()).
Please name your function "OLS". Its input arguments are x, y, and intercept, where x is a matrix of independent variables, y is the dependent variable, and intercept is a boolean indicating whether the model is estimated with an intercept. The default is with intercept (intercept = T).
The outputs of the function are the estimated β parameters, the residual sum of squares, R², and adjusted R².
Additionally, inside the OLS function, please check that x and y are conformable and that there are no missing values (NAs) in the data set.
Make sure that the estimated β coefficients from your function and from the lm() function coincide.
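The steps of Part 1 can be sketched as follows. This is only one possible layout, not the expected solution. It assumes that basic matrix helpers such as solve() and crossprod() do not count as "existing functions", that R² is computed from the uncentred total sum of squares when no intercept is used (the convention of summary.lm()), and that house_data.csv sits in the working directory. The output names beta, RSS, R2 and adj_R2 are illustrative choices. If the CSV file is not found, a simulated stand-in data frame (not the real King County data) is used so that the sketch still runs.

```r
# Hand-written least squares estimator (sketch; base matrix algebra only).
OLS <- function(x, y, intercept = TRUE) {
  x <- as.matrix(x)
  y <- as.numeric(y)

  # Input checks: conformable dimensions and no missing values
  if (nrow(x) != length(y)) stop("x and y are not conformable")
  if (anyNA(x) || anyNA(y)) stop("x or y contains missing values (NA)")

  # Add a column of ones when an intercept is requested
  if (intercept) x <- cbind(Intercept = 1, x)
  n <- nrow(x)
  k <- ncol(x)

  # Solve the normal equations (X'X) b = X'y
  beta <- drop(solve(crossprod(x), crossprod(x, y)))

  res <- y - drop(x %*% beta)
  rss <- sum(res^2)
  # Total sum of squares: centred with an intercept, uncentred without
  tss <- if (intercept) sum((y - mean(y))^2) else sum(y^2)
  df_total <- if (intercept) n - 1 else n

  r2 <- 1 - rss / tss
  adj_r2 <- 1 - (rss / (n - k)) / (tss / df_total)

  list(beta = beta, RSS = rss, R2 = r2, adj_R2 = adj_r2)
}

vars <- c("bedrooms", "sqft_living", "floors", "grade", "condition")

if (file.exists("house_data.csv")) {
  house <- read.csv("house_data.csv")
} else {
  # Simulated stand-in (NOT the real data), only so the sketch is runnable
  set.seed(1)
  n <- 200
  house <- data.frame(
    bedrooms    = sample(1:6, n, replace = TRUE),
    sqft_living = round(runif(n, 500, 4000)),
    floors      = sample(c(1, 1.5, 2, 3), n, replace = TRUE),
    grade       = sample(4:12, n, replace = TRUE),
    condition   = sample(1:5, n, replace = TRUE)
  )
  house$price <- 50000 + 200 * house$sqft_living + 10000 * house$grade +
    rnorm(n, sd = 20000)
}

# Reference fit with lm() and the hand-written estimator on the same data
fit_lm  <- lm(price ~ bedrooms + sqft_living + floors + grade + condition,
              data = house)
fit_ols <- OLS(house[, vars], house$price)

# Side-by-side comparison of the coefficient estimates
cbind(OLS = fit_ols$beta, lm = coef(fit_lm))
all.equal(unname(fit_ols$beta), unname(coef(fit_lm)), tolerance = 1e-6)
```

Solving the normal equations directly keeps the code close to the textbook formula. lm() uses a QR decomposition instead, so the two sets of estimates agree up to numerical precision rather than bit for bit.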

Part 2
Here you will use a maximum likelihood estimator to fit a logistic regression. Your goal is to fit a logistic regression model that predicts the Direction of the market using the Lag1 and Lag2 variables from the Smarket data set (which is part of the ISLR library).
1. Write a function that calculates the negative log-likelihood of the data at hand. This function should depend only on a parameter vector β. Again, you need to write your own function; you are not allowed to use any existing function.
2. Use the optim and nlm functions to find the parameter estimates that maximize the log-likelihood.
3. Fit the logistic regression with the glm() function and compare its estimated β coefficients with those from task 2.
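A minimal sketch of the three tasks is given below, under a few assumptions. Direction is coded as 1 for "Up" and 0 for "Down", which matches how glm() treats the second factor level. The zero starting values and the BFGS method are arbitrary choices. The names X, y, neg_loglik and smarket are illustrative. With the linear predictor η_i = β0 + β1 · Lag1_i + β2 · Lag2_i, the log-likelihood of the logistic model is Σ_i [ y_i · η_i − log(1 + exp(η_i)) ]. Both optim and nlm minimize, so minimizing the negative log-likelihood is the same as maximizing the log-likelihood. If the ISLR package is not installed, simulated stand-in data (not the real Smarket data) are used so that the sketch still runs.

```r
# Use ISLR::Smarket when available; otherwise simulate stand-in data
# (NOT the real Smarket data) so the sketch is runnable.
if (requireNamespace("ISLR", quietly = TRUE)) {
  smarket <- ISLR::Smarket
} else {
  set.seed(1)
  n <- 1000
  smarket <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
  p_up <- 1 / (1 + exp(-(0.1 - 0.2 * smarket$Lag1 - 0.1 * smarket$Lag2)))
  smarket$Direction <- factor(ifelse(runif(n) < p_up, "Up", "Down"),
                              levels = c("Down", "Up"))
}

# Design matrix with an intercept column and the 0/1 response.
# They live in the global environment so the likelihood depends only on beta.
X <- cbind(1, smarket$Lag1, smarket$Lag2)
y <- as.numeric(smarket$Direction == "Up")

# Task 1: negative log-likelihood as a function of beta only
neg_loglik <- function(beta) {
  eta <- drop(X %*% beta)              # linear predictor
  -sum(y * eta - log(1 + exp(eta)))    # minus the Bernoulli log-likelihood
}

# Task 2: numerical minimization of the negative log-likelihood
start     <- c(0, 0, 0)
fit_optim <- optim(par = start, fn = neg_loglik, method = "BFGS")
fit_nlm   <- nlm(f = neg_loglik, p = start)

# Task 3: reference fit with glm() and a side-by-side comparison
fit_glm <- glm(Direction ~ Lag1 + Lag2, data = smarket, family = binomial)
cbind(optim = fit_optim$par, nlm = fit_nlm$estimate, glm = coef(fit_glm))
```

The three columns should agree to several decimal places. Small differences are expected because optim and nlm use numerical gradients, whereas glm() uses iteratively reweighted least squares.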
Remarks: Write comments for everything you do. Code that is not written using the template and/or that returns error messages will not be evaluated. If you are working in a group (no more than 5 students per group), make sure to note down every participant's name and ID.
Submission: Submit your scripts via email to mokuneva[at]stat-econ.uni-kiel.de by the end of May 27th (that is, before 00:00:00 on May 28th).