RESEARCH METHODS FOR INFORMATION PROFESSIONALS

Multiple Linear Regression Analysis
Multiple linear regression

Stepwise regression analysis

Multiple linear regression:
To determine the relationship between 2 or more independent variables (X1, X2, etc.) and a dependent variable (Y)
To develop a model to predict the value of the dependent variable using the values of the independent variables
 

The model is of the form
Y = c + b1 * X1 + b2 * X2 + b3 * X3 + …
Each “bi” is the coefficient of the corresponding independent variable Xi; c is the constant (intercept)

Multiple linear regression:
Linear regression model
The model is of the form
Y = c + b1 * X1 + b2 * X2 + b3 * X3 + …

weight = c + b1 * height + b2 * gender
weight = -56.8 + 0.7 height + 10 gender
salary = c + b1 * age + b2 * public_library
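
A minimal sketch of how the fitted weight equation above is used for prediction. The function name `predict_weight` and the input values are assumptions made for illustration; only the coefficients come from the slide, and the slide does not state the units.

```python
def predict_weight(height, gender):
    """Evaluate weight = -56.8 + 0.7 * height + 10 * gender.

    gender is a 0/1 dummy variable; the coefficients are those of the
    example equation on the slide.
    """
    c, b_height, b_gender = -56.8, 0.7, 10.0
    return c + b_height * height + b_gender * gender


# Hypothetical inputs, chosen only to show the call.
print(predict_weight(170, 0))
print(predict_weight(170, 1))
```

The two calls differ only in the dummy variable, so their predictions differ by exactly the gender coefficient.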

Multiple linear regression:
When appropriate?
Only when the dependent variable Y is quantitative (i.e. not categorical)
The independent variables Xi can be categorical
Categorical independent variables have to be converted to dummy variables

Multiple linear regression:
Dummy variables 1
Convert categorical independent variables to dummy variables
For binary-valued variables, use 0 and 1
E.g., gender:
0 for male
1 for female

(in this case, male is the “reference” category)

Multiple linear regression:
Dummy variables 2
For a categorical variable with n values,
create n-1 dummy variables
E.g., Race variable with values
Chinese, Malay, Indian, Others

Values      3 dummy variables
            Race1   Race2   Race3
Chinese     1       0       0
Malay       0       1       0
Indian      0       0       1
Others      0       0       0
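
The coding in the table above can be sketched in a few lines. The helper name `make_dummies` is an assumption for illustration; the choice of "Others" as the reference category follows the table, where it is coded as all zeros.

```python
def make_dummies(value, categories, reference):
    """Return n-1 dummy variables (0/1) for one categorical value.

    The reference category is coded as all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    levels = [c for c in categories if c != reference]
    return [1 if value == level else 0 for level in levels]


races = ["Chinese", "Malay", "Indian", "Others"]
for race in races:
    print(race, make_dummies(race, races, reference="Others"))
```

Each list holds the values of Race1, Race2 and Race3 in that order.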

Multiple linear regression:
Statistical test for full model
An F-test can be performed to determine whether
the model is significantly more accurate in predicting the value of Y than just taking the average value of Y as the prediction
the independent variables together account for a significant amount of the variation in the values of Y
one or more of the coefficients (“bi”) are significantly different from 0

Multiple linear regression:
Statistical test for full model
Is full model significantly more accurate than baseline model?
Full model (alternative hypothesis)
weight = -67.6 + 0.8 height – 6.8 gender

Baseline model (null hypothesis)
weight = 59.6 (average weight)
coefficients for height and gender = 0
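
For reference, the F statistic for this comparison can be written in terms of the sums of squared errors of the two models. The symbols are not in the slides and are introduced here as assumptions: n is the number of observations, k is the number of independent variables in the full model, and SSE denotes a model's sum of squared errors.

```latex
F = \frac{\left(\mathrm{SSE}_{\text{baseline}} - \mathrm{SSE}_{\text{full}}\right) / k}
         {\mathrm{SSE}_{\text{full}} / (n - k - 1)}
```

Under the null hypothesis and the usual regression assumptions, F follows an F distribution with k and n - k - 1 degrees of freedom, so a large F value corresponds to a small p-value.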

Multiple linear regression:
Partial statistical test
A partial F-test or a t-test can be performed for the individual coefficients (“bi”) to determine whether:
the particular independent variable contributes significantly to improving the accuracy of the model
bi is significantly different from 0
the full model (including Xi) is significantly better than a full model minus the Xi variable

Purpose: to decide whether Xi should be dropped or retained in the model.

Multiple linear regression:
Partial statistical test
Is full model significantly more accurate than partial model?
Full model (alternative hypothesis)
weight = -67.6 + 0.8 height – 6.8 gender

Partial model (null hypothesis)
weight = -129.6 + 1.1 height
coefficient for gender = 0

Multiple linear regression:
Partial statistical test
Is full model significantly more accurate than partial model?
Full model (alternative hypothesis)
weight = -67.6 – 6.8 gender + 0.8 height

Partial model (null hypothesis)
weight = 69.6 – 17.3 gender
coefficient for height = 0
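
A minimal sketch of the arithmetic behind a partial F-test, assuming the sums of squared errors of the full and partial models are already known. The function name, parameter names and the numbers in the example are all made up for illustration; they are not results from the weight data.

```python
def partial_f_statistic(sse_partial, sse_full, n_dropped, n_obs, n_full_predictors):
    """F statistic for comparing a full model with a partial model.

    sse_partial, sse_full: sums of squared errors of the two fitted models.
    n_dropped: number of variables in the full model but not in the partial one.
    n_obs: number of observations.
    n_full_predictors: number of independent variables in the full model.
    """
    df_resid = n_obs - n_full_predictors - 1  # residual df of the full model
    if df_resid <= 0 or n_dropped <= 0:
        raise ValueError("degrees of freedom must be positive")
    return ((sse_partial - sse_full) / n_dropped) / (sse_full / df_resid)


# Made-up sums of squared errors, only to show the arithmetic.
f_value = partial_f_statistic(sse_partial=120.0, sse_full=100.0,
                              n_dropped=1, n_obs=23, n_full_predictors=2)
print(f_value)  # (120 - 100) / 1 divided by 100 / 20 = 4.0
```

When a single variable is dropped, this F value equals the square of the t statistic for that variable's coefficient, which is why either test can be used.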

Multiple linear regression:
How does it work?
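Linear regression analysis finds the equation (i.e. fits a straight line or, with several independent variables, a plane) such that the sum of the squared errors (the differences between the observed and predicted values of Y) is as small as possible. This is called the least squares criterion.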
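
The least squares criterion can be sketched in plain Python by solving the normal equations. This is an illustrative implementation under stated assumptions, not the method used by any particular statistics package; the function names and the data are made up, and the data are generated without noise so that the true coefficients are known.

```python
def fit_least_squares(xs, ys):
    """Fit Y = c + b1*X1 + ... by solving the normal equations (A^T A) b = A^T y.

    xs: list of rows, each a list of independent-variable values.
    ys: list of observed Y values.
    Returns [c, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in xs]  # leading 1 for c
    p = len(rows[0])
    # Build the augmented matrix [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * y for r, y in zip(rows, ys))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def sum_squared_errors(coefs, xs, ys):
    """Sum of squared differences between observed and predicted Y."""
    total = 0.0
    for row, y in zip(xs, ys):
        predicted = coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))
        total += (y - predicted) ** 2
    return total


# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise).
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefs = fit_least_squares(xs, ys)
print([round(b, 6) for b in coefs])
print(round(sum_squared_errors(coefs, xs, ys), 6))
```

Because the data contain no noise, the fitted coefficients should match the generating equation up to rounding and the sum of squared errors should be essentially zero. With real data the sum is positive, and the fitted equation is the one that makes it smallest.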

Multiple linear regression:
Modelling curvilinear relationships
To model a curvilinear relationship between Y and Xi:
include in the model the square of the independent variable: Xi^2
E.g.: weight = -54.0 + 0.6 height – 6.8 gender + 0.0005 height^2

Various kinds of transformations can be applied to the independent and dependent variables.
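
A short sketch of how a squared term is added in practice: the square is computed as an extra column, and the model remains linear in its coefficients, so it is fitted in the same way as any other multiple linear regression. The helper name and the (height, gender) rows are made up for illustration.

```python
def add_squared_term(rows, column):
    """Return copies of the rows with the square of one column appended."""
    return [list(row) + [row[column] ** 2] for row in rows]


# Hypothetical rows of (height, gender); the values are made up.
rows = [[160, 1], [170, 0], [180, 0]]
print(add_squared_term(rows, column=0))
# [[160, 1, 25600], [170, 0, 28900], [180, 0, 32400]]
```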
 

Multiple linear regression:
Modelling interactions
To investigate the interaction between 2 independent variables X1 and X2:
include in the model, the term X1 * X2
 

What is “interaction”?
There is interaction between 2 independent variables X1 and X2 if the relation between X1 and Y depends on the value of X2.
I.e. for different values of X2, the coefficient of X1 in the model is different.
 

Multiple linear regression:
Modelling interactions
weight = -56.8 + 0.7 height + 10.6 gender – 0.1 height*gender

If gender = 0 (male):
weight = -56.8 + 0.7 height

If gender = 1 (female):
weight = -56.8 + 0.7 height + 10.6 – 0.1 height
= -46.2 + 0.6 height
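
The effect of the interaction term can be checked numerically: the coefficient of height depends on the value of gender. The function names are assumptions for illustration; the coefficients are those of the interaction model on this slide.

```python
def height_slope(gender, b_height=0.7, b_interaction=-0.1):
    """Coefficient of height for a given gender (0 = male, 1 = female)."""
    return b_height + b_interaction * gender


def predict_weight_with_interaction(height, gender):
    """Evaluate weight = -56.8 + 0.7 height + 10.6 gender - 0.1 height*gender."""
    return -56.8 + 0.7 * height + 10.6 * gender - 0.1 * height * gender


for gender in (0, 1):
    print(gender, round(height_slope(gender), 2))
```

Without the interaction term, the slope for height would be the same for both values of gender and only the constant would shift.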

Forward stepwise regression:
Overall approach
This approach
constructs a regression model incrementally,
adding one independent variable to the model at a time, until no remaining variable is significant.

Forward stepwise regression:
Procedure, part 1
1. Construct a model with just the constant (or intercept): Y = c
2. Set a significance level for an independent variable to enter the model (say, 0.01), and a significance level for retaining an independent variable in the model (say, 0.05).
3. Analyse each of the independent variables not already in the model, and check whether the variable would improve the model significantly.
4. Enter in the model the significant variable with the smallest p-value (i.e. the “most significant” variable), or the significant variable that makes the most sense according to your theory.

Forward stepwise regression:
Procedure, part 2
5. Examine each of the independent variables in the model and calculate the p-value for its coefficient.
6. If one of the variables has a p-value above the significance level for retention (0.05), drop the variable from the model. If more than one variable is not significant, drop the variable with the highest p-value.
7. Loop back to step 3 repeatedly until there is no independent variable outside the model that can contribute significantly to the model, and all variables in the model are significant.
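
The procedure above can be sketched as a loop. This is a simplified outline under stated assumptions, not a complete implementation: the `p_values` callback is hypothetical and stands in for fitting a regression and reading off coefficient p-values, the `max_steps` guard is an added safeguard, and the variable names and p-values in the example are made up.

```python
def forward_stepwise(candidates, p_values, enter_level=0.01, retain_level=0.05,
                     max_steps=100):
    """Sketch of forward stepwise selection.

    candidates: names of the independent variables.
    p_values: hypothetical callback; p_values(model_vars) must return a dict
        mapping each variable in model_vars to the p-value of its coefficient
        when a model with exactly those variables is fitted.
    Returns the list of variables in the final model.
    """
    model = []  # start with the constant only: Y = c
    for _ in range(max_steps):  # guard against endless add/drop cycles
        changed = False
        # Entry: test each outside variable and enter the most significant one.
        trial = {}
        for var in candidates:
            if var not in model:
                trial[var] = p_values(model + [var])[var]
        if trial:
            best = min(trial, key=trial.get)
            if trial[best] < enter_level:
                model.append(best)
                changed = True
        # Removal: drop the least significant variable if it fails retention.
        if model:
            current = p_values(model)
            worst = max(current, key=current.get)
            if current[worst] > retain_level:
                model.remove(worst)
                changed = True
        if not changed:
            break
    return model


def toy_p_values(model_vars):
    """Stand-in for a real regression fit; the p-values are made up."""
    table = {"height": 0.001, "gender": 0.004, "shoe_size": 0.30}
    return {var: table[var] for var in model_vars}


print(forward_stepwise(["height", "gender", "shoe_size"], toy_p_values))
# ['height', 'gender']
```

In a real analysis the p-values change each time a variable enters or leaves the model, which is why the removal check is repeated after every entry. The sketch always enters the variable with the smallest p-value and does not cover the option of choosing a variable on theoretical grounds.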
