Copyright © Dingdian Tutoring
Level 7, 263 Clarence St, Sydney, NSW 2000
Tel: 02 9696 7007
E-mail: admin@dingdian.com.au
WeChat official account: Study Assistant (学习助手)
QBUS 6850 Machine Learning in Business
# This session walks through the first individual Assignment, with a brief review of the related topics
# For a fuller summary of the course material, see the mid-semester revision session
# If you hit errors you cannot fix while coding, make good use of Stack Overflow and the official documentation
# To avoid identical code, please rename your own variables, chart titles, and reword some statements
Requirements:
1. Value: 10%
2. Due date: 03/09/2018 (Mon), 17:00
3. Submit a WORD document with full explanation and interpretation of any results you obtain
4. Python code in an appendix
5. Report numerical results to 3 decimal places
6. 10-page limit excluding the appendix
# import basic packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Question-1 Linear Regression (50 marks; week 2)
You will work on the UCI ML housing dataset
# for q(a)
from sklearn.datasets import load_boston
from mpl_toolkits.mplot3d import Axes3D
# for q(b) & (c)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
a) Suppose you are interested in using the house age AGE (proportion of owner-occupied units
built prior to 1940) as the first feature x1 and the full-value property-tax rate TAX as the
second feature x2 to predict the MEDV (median value of owner-occupied homes in $1000’s)
as the target t. Write code to extract these two features and the target from the dataset. Use
the dataset (two chosen features and one target) to plot the loss function

$L(\beta_1, \beta_2) = \frac{1}{2N} \sum_{n=1}^{N} \left( t_n - \beta_1 x_{n1} - \beta_2 x_{n2} \right)^2$

That is, we are using a linear regression model without the intercept term β0.
Hint: This is a 3D plot and you will need to iterate over a range of β1 and β2 values.
a1. Concept
Loss function
A loss function measures the error between the observed values and the model. A loss function, also called a cost function, is a single, overall measure of the loss incurred in taking any of the available decisions or actions.
To find the model parameters, we can use gradient descent:
1. Start from some random starting point for β1;
2. Keep updating β1 via the rule β1 ← β1 − α · dL(β1)/dβ1 to decrease the loss function value L(β1);
3. Repeat until a minimum is reached (convergence).
4. α (> 0) is called the learning rate: in empirical studies, we can try many α values and select the one that gives the smallest L(β1).
5. Gradient descent may converge only to a local minimum.
These steps are illustrated in the sketch below.
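A minimal sketch of these steps for a one-parameter model with squared loss; the function name, learning rate, and iteration count are illustrative assumptions, not part of the assignment:

# gradient descent for a one-feature model t ~ beta1 * x with squared loss
# x, t: 1-D numpy arrays; alpha: learning rate (illustrative value)
def gd_one_param(x, t, alpha=1e-4, n_iter=100):
    beta1 = np.random.randn()                 # 1. random starting point
    N = len(t)
    for _ in range(n_iter):
        # dL/dbeta1 for L(beta1) = sum((t - beta1 * x)^2) / (2N)
        grad = -np.dot(x, t - beta1 * x) / N
        beta1 = beta1 - alpha * grad          # 2. update to decrease L(beta1)
    return beta1                              # 3. repeat until convergence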
a2. Code
"""
3D plot of the loss function over a grid of (beta_1, beta_2) values
"""
fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, Z, linewidth=0)
plt.title('Loss function')
ax.set_xlabel('beta_1')   # the model has no intercept, so the axes are beta_1 and beta_2
ax.set_ylabel('beta_2')
ax.set_zlabel('Loss function')
plt.show()
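The plotting code above assumes the grid arrays X, Y and the loss values Z already exist. A minimal sketch of how they might be built, using the load_boston interface from the imports above; the grid ranges are illustrative assumptions:

# extract the two features and the target
boston = load_boston()
age_idx = list(boston.feature_names).index('AGE')
tax_idx = list(boston.feature_names).index('TAX')
x1 = boston.data[:, age_idx]   # AGE
x2 = boston.data[:, tax_idx]   # TAX
t = boston.target              # MEDV

# evaluate the loss on a grid of (beta_1, beta_2) values
b1_vals = np.linspace(-1, 1, 100)   # illustrative range
b2_vals = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(b1_vals, b2_vals)
N = len(t)
Z = np.zeros(X.shape)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        resid = t - X[i, j] * x1 - Y[i, j] * x2
        Z[i, j] = np.sum(resid ** 2) / (2 * N)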
b) Use the linear regression model LinearRegression in the scikit-learn package to fit
two linear regression models to predict the target, with and without the intercept term. You
may use 90% of the data as your training data, and the remaining 10% as your testing data.
Compare the performance of the two models and explain the importance of the intercept term.
Hint: The argument fit_intercept of the LinearRegression controls whether an
intercept term is included in the model by fit_intercept = True or
fit_intercept = False.
b2. Code
# Create the linear regression object (fit_intercept=True by default)
lr_obj = LinearRegression()
# Estimate coefficients
lr_obj.fit(x_data, y_data)
print("\nThe estimated model parameters are")
print(lr_obj.intercept_[0])   # intercept, beta_0 (assumes y_data has shape (n, 1))
print(lr_obj.coef_[0, 0])     # beta_1
print(lr_obj.coef_[0, 1])     # beta_2
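The snippet above assumes x_data and y_data already exist. A minimal sketch of the full split-and-compare workflow, assuming x1, x2 and t from part (a); the random_state and the MSE metric are assumptions:

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_feat = np.column_stack((x1, x2))
X_train, X_test, t_train, t_test = train_test_split(
    X_feat, t, test_size=0.1, random_state=0)

# fit with and without the intercept and compare test error
for use_intercept in (True, False):
    model = LinearRegression(fit_intercept=use_intercept)
    model.fit(X_train, t_train)
    mse = mean_squared_error(t_test, model.predict(X_test))
    print("fit_intercept={0}: test MSE = {1:.3f}".format(use_intercept, mse))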
c) Take 90% of the data as training data. Construct the centred training dataset by conducting the
following steps in your Python code:
(i) Take the mean of all the training target values, then deduct this mean from each training
target value MEDV. Take the resulting values as the new training target values t_new;
(ii) In the training data, take the mean of all the first feature values AGE, then deduct this
mean from each of the first feature values. Take the result as the new first feature values x1_new;
(iii) In the training data, do the same for the second feature TAX. The result is x2_new.
Now build linear regressions with and without the intercept to fit to the new training data.
Report and compare the coefficients and the intercept. Compare the performance of the two
models over the testing data. Note that, when you feed your testing data into the model to
calculate performance scores, you must subtract the relevant training means from the testing
features and targets, as in the sketch below.
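A minimal sketch of the centring steps, assuming the X_train / X_test split from part (b) with columns (AGE, TAX):

# (i) centre the training target
t_mean = t_train.mean()
t_new = t_train - t_mean

# (ii), (iii) centre the two training features
x_means = X_train.mean(axis=0)   # training means of AGE and TAX
X_new = X_train - x_means

# apply the SAME training means to the testing data
t_test_c = t_test - t_mean
X_test_c = X_test - x_means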
d) Consider the closed-form solution of the linear regression below (see slide 25, the number may change, of Lecture 2),

$\hat{\beta} = (X^T X)^{-1} X^T \mathbf{t}$

where X is the design (data) matrix whose first column is all 1s, and the first component in β
is the intercept. Suppose that the data are centred (refer to (c)). Now prove that, in the case of
centred data, the intercept β0 in the solution above is zero.
Hint: You may need the following fact about block-diagonal matrices,

$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & B^{-1} \end{pmatrix}$

where both matrices A and B are invertible.
d1. Normal equation
Besides gradient descent, there is a simpler, more direct method: solve for the parameters in one step. The expression is given below; pay attention to the size of each matrix. A column of 1s must be added to X, because we need to reserve a column of 1s for β0:

$\hat{\beta} = (X^T X)^{-1} X^T \mathbf{t}$

Comparing the normal equation with gradient descent:
- Normal equation: no learning rate to choose and no iterations, but inverting $X^T X$ costs roughly $O(d^3)$, so it becomes slow when the number of features d is large.
- Gradient descent: needs a learning rate α and many iterations, but each iteration is cheap, so it scales better to problems with many features.
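A minimal NumPy sketch of the normal equation, assuming X_train and t_train from part (b); np.linalg.solve is used instead of an explicit inverse for numerical stability:

# design matrix with a leading column of 1s for beta_0
X_design = np.column_stack((np.ones(len(t_train)), X_train))
# solve (X^T X) beta = X^T t rather than forming the inverse explicitly
beta_hat = np.linalg.solve(X_design.T.dot(X_design), X_design.T.dot(t_train))
print(beta_hat)   # the first entry is the intercept beta_0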
Question-2 Logistic Regression (50 marks; week 3)
# for q(a)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
Use logistic regression to predict the diagnosis of breast cancer patients on the Breast Cancer
Wisconsin (Diagnostic) Dataset (wdbc.data). See the section About Datasets. This question aims
to test your ability to program the matrix operations behind logistic regression.
a) Write Python code to load the data into your program. For the target feature Diagnosis, change
its literal M (malignant) to 0 and B (benign) to 1. Split the data into training and validation
sets (80%, 20% split). Then define and train a logistic regression model by using scikit-learn’s
LogisticRegression model.
a1. Concept
• logistic function (sigmoid) + regression
• a classification method
• generalized linear model
a2. Code
# logistic regression
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_val)
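The snippet above assumes the data have already been loaded and split. A minimal sketch of those steps, and of the evaluation afterwards, assuming wdbc.data sits in the working directory with the UCI column layout (ID, Diagnosis, then 30 features); the random_state is an assumption:

from sklearn.model_selection import train_test_split

# load the raw file (it has no header row); column 0 = ID, column 1 = diagnosis
df = pd.read_csv('wdbc.data', header=None)
y = df[1].map({'M': 0, 'B': 1}).values   # M (malignant) -> 0, B (benign) -> 1
X = df.iloc[:, 2:].values                # the 30 numeric features

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# after fitting and predicting as above, evaluate on the validation set
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))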
b) Using the logistic regression model function below,

$P(y = 1 \mid \mathbf{x}) = \sigma(\beta^T \mathbf{x}) = \frac{1}{1 + \exp(-\beta^T \mathbf{x})}$

and the estimated parameters from your model, calculate the probability of sample ID 8510426 (the 20th sample) having a benign diagnosis.
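A minimal sketch, assuming X and the fitted model lr from part (a); under the coding above, benign corresponds to class 1:

# the 20th sample (ID 8510426) sits at 0-based index 19
x_20 = X[19]
# apply the model function manually with the estimated parameters
z = lr.intercept_[0] + np.dot(lr.coef_[0], x_20)
p_benign = 1 / (1 + np.exp(-z))
print(round(p_benign, 3))
# cross-check against scikit-learn's own probability estimate
print(lr.predict_proba(x_20.reshape(1, -1))[0, 1])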
c) The objective of logistic regression is defined, on slide 17 (the number may change) of Lecture 3, as

$L(\beta) = -\sum_{n=1}^{N} \left[ y_n \log \sigma(\beta^T \mathbf{x}_n) + (1 - y_n) \log\left(1 - \sigma(\beta^T \mathbf{x}_n)\right) \right]$

where both the parameter β = (β0, β1, …, βd)^T and each sample x_n = (x_{n0}, x_{n1}, …, x_{nd})^T are (d+1)-dimensional vectors, with the intercept feature x_{n0} = 1. For the Wisconsin dataset, d = 30. It is easy to prove that (you don't need to prove this)

$\frac{\partial L}{\partial \beta} = \sum_{n=1}^{N} \left( \sigma(\beta^T \mathbf{x}_n) - y_n \right) \mathbf{x}_n = X^T \left( \sigma(X\beta) - \mathbf{y} \right)$

Write your own Python code that uses this derivative formula to implement the gradient descent
algorithm for logistic regression. You may write a Python function named, say,
myLogisticGD, which accepts a data matrix X, an initial parameter beta_0, a number
of GD iterations T, and any other arguments you see appropriate. Your function should return the
learned parameter β.
Hint: In Python, you can obtain the vector F = σ(Xβ) as follows. First define the
sigmoid function by

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

then

F = sigmoid(np.dot(X, beta))
c1. Concept
Gradient descent for logistic regression uses the same update rule as in Question 1, but with the gradient X^T(σ(Xβ) − y) of the objective above.
c2. Code
"""
Build the gradient descent function
"""
# m denotes the number of training examples here, not the number of features
def Gradient_Descent_Algo(x, y, beta, alpha, m, numIterations):
    xTrans = x.transpose()
    loss_total = np.zeros(numIterations)                   # loss value at each step
    beta_total = np.zeros((numIterations, beta.shape[0]))  # parameters at each step
    for i in range(0, numIterations):
        # predicted probabilities from the logistic model
        model_0 = sigmoid(np.dot(x, beta))
        loss_temp = model_0 - y
        # calculate the loss function (negative log-likelihood, averaged over m)
        loss = -np.sum(y * np.log(model_0) + (1 - y) * np.log(1 - model_0)) / m
        # save the loss function value at each step
        loss_total[i] = loss
        print("Iteration: {0} | Loss function: {1}".format(i, loss))
        # calculate the gradient using the matrix form X^T (F - y) / m
        gradient = np.dot(xTrans, loss_temp) / m
        # update the parameters simultaneously with learning rate alpha
        beta = beta - alpha * gradient
        # save the estimated parameters at each step
        beta_total[i, :] = beta
    return beta
d) Based on task (c) and the training data used in (a), write Python code that uses different initial
values β = (0, 0, …, 0)^T, β = (1, 1, …, 1)^T, and a random initial β to start the gradient descent
algorithm and minimise the objective of logistic regression with respect to the parameter β. Set
the number of iterations to T = 200. Use each resulting β to re-do task (b). Compare the results
and explain the major reasons why you may get different answers with different initial values
for β.
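A minimal sketch of the three runs, assuming the function from (c2) and the training split from (a), with a column of 1s prepended to the feature matrix for the intercept (so there are d + 1 = 31 parameters); the learning rate and seed are assumptions, and with unscaled features a smaller alpha may be needed for convergence:

# design matrix with the intercept feature x_n0 = 1
Xd = np.column_stack((np.ones(len(y_train)), X_train))
m, n_params = Xd.shape

np.random.seed(0)   # illustrative seed
inits = {
    'zeros': np.zeros(n_params),
    'ones': np.ones(n_params),
    'random': np.random.randn(n_params),
}
for name, beta_init in inits.items():
    beta_hat = Gradient_Descent_Algo(Xd, y_train, beta_init, alpha=0.001,
                                     m=m, numIterations=200)
    # re-do task (b): probability that the 20th sample is benign
    x_20 = np.concatenate(([1.0], X[19]))
    z = np.dot(x_20, beta_hat)
    print(name, round(1 / (1 + np.exp(-z)), 3))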