Machine Learning and Data Mining in Business
Week 7 Conceptual Exercises
Question 1
In this exercise, we’ll apply the theory of multivariate optimisation to the linear regression method. We can derive the same results using matrix calculus.
In the least squares method for regression, we minimise the cost function
\[
J(\beta) = \sum_{i=1}^{n} \Bigg( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigg)^{2}.
\]
Our objective in this exercise is to show that the solution is $\beta = (X^\top X)^{-1} X^\top y$, where
\[
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \dots & x_{1p} \\
1 & x_{21} & x_{22} & \dots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \dots & x_{np}
\end{pmatrix},
\]
under the assumption that the columns of X are linearly independent.
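As a quick numerical sanity check (not part of the exercise), the minimal sketch below builds a small synthetic data set, computes (X⊤X)−1X⊤y directly, and confirms that it agrees with NumPy's least-squares solver and that nearby parameter values never give a smaller cost. The sizes, coefficients, and noise level are arbitrary illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Small synthetic problem: n observations, p predictors (arbitrary illustrative sizes).
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])                  # arbitrary coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

def cost(beta):
    # Least squares cost: sum of squared residuals.
    r = y - X @ beta
    return r @ r

# Closed-form solution, solving the normal equations rather than forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# It matches NumPy's least-squares routine ...
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))                     # True

# ... and random small perturbations never decrease the cost.
print(all(cost(beta_hat) <= cost(beta_hat + 1e-3 * rng.normal(size=p + 1))
          for _ in range(100)))                              # True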
(a) Find the partial derivatives of J(β).
(b) Derive the first-order condition.
(c) Let $\hat{y}$ be an $n$-dimensional vector of fitted values with entries
\[
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}.
\]
Show that $\hat{y} = X\beta$.
(d) Each equation in part (b) involves an inner product. Using the previous item as a hint, write down what the inner product is.
(e) Write the first-order condition in matrix notation.
(f) Solve the system to obtain the formula for the OLS estimator.
(g) Derive the Hessian of J(β).
(h) We will now show that the cost function is convex. We do this here for completeness; you are not required to know how to prove this step.
A twice-differentiable function is convex if and only if the Hessian matrix is positive semidefinite. If the Hessian is positive definite, then the function is strictly convex.
We say that a p × p matrix A is positive definite when v⊤Av > 0 for every p-dimensional vector v ≠ 0. If v⊤Av ≥ 0 for every v ≠ 0, then we say that the matrix is positive semidefinite.
Show that ∇2J(β) is positive definite.
(i) Write down the gradient of the cost function in matrix notation.
(j) Suppose that we want to use gradient descent to optimise the cost function. How would we update the parameters at each iteration? (A numerical sketch of this update appears after part (k).)
(k) Show that Newton’s method would converge to the solution in one step for this problem, regardless of the starting value. Why is this the case?
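To complement parts (j) and (k), here is a minimal numerical sketch using the same kind of synthetic data as in the earlier snippet; all sizes and the step size are arbitrary illustrative choices. It runs the gradient-descent update β ← β − α∇J(β), with the gradient written in matrix form as −2X⊤(y − Xβ), and then takes a single Newton step from a random starting point, which lands exactly on the closed-form solution.

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data of the same form as in the earlier sketch (arbitrary choices).
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # closed-form OLS solution

def grad(beta):
    # Gradient of J in matrix form: -2 X'(y - X beta).
    return -2.0 * X.T @ (y - X @ beta)

# Part (j): gradient descent, beta <- beta - alpha * grad(beta).
alpha = 1e-3                                       # small step size; too large a value diverges
beta_gd = np.zeros(p + 1)
for _ in range(5000):
    beta_gd = beta_gd - alpha * grad(beta_gd)
print(np.allclose(beta_gd, beta_hat, atol=1e-4))   # True: the iterates approach the solution

# Part (k): a single Newton step from an arbitrary starting point.
H = 2.0 * X.T @ X                                  # Hessian of J, which does not depend on beta
beta_start = rng.normal(size=p + 1)
beta_newton = beta_start - np.linalg.solve(H, grad(beta_start))
print(np.allclose(beta_newton, beta_hat))          # True: exact solution in one step

Because the cost is quadratic, its Hessian is constant, so the single Newton step reproduces the closed-form solution from any starting value; the gradient-descent iterates only approach it gradually, at a rate governed by the step size.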