CMPSC 448: Machine Learning and AI
Homework 5 (Due Sunday May 2, 11:59 PM)
This HW includes both theory and implementation problems. Please note,
• Your code must work with Python 3.5+ (you may install the Anaconda distribution of Python)
• You need to submit a report in PDF including all written deliverables, and all implementation code so that one can regenerate your results.
• For the programming part of the PCA problem, you may import any modules you would like from scikit-learn or other external libraries to minimize the amount of implementation required to get the coding problem done. Submit your code in a Jupyter notebook named PCA.ipynb.
The submission for this homework should be a single Jupyter notebook file containing all of the relevant code, figures, and any text explaining your results, submitted on Canvas, and a single PDF file containing the solutions, submitted via Gradescope (please include all figures/results of Problem 2 in the PDF file for grading purposes).
PCA
Problem 1. As we discussed in the lecture, the PCA algorithm is sensitive to scaling and pre-processing of data. In this problem, we explore a few data pre-processing operations on data matrices.
a) Explain what each of the following data processing operations means, why it is important, and how you apply it to a data matrix X ∈ Rn×d with n samples, each with d features¹ (if it helps, use mathematical equations to show the calculations). A short scikit-learn sketch illustrating all four operations appears at the end of this problem.
(a) Centering or mean removal
(b) Scaling of a feature to a range [a, b]
(c) Standardization (feature level)
(d) Normalization (sample level)
b) Apply each of the above operations separately to the following data matrix with n = 5 samples and d = 2 features, and show the processed data matrix. For scaling, pick a = 0 and b = 1.
1 2
−1 1 X=0 1 2 4
31
¹For an implementation of these operations in scikit-learn, please see this.
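Applied to the matrix X from part b), the four operations can be computed, for example, with scikit-learn as in the minimal sketch below (variable names are illustrative; you still need to report and explain the processed matrices):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

# The data matrix from part b): n = 5 samples, d = 2 features
X = np.array([[1, 2], [-1, 1], [0, 1], [2, 4], [3, 1]], dtype=float)

# (a) Centering / mean removal: subtract the per-feature (column) mean
X_centered = X - X.mean(axis=0)

# (b) Scaling each feature to the range [a, b] = [0, 1]
X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# (c) Standardization (feature level): zero mean and unit variance per feature
X_standardized = StandardScaler().fit_transform(X)

# (d) Normalization (sample level): rescale each row to unit Euclidean norm
X_normalized = Normalizer(norm='l2').fit_transform(X)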

Problem 2. In this assignment, you will explore PCA as a technique for discerning whether low-dimensional structure exists in a set of data and for finding good representations of the data in that subspace. To this end, you will do PCA on the Iris dataset, which can be loaded in scikit-learn using the following commands:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
a) Carry out a principal component analysis on the entire raw dataset (4-dimensional instances) for k = 1, 2, 3, 4 components. How much of the variance in the data can be explained by the first principal component? How does the fraction of variance explained vary as k varies?
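A minimal sketch of one way to inspect the explained variance with scikit-learn's PCA (variable names are illustrative; X is the raw Iris data loaded above):

from sklearn.decomposition import PCA

for k in range(1, 5):
    pca = PCA(n_components=k)
    pca.fit(X)
    # explained_variance_ratio_ holds the fraction of variance per component
    frac = pca.explained_variance_ratio_.sum()
    print("k = {}: fraction of variance explained = {:.4f}".format(k, frac))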
b) Apply the preprocessing operations from Problem 1 to the raw dataset and repeat part (a) on the processed data. Explain any differences you observe compared to part (a) and justify them.
c) Project the raw four-dimensional data down to the two-dimensional subspace spanned by the top two principal components (PCs) from part (b) and produce a scatter plot of the data. Make sure you plot each of the three classes differently (using colors or different markers). Can you see the three Iris flower clusters?
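One possible sketch of the projection and scatter plot (assuming matplotlib is available; X_processed stands for whichever preprocessed matrix you chose in part (b)):

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
Z = pca.fit_transform(X_processed)  # project onto the top two PCs

# Plot each of the three Iris classes with its own marker
for label, marker in zip(range(3), ('o', 's', '^')):
    plt.scatter(Z[y == label, 0], Z[y == label, 1],
                marker=marker, label=iris.target_names[label])
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend()
plt.show()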
d) Either use your k-means++ implementation from the previous homework or scikit-learn's implementation to cluster the data from part (c) into three clusters. Explain your observations.
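A possible sketch of the clustering step (Z is the two-dimensional projection from part (c)):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# k-means with k-means++ initialization, three clusters
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
clusters = kmeans.fit_predict(Z)

plt.scatter(Z[:, 0], Z[:, 1], c=clusters)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker='x', s=100, label='centroids')
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend()
plt.show()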
Matrix Factorization
Problem 3. Recall the following objective for extracting latent features from a partially observed rating matrix via matrix factorization (MF) for making recommendations, discussed in class (a short sketch of evaluating this objective is given after the notation list):
nm minf(U,V)≡ 􏰀 (Ri,j −u⊤i vj)2 +α􏰀∥ui∥2 +β􏰀∥vj∥2, (1)
U,V
where
• n: number of users
• m: number of items
(i,j )∈Ω i=1 j =1
• R ∈ Rn×m: input partially observed rating matrix
• Ω ⊆ [n]×[m]: indices of the observed entries in the rating matrix, where [n] denotes the set {1, 2, …, n}.
• k: number of latent features

• U ∈ Rn×k: the (unknown) matrix of latent feature vectors for the n users (the ith row ui ∈ Rk is the latent feature vector for the ith user)
• V ∈ Rm×k: the (unknown) matrix of latent feature vectors for the m items (the jth row vj ∈ Rk is the latent feature vector for the jth movie)
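As a concrete reference, here is a minimal sketch of evaluating the objective in Equation (1) for a given U and V (the function name and argument conventions are illustrative assumptions):

import numpy as np

def mf_objective(R, U, V, observed, alpha, beta):
    """Evaluate f(U, V) from Equation (1).

    R: n x m rating matrix (only the entries listed in `observed` are used)
    U: n x k user factors, V: m x k item factors (rows are u_i and v_j)
    observed: iterable of (i, j) index pairs, i.e., Omega
    """
    squared_error = sum((R[i, j] - U[i] @ V[j]) ** 2 for (i, j) in observed)
    regularizer = alpha * np.sum(U ** 2) + beta * np.sum(V ** 2)
    return squared_error + regularizer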
Please do the following:
1. In solving Equation (1) with the iterative Alternating Minimization algorithm (fixing V^(t) and taking a gradient step for U^(t), and vice versa), discuss what happens if U^(0) and V^(0) are initialized to zero.
2. Discuss why, when there is no regularization in the basic MF formulated in Equation (1), i.e., α = β = 0, each user must have rated at least k movies and each movie must have been rated by at least k users.²
3. Computing the closed-form solution in part (2) could be a computational burden for a large number of users or movies. A remedy would be to use iterative optimization algorithms such as Stochastic Gradient Descent (SGD), where you sample movies when updating the latent features for users and sample users when updating the latent features of movies. Derive the updating rules for u_i^(t) and v_j^(t) at the tth iteration of SGD. Show the detailed steps.
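For orientation, one SGD pass over the observed entries might look like the sketch below; it uses one common convention for the per-entry gradients, and part 3 still asks you to derive and justify the update rules yourself:

import numpy as np

def sgd_epoch(R, U, V, observed, alpha, beta, lr):
    """One stochastic pass over the observed entries of R.

    For each observed (i, j), u_i and v_j are moved along the negative
    gradient of (R[i, j] - u_i^T v_j)^2 plus the regularization terms.
    """
    for i, j in np.random.permutation(list(observed)):
        err = R[i, j] - U[i] @ V[j]              # prediction error on entry (i, j)
        grad_u = -2 * err * V[j] + 2 * alpha * U[i]
        grad_v = -2 * err * U[i] + 2 * beta * V[j]
        U[i] = U[i] - lr * grad_u                # update u_i
        V[j] = V[j] - lr * grad_v                # update v_j
    return U, V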
Reinforcement Learning
Problem 4. Consider the MDP represented by the graph in Figure 1 with discount factor γ ∈ [0, 1). States are represented by circles. Each pair on an arrow shows the action to be taken and the transition probability from one state to another, respectively (please note that the representation of the MDP as a graph here is slightly different from Example 3.3 in the book). Each of the parameters p and q is in the interval [0, 1]. The reward is 10 for state s3, 1 for state s2, and 0 otherwise.
1. List all the possible policies for the MDP in Figure 1 starting from the initial state s0 and ending at state s3. For each policy π, compute the value of each state, vπ(s0), vπ(s1), vπ(s2), and vπ(s3), for γ = 0 and p = q = 1/2.
2. Show the equation representing the optimal value function for each state, i.e., v∗(s0), v∗(s1), v∗(s2), and v∗(s3). Please note that the values of the states will be functions of the parameters p, q, and γ.
3. Is there a value for p such that for all γ ∈ [0, 1) and q ∈ [0, 1], π∗(s0) = a2?
4. Using p = q = 0.25 and γ = 0.9, compute π∗ and v∗ for all states of the MDP. You can either solve the recursion of part (2) or implement value iteration (if you intend to do the latter, use the approximation error ε = 10^−3 as the stopping criterion).
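If you go the value-iteration route, a generic sketch could look like the following (the transition tensor P and reward vector R must be filled in from Figure 1; the convention assumed here is that the reward attached to a state is collected upon entering it):

import numpy as np

def value_iteration(P, R, gamma=0.9, eps=1e-3):
    """Generic value iteration for a finite MDP.

    P: array of shape (num_actions, num_states, num_states) with
       P[a, s, s2] = probability of moving from s to s2 under action a.
    R: array of shape (num_states,); reward collected on entering a state.
    Returns the optimal state values and a greedy (optimal) policy.
    """
    v = np.zeros(P.shape[1])
    while True:
        # q[a, s] = sum over s2 of P[a, s, s2] * (R[s2] + gamma * v[s2])
        q = P @ (R + gamma * v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < eps:
            return v_new, q.argmax(axis=0)
        v = v_new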
Problem 5. Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as Sarsa? Will they make exactly the same action selections and weight updates?
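For reference, the standard one-step updates (with step size α) are:
Sarsa: Q(s, a) ← Q(s, a) + α [ r + γ Q(s′, a′) − Q(s, a) ], where a′ is the action actually taken next in s′
Q-learning: Q(s, a) ← Q(s, a) + α [ r + γ max_{a′} Q(s′, a′) − Q(s, a) ]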
²Hint: consider the closed-form solution for ui and vj in the notes and argue why these conditions are necessary without regularization.

Figure 1: An MDP with states S = {s0, s1, s2, s3} and actions A = {a0, a1, a2}. The reward is 10 for state s3, 1 for state s2, and 0 otherwise.