
Task 1
In this task, you will use the dataset linRegData.txt, containing 150 points in the format (x_i, y_i). The input is generated by a sinusoid function, while the output is the joint trajectory of a compliant robotic arm. The first 20 data points are the training set and the remainder are the testing set.
a) Polynomial Features
Write the equation of the model and fit it with polynomial features. Using the root mean square error (RMSE) as a metric, select the complexity of the model (up to a 21st-degree polynomial) by evaluating its performance on the testing data. What is the best RMSE you achieve, and what is the corresponding model complexity? Does the answer change if we evaluate the model on the training data? Comment on your findings and plot the RMSE for each case (use two lines, one for evaluation on the training data, one for evaluation on the testing data). For the estimation of the optimal parameters, use a ridge coefficient of λ = 10^-6. Using what you think is the best learned model from the previous point, show in a single plot the ground truth (full dataset) and the model prediction over it. Attach snippets of your code showing how you generate polynomial features and how you fit the model.
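A minimal sketch of one way to do this in Python with NumPy (the column layout of linRegData.txt and the helper names poly_features/fit_ridge are assumptions):

    import numpy as np

    def poly_features(x, degree):
        # Design matrix with columns [1, x, x^2, ..., x^degree]
        return np.vander(x, degree + 1, increasing=True)

    def fit_ridge(Phi, y, lam=1e-6):
        # Ridge solution: w = (Phi^T Phi + lam * I)^-1 Phi^T y
        return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

    data = np.loadtxt("linRegData.txt")           # assuming columns [x, y]
    x_train, y_train = data[:20, 0], data[:20, 1]
    x_test, y_test = data[20:, 0], data[20:, 1]

    degree = 12                                   # example value; sweep 1..21
    w = fit_ridge(poly_features(x_train, degree), y_train)
    y_pred = poly_features(x_test, degree) @ w
    rmse = np.sqrt(np.mean((y_pred - y_test) ** 2))

Running the same computation for every degree from 1 to 21, recording the RMSE on both splits, produces the two requested curves.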
b) Gaussian Features 

Now use Gaussian features. Each feature is a Gaussian, where the means are distributed linearly in x ∈ [0, 2] and the variance is set to σ^2 = 0.02. The features have to be normalized, i.e., they have to sum to one at every x. Using 10 features, generate a plot with the activation of each feature over time (i.e., plot the matrix Φ). Attach a snippet of your code showing how to compute Gaussian features.
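A sketch of one way to compute the normalized features (the function name is an assumption):

    import numpy as np

    def gaussian_features(x, n_features, sigma2=0.02):
        means = np.linspace(0, 2, n_features)     # means spread linearly over [0, 2]
        Phi = np.exp(-0.5 * (x[:, None] - means[None, :]) ** 2 / sigma2)
        # Normalize so the features sum to one at every x
        return Phi / Phi.sum(axis=1, keepdims=True)

Plotting each column of Φ = gaussian_features(x, 10) against x gives the requested activation plot.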
c) Gaussian Features, Continued
Repeat the process of fitting the model, now using the Gaussian features from the previous question. Compare the RMSE on the testing data using 15, 16, ..., 40 basis functions and plot the RMSE. Which number of basis functions gives the best performance, and what is the best RMSE? Use a ridge coefficient of λ = 10^-6.
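A possible sweep, reusing the gaussian_features and fit_ridge helpers sketched above:

    rmses = []
    for n in range(15, 41):
        w = fit_ridge(gaussian_features(x_train, n), y_train, lam=1e-6)
        y_pred = gaussian_features(x_test, n) @ w
        rmses.append(np.sqrt(np.mean((y_pred - y_test) ** 2)))
    best_n = 15 + int(np.argmin(rmses))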
d) Bayesian Linear Regression

Using Bayesian linear regression, plot the mean and the standard deviation of the predictive distribution learned using the first {10, 12, 16, 20, 50, 150} data points (one plot per case; plot over the interval x ∈ [0, 2]). Discuss how the model uncertainty changes with the number of data points, and the problem of overfitting with Bayesian linear regression. Use the best performing polynomial features that you found in part a), a ridge coefficient of λ = 10^-6, and assume Gaussian noise with σ^2 = 0.0025.
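A sketch of the predictive distribution, with the Gaussian prior precision chosen so that the posterior mean coincides with the ridge solution (this mapping is an assumption of the sketch):

    def blr_predict(Phi_train, y_train, Phi_query, sigma2=0.0025, lam=1e-6):
        A = lam * np.eye(Phi_train.shape[1]) + Phi_train.T @ Phi_train
        A_inv = np.linalg.inv(A)
        m = A_inv @ Phi_train.T @ y_train        # posterior mean = ridge solution
        S = sigma2 * A_inv                       # posterior covariance of the weights
        mean = Phi_query @ m
        # Predictive variance = observation noise + weight uncertainty
        var = sigma2 + np.einsum("ij,jk,ik->i", Phi_query, S, Phi_query)
        return mean, np.sqrt(var)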

e) Cross-Validation
So far, we have split our dataset into two sets: training data and testing data. Cross-validation is a more sophisticated approach to model selection. Discuss it and its variants, pointing out their pros and cons.
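For concreteness, a minimal k-fold cross-validation loop, reusing the helpers sketched above:

    def kfold_rmse(x, y, degree, k=5, lam=1e-6):
        # Hold out each of k folds once; average the validation RMSE
        folds = np.array_split(np.arange(len(x)), k)
        errors = []
        for fold in folds:
            train = np.setdiff1d(np.arange(len(x)), fold)
            w = fit_ridge(poly_features(x[train], degree), y[train], lam)
            pred = poly_features(x[fold], degree) @ w
            errors.append(np.sqrt(np.mean((pred - y[fold]) ** 2)))
        return np.mean(errors)

Setting k = len(x) recovers leave-one-out cross-validation, one of the variants worth discussing.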
Task 2. Linear Classification

In this task, you will use the dataset ldaData.txt, containing 137 feature points x. The first 50 points belong to class C1, the next 43 to class C2, and the last 44 to class C3.
a) Discriminative and Generative Models
Explain the difference between discriminative and generative models and give an example for each case. Which model category is generally easier to learn and why?
b) Linear Discriminant Analysis

Use Linear Discriminant Analysis to classify the points in the dataset, i.e., assume Gaussian distributions in each class with equal covariances and use the posterior distributions for assigning classes. Attach two plots with the data points using a different color for each class: one plot with the original dataset, one with the samples classified according to your LDA classifier. Attach a snippet of your code and discuss the results. How many samples are misclassified? (You are allowed to use built-in functions for computing the mean and the covariance.)
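A sketch of one possible implementation (pooled covariance, classes assigned by the largest log-posterior; variable names are assumptions):

    import numpy as np

    data = np.loadtxt("ldaData.txt")
    labels = np.repeat([0, 1, 2], [50, 43, 44])
    groups = [data[labels == c] for c in range(3)]

    priors = np.array([len(g) for g in groups]) / len(data)
    means = np.array([g.mean(axis=0) for g in groups])
    # Pooled covariance: LDA assumes all classes share one covariance matrix
    cov = sum((g - m).T @ (g - m) for g, m in zip(groups, means)) / (len(data) - 3)
    cov_inv = np.linalg.inv(cov)

    def discriminant(x, m, p):
        # Log-posterior up to a constant shared by all classes
        return x @ cov_inv @ m - 0.5 * m @ cov_inv @ m + np.log(p)

    preds = np.array([np.argmax([discriminant(x, m, p) for m, p in zip(means, priors)])
                      for x in data])
    n_misclassified = int(np.sum(preds != labels))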
Task 3. Principal Component Analysis

In this task, you will use the dataset iris.txt. It contains data from three kinds of Iris flowers (‘Setosa’, ‘Versicolour’ and ‘Virginica’) with 4 attributes: sepal length, sepal width, petal length, and petal width. Each row contains a sample, and the last column is the label (0 means that the sample comes from a ‘Setosa’ plant, 1 from a ‘Versicolour’, and 2 from a ‘Virginica’). (You are allowed to use built-in functions for computing the mean, the covariance, eigenvalues, and eigenvectors.)
a) Data Normalization
Normalizing the data is a common practice in machine learning. Normalize the provided dataset such that it has zero mean and unit variance per dimension. Why is normalizing important? Attach a snippet of your code.
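A minimal sketch (the delimiter of iris.txt is an assumption; adjust np.loadtxt accordingly):

    import numpy as np

    data = np.loadtxt("iris.txt")                # adjust delimiter if needed
    X, y = data[:, :4], data[:, 4]               # four attributes, label last
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)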

b) Principal Component Analysis

Apply PCA to your normalized dataset and generate a plot showing the proportion (percentage) of the cumulative variance explained. How many components do you need in order to explain at least 95% of the dataset variance? Attach a snippet of your code.
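One way to compute this from the eigendecomposition of the covariance matrix:

    # Eigendecomposition of the covariance of the normalized data
    eigvals, eigvecs = np.linalg.eigh(np.cov(X_norm, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # eigh returns ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    cum_var = np.cumsum(eigvals) / np.sum(eigvals)
    n_components = int(np.argmax(cum_var >= 0.95)) + 1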
c) Low-Dimensional Space

Using as many components as needed to explain 95% of the dataset variance, generate a scatter plot of the lower-dimensional projection of the data. Use different colors or symbols for data points from different classes. What do you observe? Attach a snippet of your code.
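A sketch of the projection and plot, continuing from the snippet above (and assuming two components suffice, with matplotlib for plotting):

    import matplotlib.pyplot as plt

    Z = X_norm @ eigvecs[:, :n_components]       # project onto leading components
    for c, marker in zip((0, 1, 2), ("o", "s", "^")):
        plt.scatter(Z[y == c, 0], Z[y == c, 1], marker=marker, label=f"class {c}")
    plt.legend()
    plt.show()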
d) Reconstruction

Reconstruct the original dataset using different numbers of principal components. Using the normalized root mean square error (NRMSE) as a metric, fill in the table below (error per input dimension versus the number of principal components used).

N. of components | x1 | x2 | x3 | x4
---------------- | -- | -- | -- | --
1                |    |    |    |
2                |    |    |    |
3                |    |    |    |
4                |    |    |    |
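A sketch of the reconstruction and one common NRMSE convention (normalizing the per-dimension RMSE by that dimension's range; other conventions divide by the standard deviation):

    def nrmse_per_input(X_norm, eigvecs, n):
        W = eigvecs[:, :n]
        X_rec = X_norm @ W @ W.T                 # project down, then back up
        rmse = np.sqrt(np.mean((X_norm - X_rec) ** 2, axis=0))
        return rmse / (X_norm.max(axis=0) - X_norm.min(axis=0))

    for n in range(1, 5):
        print(n, nrmse_per_input(X_norm, eigvecs, n))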
e) Kernel PCA
Throughout this class we have seen that PCA is an easy and efficient way to reduce the dimensionality of data. However, it can only capture linear dependencies among data points. A more sophisticated extension, kernel PCA, is able to overcome this limitation. This question asks you to deepen this topic by conducting some research on your own: explain what kernel PCA is, how it works, and what its main limitations are. Be as concise (but clear) as possible.
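To make the idea concrete, a minimal sketch of kernel PCA with an RBF kernel (an illustration only; the kernel choice and gamma are assumptions):

    import numpy as np

    def kernel_pca(X, n_components, gamma=1.0):
        # RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-gamma * sq_dists)
        # Center the kernel matrix in feature space
        N = len(X)
        one_n = np.ones((N, N)) / N
        K = K - one_n @ K - K @ one_n + one_n @ K @ one_n
        # Projections onto the leading kernel principal components
        eigvals, eigvecs = np.linalg.eigh(K)
        order = np.argsort(eigvals)[::-1][:n_components]
        return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))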