EECS 391 Introduction to Artificial Intelligence

Fall 2018, Written Assignment 5 (“W5”)

Due: Tue Nov 27. Submit a single pdf document along with your code for the whole assignment to Canvas before class. You may scan a handwritten page for the written portions, but make sure you submit a single pdf file.

Total Points: 100 + 20 extra credit

Remember: include your name and Case ID, staple your pages, and keep answers concise, neat, and legible. Submit the written portion in class on the due date.

Note: Some of the questions below ask you to make plots and/or write simple programs. These may be more convenient in a language with a good plotting library, such as MATLAB, Mathematica, or Python with matplotlib. Submit your code via Canvas, but turn in the homework writeup in class.

Q1. Bernoulli trials and bias beliefs

Recall the binomial distribution describing the likelihood of getting y heads in n flips:

p(y | θ, n) = (n choose y) θ^y (1 − θ)^(n−y),

where θ is the probability of heads.

a) Using the fact that

∫₀¹ p(y | θ, n) dθ = 1/(1 + n),

derive the posterior distribution for θ assuming a uniform prior. (5 P.)

b) Plot the likelihood for n = 4 and θ = 3/4. Make sure your plot includes y = 0. (5 P.)
c) Plot the posterior distribution of θ after each of the following coin flips: head, head, tail, head. You should have four plots total. (10 P.)
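A minimal sketch of the computations for parts b) and c). Python with matplotlib is assumed, as suggested in the note above; the posterior normalizer (n + 1) follows from the integral identity in part a), and the file names are arbitrary:

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

def likelihood(y, theta, n):
    """Binomial likelihood p(y | theta, n)."""
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

def posterior(theta, y, n):
    """Posterior p(theta | y, n) under a uniform prior.
    The normalizer (n + 1) comes from the integral identity in part a)."""
    return (n + 1) * likelihood(y, theta, n)

thetas = [i / 200 for i in range(201)]

# b) likelihood as a function of y for n = 4, theta = 3/4 (includes y = 0)
ys = list(range(5))
plt.figure()
plt.bar(ys, [likelihood(y, 0.75, 4) for y in ys])
plt.xlabel("y (number of heads)")
plt.ylabel("p(y | theta = 3/4, n = 4)")
plt.savefig("q1b.png")

# c) posterior after each flip in the sequence H, H, T, H
flips = [1, 1, 0, 1]
y = 0
for n, f in enumerate(flips, start=1):
    y += f
    plt.figure()
    plt.plot(thetas, [posterior(t, y, n) for t in thetas])
    plt.xlabel("theta")
    plt.ylabel(f"p(theta | {y} heads in {n} flips)")
    plt.savefig(f"q1c_{n}.png")
```

Note that with a uniform prior the posterior is a Beta(y + 1, n − y + 1) density, which is what the loop above plots after each flip.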

Q2. After R&N 20.1 Bags O’ Surprise

The data used for Figure 20.1 on page 804 can be viewed as being generated by h5.

a) For each of the other four hypotheses, write code to generate a data set of length 100 and plot the corresponding graphs for P(hi|d1,…,dN) and P(DN+1 = lime|d1,…,dN). The plots should follow the format of Figure 20.1. Comment on your results. (15 P.)
b) What is the mathematical expression for how many candies you need to unwrap before you are more than 90% sure which type of bag you have? (5 P.)
c) Make a plot that illustrates the reduction in variability of the curves for the posterior probability of each type of bag by averaging the curves obtained from multiple data sets. (15 P.)
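The data generation and posterior updates for part a) can be sketched as follows. The hypothesis priors and lime proportions below are the values used in R&N Section 20.1 (h1 all cherry through h5 all lime) and should be checked against your copy of the text:

```python
import random

# Hypothesis priors and lime proportions as in R&N Sec. 20.1
# (h1: all cherry ... h5: all lime) -- assumed values, verify against the text.
priors = [0.1, 0.2, 0.4, 0.2, 0.1]
lime_frac = [0.0, 0.25, 0.5, 0.75, 1.0]

def generate(h, n, rng):
    """Draw n candies (1 = lime, 0 = cherry) from bag hypothesis h."""
    return [1 if rng.random() < lime_frac[h] else 0 for _ in range(n)]

def posteriors(data):
    """P(h_i | d_1..d_N) after each observation; row 0 is the prior."""
    out = [list(priors)]
    post = list(priors)
    for d in data:
        post = [p * (lime_frac[i] if d == 1 else 1 - lime_frac[i])
                for i, p in enumerate(post)]
        z = sum(post)
        post = [p / z for p in post]
        out.append(post)
    return out

def p_next_lime(post):
    """Predictive P(D_{N+1} = lime | d_1..d_N) by averaging over hypotheses."""
    return sum(p * f for p, f in zip(post, lime_frac))

rng = random.Random(0)
data = generate(2, 100, rng)   # e.g. a data set generated by h3 (50% lime)
curves = posteriors(data)      # one row per observation; plot column i vs. N
```

Plotting the columns of `curves` against N, and `p_next_lime` of each row, reproduces the format of Figure 20.1; averaging the curves over several generated data sets gives part c).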

Q3. Classification with Gaussian Mixture Models

Suppose you have a random variable x which is drawn from one of two classes C1 and C2. Each class follows a Gaussian distribution with means μ1 and μ2 (assume μ1 < μ2) and variances σ1² and σ2². Assume that the prior probability of C1 is twice that of C2.

a) What is the expression for the probability of x, i.e. p(x), when the class is unknown? (5 P.)
b) What is the expression for the probability of total error in this model, assuming that the decision boundary is at x = θ? (10 P.)
c) Derive an expression for the value of the decision boundary θ that minimizes the probability of misclassification. (10 P.)
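As a numerical sanity check on parts b) and c), the total error can be evaluated directly and minimized on a grid; the parameter values below are placeholders, and the derivation itself should still be done symbolically:

```python
import math

# Placeholder parameters: mu1 < mu2, priors P(C1) = 2/3, P(C2) = 1/3.
mu1, s1 = 0.0, 1.0
mu2, s2 = 3.0, 1.0
p1, p2 = 2 / 3, 1 / 3

def gauss_pdf(x, mu, s):
    return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu, s):
    return 0.5 * (1 + math.erf((x - mu) / (s * math.sqrt(2))))

def total_error(theta):
    # classify x < theta as C1, x >= theta as C2:
    # error = P(C1) P(x > theta | C1) + P(C2) P(x < theta | C2)
    return p1 * (1 - gauss_cdf(theta, mu1, s1)) + p2 * gauss_cdf(theta, mu2, s2)

# brute-force minimization over a fine grid
grid = [i / 1000 for i in range(-2000, 5000)]
theta_star = min(grid, key=total_error)

# At the optimum the prior-weighted class densities cross:
# p1 * p(theta | C1) == p2 * p(theta | C2)
```

The crossing condition in the final comment is the equation your symbolic derivation in part c) should arrive at; the grid search merely confirms it for one choice of parameters.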


Q4. k-means Clustering

In k-means clustering, μk is the vector mean of the kth cluster. Assume the data vectors have I dimensions, so μk is the column vector μk = [μk,1, …, μk,I]ᵀ, where the superscript T indicates vector transpose.

a) Derive the update rule for μk using the objective function

D = Σ_{n=1}^{N} Σ_{k=1}^{K} rn,k ∥xn − μk∥²

where xn is the nth data vector, rn,k is 1 if xn is in the kth cluster and 0 otherwise, and ∥x∥² = xᵀx = Σ_i xi xi = Σ_i xi². The update rule is derived by computing the gradient with respect to each element of the kth mean and solving for the value where the gradient is zero. Express your answer first in scalar form for μk,i and then in vector form for μk. (20 P.)

b) Extra credit. Write a program that implements the k-means clustering algorithm on the iris data set. Plot the results of the learning process by showing the initial, intermediate, and converged cluster centers for k = 2 and k = 3. (+20 P. bonus)
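The clustering loop required here can be sketched as a minimal Lloyd's-algorithm implementation. The two-blob synthetic data below stands in for the iris measurements, which you would load yourself; to produce the required plots, record `centers` at each iteration:

```python
import random

def kmeans(points, k, iters=50, rng=None):
    """Minimal k-means (Lloyd's algorithm). points: list of equal-length tuples."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # initialize centers at random data points
    for _ in range(iters):
        # assignment step: r_{n,k} = 1 for the nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # update step: mu_k = mean of the assigned points (the rule from part a)
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:
                dim = len(cl[0])
                new_centers.append(tuple(sum(p[i] for p in cl) / len(cl)
                                         for i in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers

# toy 2-D data standing in for the iris measurements
rng = random.Random(1)
data = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)] +
        [(rng.gauss(4, 0.3), rng.gauss(4, 0.3)) for _ in range(50)])
centers = kmeans(data, 2, rng=rng)
```

The two steps of the loop mirror the derivation in part a): the assignment step fixes rn,k, and the update step sets each μk to the mean that zeroes the gradient of D.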
