Question 1 (20%)
The probability density function (pdf) for a 2-dimensional real-valued random vector X is as
follows: p(x) = P(L = 0) p(x|L = 0) + P(L = 1) p(x|L = 1). Here L is the true class label that indicates which class-label-conditioned pdf generates the data.
The class priors are P(L = 0) = 0.6 and P(L = 1) = 0.4. The class-conditional pdfs are p(x|L = 0) = w_{01} g(x|m_{01}, C_{01}) + w_{02} g(x|m_{02}, C_{02}) and p(x|L = 1) = w_{11} g(x|m_{11}, C_{11}) + w_{12} g(x|m_{12}, C_{12}), where g(x|m, C) is a multivariate Gaussian probability density function with mean vector m and covariance matrix C. The parameters of the class-conditional Gaussian pdfs are w_{i1} = w_{i2} = 1/2 for i ∈ {0, 1}, and
m_{01} = [−1, −1]^T, m_{02} = [1, 1]^T, m_{11} = [−1, 1]^T, m_{12} = [1, −1]^T, and C_{ij} = [1, 0; 0, 1] (the 2×2 identity) for all (i, j) pairs.
For numerical results requested below, generate the following independent datasets, each consisting of iid samples from the specified data distribution, and in each dataset make sure to include the true class label for each sample; a minimal generation sketch follows the list below.
• D_train^20 consists of 20 samples and their labels, for training;
• D_train^200 consists of 200 samples and their labels, for training;
• D_train^2000 consists of 2000 samples and their labels, for training;
• D_validate^10K consists of 10000 samples and their labels, for validation.
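A minimal MATLAB generation sketch is given below; it assumes the Statistics and Machine Learning Toolbox for mvnrnd, and the function name genData and all variable names are illustrative, not prescribed.

```matlab
% Minimal sketch: draw N labeled iid samples from the specified mixture.
function [x, L] = genData(N)
    mu = cat(3, [-1 -1;  1  1], ...   % rows: m01, m02 (class 0 components)
                [-1  1;  1 -1]);      % rows: m11, m12 (class 1 components)
    L = (rand(N,1) < 0.4);            % class labels with P(L=1) = 0.4
    comp = (rand(N,1) < 0.5) + 1;     % equal-weight component selection
    x = zeros(N,2);
    for n = 1:N                       % all covariances are the 2x2 identity
        x(n,:) = mvnrnd(mu(comp(n), :, L(n)+1), eye(2));
    end
end
```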
Part 1: (6%) Determine the theoretically optimal classifier that achieves minimum probability of error using knowledge of the true pdf. Specify the classifier mathematically and implement it; then apply it to all samples in D_validate^10K. From the decision results and true labels for this validation set, estimate and plot the ROC curve for a corresponding discriminant score for this classifier, and on the ROC curve indicate, with a special marker, the location of the min-P(error) classifier. Also report an estimate of the min-P(error) achievable, based on counts of decision-truth label pairs on D_validate^10K. Optional: As supplementary visualization, generate a plot of the decision boundary of this classification rule overlaid on the validation dataset. This establishes an aspirational performance level on this data for the following approximations.
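A minimal MATLAB sketch of the likelihood-ratio test and empirical ROC estimation follows; it assumes validation samples xV (N x 2) with labels lV (N x 1) in {0, 1}, uses mvnpdf from the Statistics and Machine Learning Toolbox, and all variable names are illustrative.

```matlab
% Minimal sketch: Bayes-optimal (min-P(error)) classifier and empirical ROC.
p0 = @(x) 0.5*mvnpdf(x, [-1 -1], eye(2)) + 0.5*mvnpdf(x, [ 1  1], eye(2));
p1 = @(x) 0.5*mvnpdf(x, [-1  1], eye(2)) + 0.5*mvnpdf(x, [ 1 -1], eye(2));

score = log(p1(xV)) - log(p0(xV));    % discriminant: log-likelihood ratio
tauBayes = log(0.6/0.4);              % Bayes threshold log(P(L=0)/P(L=1))
decision = (score > tauBayes);
pErrEst = mean(decision ~= lV);       % count-based min-P(error) estimate

% Empirical ROC by sweeping the threshold over the sorted scores.
tau = sort(score);
pFA = arrayfun(@(t) mean(score(lV==0) > t), tau);   % false-positive rate
pTP = arrayfun(@(t) mean(score(lV==1) > t), tau);   % true-positive rate
plot(pFA, pTP); hold on;
plot(mean(score(lV==0) > tauBayes), mean(score(lV==1) > tauBayes), 'go');
xlabel('P(false alarm)'); ylabel('P(correct detection)');
```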
Part 2: (12%) (a) Using the maximum-likelihood parameter estimation technique, train three separate logistic-linear-function-based approximations of class label posterior functions given a sample. For each approximation, use one of the three training datasets D_train^20, D_train^200, D_train^2000. When optimizing the parameters, specify the optimization problem as minimization of the negative log-likelihood of the training dataset, and use your favorite numerical optimization approach, such as gradient descent or Matlab's fminsearch. Determine how to use these class-label-posterior approximations to classify a sample in order to approximate the minimum-P(error) classification rule; apply these three approximations of the class label posterior function to samples in D_validate^10K, and estimate the probability of error that these three classification rules will attain (using counts of decisions on the validation set). Optional: As supplementary visualization, generate plots of the decision boundaries of these trained classifiers superimposed on their respective training datasets and the validation dataset. (b) Repeat the process described in Part (2a) using a logistic-quadratic-function-based approximation of class label posterior functions given a sample.
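One possible implementation is sketched below; it assumes training data xT (N x 2) with labels lT (N x 1) in {0, 1}, validation data xV, lV, and all names are illustrative.

```matlab
% Minimal sketch: logistic model fit by minimizing the negative log-likelihood.
zLin  = @(x) [ones(size(x,1),1), x];                   % z(x) = [1, x^T]^T
zQuad = @(x) [ones(size(x,1),1), x, x(:,1).^2, ...     % quadratic features
              x(:,1).*x(:,2), x(:,2).^2];
h   = @(Z,w) 1./(1 + exp(-Z*w));                       % logistic function
nll = @(w,Z,l) -sum(l.*log(h(Z,w)+eps) + (1-l).*log(1-h(Z,w)+eps));

Z = zLin(xT);                                          % or zQuad(xT) for (b)
wML = fminsearch(@(w) nll(w,Z,lT), zeros(size(Z,2),1));

% Approximate min-P(error) rule: decide L = 1 when P(L=1|x) > 1/2.
decision = h(zLin(xV), wML) > 0.5;
pErrEst  = mean(decision ~= lV);
```

The eps terms guard the logarithms against exact zeros; with only 20 training samples, fminsearch may need a sensible initialization or a few restarts.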
Discussion: (2%) How does the performance of the classifiers trained in this part compare across the different numbers of training samples and function forms? How do they compare to the theoretically optimal classifier from Part 1? Briefly discuss your results and insights.
Note 1: With x representing the input sample vector and w denoting the model parameter vector, logistic-linear-function refers to h(x, w) = 1/(1 + e^{−w^T z(x)}), where z(x) = [1, x^T]^T; and logistic-quadratic-function refers to h(x, w) = 1/(1 + e^{−w^T z(x)}), where z(x) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]^T.
Question 2 (20%)
Assume that a real-valued scalar y and a two-dimensional real vector x are related to each other according to y = c(x, w) + v, where c(·, w) is a cubic polynomial in x with coefficients w and v is a Gaussian random scalar with zero mean and variance σ².
Given a dataset D = {(x_1, y_1), …, (x_N, y_N)} with N samples of (x, y) pairs, and assuming that these samples are independent and identically distributed according to the model, derive two estimators for w using the maximum-likelihood (ML) and maximum-a-posteriori (MAP) parameter estimation approaches, as functions of these data samples. For the MAP estimator, assume that w has a zero-mean Gaussian prior with covariance matrix γI.
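For reference only, under these assumptions the two estimators should reduce to the familiar regularized least-squares forms sketched below, with A denoting the (hypothetical) N x p design matrix whose n-th row is the cubic feature vector of x_n; verify against your own derivation.

```latex
% Sketch of the expected closed forms under the stated model assumptions.
\hat{w}_{\mathrm{ML}}  = (A^{\top}A)^{-1} A^{\top} y, \qquad
\hat{w}_{\mathrm{MAP}} = \left(A^{\top}A + \frac{\sigma^{2}}{\gamma} I\right)^{-1} A^{\top} y,
\quad y = [y_1, \ldots, y_N]^{\top}
```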
Having derived the estimator expressions, implement them in code and apply them to the dataset generated by the attached Matlab script. Using the training dataset, obtain the ML estimator and the MAP estimator for a variety of γ values ranging from 10^−m to 10^n. Evaluate each trained model by calculating the average squared error between the y values of the validation samples and the model estimates produced by c(·, w_trained). How does your MAP-trained model perform on the validation set as γ is varied? How is the MAP estimate related to the ML estimate? Describe your experiments, and visualize and quantify your analyses (e.g., average squared error on the validation dataset as a function of the hyperparameter γ) with data from these experiments.
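A minimal MATLAB sketch of the γ sweep follows; it assumes a hypothetical feature map phi that builds the cubic design matrix, training/validation arrays xTrain, yTrain, xVal, yVal from the provided script, and the known noise variance sigma2 — all names illustrative.

```matlab
% Minimal sketch: ML and MAP estimates, and validation error as gamma varies.
A  = phi(xTrain);  Av = phi(xVal);          % phi: hypothetical cubic feature map
wML = (A'*A) \ (A'*yTrain);                 % least-squares / ML solution

gammas = 10.^(-6:0.5:6);                    % sweep, e.g., 10^-6 .. 10^6
mse = zeros(size(gammas));
for k = 1:numel(gammas)
    wMAP   = (A'*A + (sigma2/gammas(k))*eye(size(A,2))) \ (A'*yTrain);
    mse(k) = mean((yVal - Av*wMAP).^2);     % average squared validation error
end
loglog(gammas, mse); xlabel('\gamma'); ylabel('validation MSE');
% As gamma grows, the MAP estimate approaches the ML estimate;
% as gamma shrinks, it is pulled toward the zero-mean prior.
```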
Note: Points will be split evenly between the ML and MAP estimator results and discussion.
Question 3 (20%)
A vehicle at true position [x_T, y_T]^T in 2-dimensional space is to be localized using distance (range) measurements to K reference (landmark) coordinates {[x_1, y_1]^T, …, [x_i, y_i]^T, …, [x_K, y_K]^T}. These range measurements are r_i = d_{Ti} + n_i for i ∈ {1, …, K}, where d_{Ti} = ‖[x_T, y_T]^T − [x_i, y_i]^T‖ is the true distance between the vehicle and the i-th reference point, and n_i is a zero-mean Gaussian-distributed measurement noise with known variance σ_i². The noise in each measurement is independent of the others.
Assume that we have the following prior knowledge regarding the position of the vehicle:

p([x, y]^T) = (2π σ_x σ_y)^{−1} exp( −(1/2) [x, y] [σ_x^2, 0; 0, σ_y^2]^{−1} [x, y]^T )    (1)

where [x, y]^T indicates a candidate position under consideration.
Express the optimization problem that needs to be solved to determine the MAP estimate of the vehicle position. Simplify the objective function so that the exponentials and the additive/multiplicative terms that do not affect the determination of the MAP estimate [x_MAP, y_MAP]^T are removed from the objective function, for computational savings when evaluating it.
Implement the following as computer code. Set the true vehicle location to be inside the circle with unit radius centered at the origin. For each K ∈ {1, 2, 3, 4}, repeat the following.

Place K evenly spaced landmarks on a circle with unit radius centered at the origin. Set the measurement noise standard deviation to 0.3 for all range measurements. Generate K range measurements according to the model specified above (if a range measurement turns out to be negative, reject it and resample; all range measurements need to be nonnegative).
Plot the equilevel contours of the MAP estimation objective for the range of horizontal and vertical coordinates from −2 to 2; superimpose the true location of the vehicle on these equilevel contours (e.g. use a + mark), as well as the landmark locations (e.g. use a o mark for each one).
Provide plots of the MAP objective function contours for each value of K. When preparing your final contour plots for different K values, make sure to plot contours at the same function value across each of the different contour plots for easy visual comparison of the MAP objective landscapes. Suggestion: For values of σx and σy, you could use values around 0.25 and perhaps make them equal to each other. Note that your choice of these indicates how confident the prior is about the origin as the location.
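A minimal MATLAB sketch of the measurement generation and contour plotting for one value of K follows; names such as xTrue, sx, sy are illustrative choices, not prescribed.

```matlab
% Minimal sketch: range measurements and MAP-objective contours for one K.
xTrue = [0.4; 0.3];  sx = 0.25;  sy = 0.25;  sigma = 0.3;  K = 3;
ang = 2*pi*(0:K-1)'/K;  LM = [cos(ang), sin(ang)];  % landmarks on unit circle
dT  = sqrt(sum((LM - xTrue').^2, 2));               % true vehicle-landmark distances
r = -ones(K,1);
for i = 1:K                                         % resample any negative ranges
    while r(i) < 0, r(i) = dT(i) + sigma*randn; end
end
[X, Y] = meshgrid(linspace(-2, 2, 201));
obj = X.^2/(2*sx^2) + Y.^2/(2*sy^2);                % prior term of the objective
for i = 1:K                                         % add range-likelihood terms
    d = sqrt((X - LM(i,1)).^2 + (Y - LM(i,2)).^2);
    obj = obj + (r(i) - d).^2/(2*sigma^2);
end
contour(X, Y, obj, logspace(-1, 3, 25)); hold on;   % fixed levels across all K
plot(xTrue(1), xTrue(2), 'r+'); plot(LM(:,1), LM(:,2), 'bo');
axis equal; xlabel('x'); ylabel('y');
```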
Supplement your plots with a brief description of how your code works. Comment on the behavior of the MAP estimate of position (visually assessed from the contour plots; roughly the center of the innermost contour) relative to the true position. Does the MAP estimate get closer to the true position as K increases? Does it get more certain? Explain how your contours justify your conclusions.
Note: The additive Gaussian noise used in this question is likely not appropriate for a real distance sensor, since it could lead to negative measurements. However, in this question we will ignore this issue and proceed with this noise model for illustration. In practice, a multiplicative log-normally distributed noise may be more appropriate than additive normally distributed noise, depending on the measurement mechanism.
Question 4 (20%)
Problem 2.13 from the Duda-Hart-Stork textbook.
Question 5 (20%)
Let Z be drawn from a categorical distribution (taking discrete values) with K possible outcomes/states and parameter vector Θ, denoted Cat(Θ). Describe the value/state using a 1-of-K scheme, z = [z_1, …, z_K]^T, where z_k = 1 if the variable is in state k and z_k = 0 otherwise (e.g., for K = 3, state 2 is encoded as z = [0, 1, 0]^T). Let the parameter vector for the pdf be Θ = [θ_1, …, θ_K]^T, where P(z_k = 1) = θ_k for k ∈ {1, …, K}.
Given D = {z_1, …, z_N} with iid samples z_n ∼ Cat(Θ) for n ∈ {1, …, N}:
• What is the ML estimator for Θ?
• Assuming that the prior p(Θ) for the parameters is a Dirichlet distribution with hyperparameter α, what is the MAP estimator for Θ?
Hint: The Dirichlet distribution with parameter α is

p(Θ|α) = (1/B(α)) ∏_{k=1}^{K} θ_k^{α_k − 1},

where the normalization constant is

B(α) = ( ∏_{k=1}^{K} Γ(α_k) ) / Γ( ∑_{k=1}^{K} α_k ).