
1. You are given a data set, S, which records instances of input attributes, $x = [x_1, x_2]^T \in \mathbb{R}^2$, denoted as $\{x^{(i)} = [x_1^{(i)}, x_2^{(i)}]^T\}_{i=1}^{n}$, and their labels $y \in \{0, 1\}$, denoted as $\{y^{(i)}\}_{i=1}^{n}$. The input attributes and labels are paired so that $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, where $n = 24$:
  x1      x2     y
 -0.01    5.19   0
  1.32    1.74   1
  3.36    4.55   0
  0.95    1.9    1
  2.4     2.74   0
  0.22    2.75   0
 -0.32    5.28   0
  3.7     6.9    0
  1.63    1.67   1
  1.21    4.09   0
  1.73    4.42   0
  1.03    4.91   0
  0.9     7.37   0
  5.78    5.84   0
  1.2     5.52   0
  1.46    0.7    1
  1.25    2.33   1
  1.82    2.27   1
  2.18    1.42   1
 -1.61    3.95   0
  5.53    4.43   0
  0.91    4.34   0
  1.83    0.81   1
  2.79    2.04   0
Table 1: S
[NB: For your convenience you have also been provided with a csv file of the data, which can be found in the briefing area for this problem set, named 'OpenBookTest2 Q1 Data.csv'.]
You are asked to build a binary classifier in order to classify the y label of a point, given its x attributes. You decide to proceed by constructing a Naive Bayes probabilistic classifier as follows:
First you assume that classes are distributed according to a Bernoulli random variable, $Y$, with outcomes $y \sim \mathrm{Bern}(\theta)$, with pmf characterised by $p_Y(y = 1) = \theta$.
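As a standard reminder rather than part of the question: for i.i.d. Bernoulli labels the likelihood is maximised by the sample proportion of 1s,

$$\hat{\theta} = \arg\max_{\theta \in [0,1]} \prod_{i=1}^{n} \theta^{y^{(i)}} (1 - \theta)^{1 - y^{(i)}} = \frac{1}{n} \sum_{i=1}^{n} y^{(i)}.$$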
Then you consider the following Exponential model for the class conditional probability distributions for the random variables, $X_1, X_2$, the outcomes of which are given by instances of particular input attributes, $x_1, x_2$ respectively:

Model 1:

$$x_1 \mid (y = 0) \sim \mathrm{Exp}(\lambda_{01}) \qquad x_2 \mid (y = 0) \sim \mathrm{Exp}(\lambda_{02})$$
$$x_1 \mid (y = 1) \sim \mathrm{Exp}(\lambda_{11}) \qquad x_2 \mid (y = 1) \sim \mathrm{Exp}(\lambda_{12})$$

Where: $\lambda_{01}, \lambda_{02}, \lambda_{11}, \lambda_{12} > 0$.

[NB: If a random variable $Z$, with outcomes $z \in [0, \infty)$, is Exponentially distributed, $z \sim \mathrm{Exp}(\lambda)$, then the associated pdf is given by:

$$p_Z(z; \lambda) = \begin{cases} \lambda e^{-\lambda z} & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$

for some $\lambda > 0$.]
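As an editorial illustration only (not the official solution), a minimal sketch of the part (a) i) fits is given below, assuming the provided CSV has columns named x1, x2 and y. It uses the standard facts that the Bernoulli MLE is the sample proportion of class-1 labels and that the Exponential-rate MLE is the reciprocal of the sample mean.

```python
# Minimal sketch (not the official solution) of the MLE fits for Model 1,
# assuming the provided CSV has columns named x1, x2 and y.
import pandas as pd

S = pd.read_csv("OpenBookTest2 Q1 Data.csv")

# MLE of the Bernoulli class prior: the fraction of y = 1 labels.
theta_hat = S["y"].mean()
print(f"theta = {theta_hat:.2f}")

# MLE of an Exponential rate is the reciprocal of the sample mean,
# computed separately for each class / attribute combination.
for c in (0, 1):
    for j in (1, 2):
        lam_hat = 1.0 / S.loc[S["y"] == c, f"x{j}"].mean()
        print(f"lambda_{c}{j} = {lam_hat:.2f}")
```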
As usual you then classify a particular point with input attribute vector, $x$, as belonging to class 1 if $p_Y(y = 1 \mid x) \ge 0.5$, and as belonging to class 0 otherwise.
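For orientation (this only restates Bayes' rule together with the Naive Bayes factorisation, nothing beyond the question setup):

$$p_Y(y = 1 \mid x) = \frac{p(x_1 \mid y = 1)\, p(x_2 \mid y = 1)\, \theta}{\sum_{c \in \{0,1\}} p(x_1 \mid y = c)\, p(x_2 \mid y = c)\, p_Y(y = c)},$$

so the classification rule amounts to comparing $p(x_1 \mid y = 1)\,p(x_2 \mid y = 1)\,\theta$ against $p(x_1 \mid y = 0)\,p(x_2 \mid y = 0)\,(1 - \theta)$.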
(a) i) Fit $\theta$ and the parameters of Model 1 using S and maximum likelihood estimation.
Express your final results to 2 decimal places.
[5 marks]
ii) Fully describe the resulting discriminant boundary, giving its functional form and providing a sketch (hand-drawn sketches are fine so long as they are inserted into the pdf of your solutions at the appropriate point).
[4 marks]
Upon reflection you realise that Model 1 is not appropriate for this data set. A more appropriate model for the class conditional probability distributions is given by:
Model 2:
$$x_1 \mid (y = 0) \sim \mathcal{N}(\mu_{01}, \sigma_{01}^2) \qquad x_2 \mid (y = 0) \sim \mathcal{N}(\mu_{02}, \sigma_{02}^2)$$
$$x_1 \mid (y = 1) \sim \mathcal{N}(\mu_{11}, \sigma_{11}^2) \qquad x_2 \mid (y = 1) \sim \mathcal{N}(\mu_{12}, \sigma_{12}^2)$$

Where: $\mu_{01}, \mu_{02}, \mu_{11}, \mu_{12} \in \mathbb{R}$ and $\sigma_{01}, \sigma_{02}, \sigma_{11}, \sigma_{12} \in (0, \infty)$.
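Again as an editorial sketch only (not the official solution), the Model 2 fits could be computed along the following lines, assuming the same CSV columns x1, x2 and y; the Gaussian MLE is the sample mean together with the biased (divide-by-n) sample variance.

```python
# Minimal sketch (not the official solution) of the MLE fits for Model 2.
import pandas as pd

S = pd.read_csv("OpenBookTest2 Q1 Data.csv")

for c in (0, 1):
    for j in (1, 2):
        col = S.loc[S["y"] == c, f"x{j}"]
        mu_hat = col.mean()
        var_hat = col.var(ddof=0)   # ddof=0 gives the maximum-likelihood (1/n) variance
        print(f"mu_{c}{j} = {mu_hat:.2f},  sigma^2_{c}{j} = {var_hat:.2f}")
```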

(b) Briefly explain why Model 2 rather than Model 1 is a more appropriate choice to model this data set.
[4 marks]
(c) i) Fit the parameters of Model 2 using S and maximum likelihood estimation. Express
your final results to 2 decimal places.
[5 marks]
ii) Fully describe the resulting discriminant boundary, giving its functional form and providing a sketch (hand-drawn sketches are fine so long as they are inserted into the pdf of your solutions at the appropriate point).
[6 marks]
(d) Briefly explain how the discriminant boundary would change if Model 2 was changed such that: $\sigma_{01} = \sigma_{11}$ and $\sigma_{02} = \sigma_{12}$.
[1 mark]
[Total for Question 1: 25 marks]

2. The n pupils of a school are issued with a survey regarding their school meals. Each survey contains two questions which take binary responses: The first asks pupils whether they like cabbage or not, while the second asks pupils whether they like walnuts or not. Not all pupils answer the questions. The results in terms of number of respondents are given below:
                                 Answers to 2nd Question:
                                 Like Walnuts   Dislike Walnuts   Not Answered
Answers to       Like Cabbage        n11              n10              n1h
1st Question:    Dislike Cabbage     n01              n00              n0h
                 Not Answered        nh1              nh0              nhh

Table 2: Number of survey responses
Let us characterise a random variable $X_1$ with outcomes associated with cabbage preference given by $x_1 \in \{0, 1\}$ (denoting dislike and like respectively), and a random variable $X_2$ with outcomes associated with walnut preference given by $x_2 \in \{0, 1\}$ (denoting dislike and like respectively).
We wish to learn the joint distribution, $p_X(x_1, x_2)$ (where $X = [X_1, X_2]^T$), characterised by:

$$p_X(x_1 = 1, x_2 = 1) = \alpha_{11} \qquad p_X(x_1 = 1, x_2 = 0) = \alpha_{10}$$
$$p_X(x_1 = 0, x_2 = 1) = \alpha_{01} \qquad p_X(x_1 = 0, x_2 = 0) = \alpha_{00}$$

Here $\alpha_{11}, \alpha_{10}, \alpha_{01}, \alpha_{00} \in [0, 1]$, and $(\alpha_{11} + \alpha_{10} + \alpha_{01} + \alpha_{00}) = 1$.
(a) If we ignore all surveys which contain missing data, derive the maximum likelihood estimators of $\alpha_{11}, \alpha_{10}, \alpha_{01}, \alpha_{00}$ in terms of the contents of Table 2.
[NB: You should handle parameter constraints using the following re-parameterisation trick:

$$\alpha_{kl} = \frac{e^{\eta_{kl}}}{\sum_{\tilde{k}=0,\ \tilde{l}=0}^{1,1} e^{\eta_{\tilde{k}\tilde{l}}}}, \qquad \text{where } \eta_{kl} \in \mathbb{R}.$$
]

[6 marks]
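For intuition (an editorial aside, not part of the question): this is a softmax re-parameterisation, which makes the constraints automatic, since

$$\alpha_{kl} = \frac{e^{\eta_{kl}}}{\sum_{\tilde{k},\tilde{l}} e^{\eta_{\tilde{k}\tilde{l}}}} \in (0, 1) \quad \text{and} \quad \sum_{k,l} \alpha_{kl} = \frac{\sum_{k,l} e^{\eta_{kl}}}{\sum_{\tilde{k},\tilde{l}} e^{\eta_{\tilde{k}\tilde{l}}}} = 1,$$

so the optimisation over the $\eta_{kl}$ is unconstrained.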

Now, let us try to learn the parameterisation of the joint distribution using the information gleaned from the missing data.
With this in mind we may write the log-likelihood function as:
$$\begin{aligned}
\mathcal{L} = \log\Bigg[&\prod_{i=1}^{\tilde{n}} p_X\big(x_1^{(i)}, x_2^{(i)}\big)
\times \Big(\sum_{x_2 \in \{0,1\}} p_X(x_1 = 0, x_2)\Big)^{n_{0h}}
\times \Big(\sum_{x_2 \in \{0,1\}} p_X(x_1 = 1, x_2)\Big)^{n_{1h}} \\
&\times \Big(\sum_{x_1 \in \{0,1\}} p_X(x_1, x_2 = 0)\Big)^{n_{h0}}
\times \Big(\sum_{x_1 \in \{0,1\}} p_X(x_1, x_2 = 1)\Big)^{n_{h1}}\Bigg] \\
= \log\Bigg[&\alpha_{00}^{n_{00}}\,\alpha_{10}^{n_{10}}\,\alpha_{01}^{n_{01}}\,\alpha_{11}^{n_{11}}
\times \Big(\sum_{x_2 \in \{0,1\}} p_X(x_1 = 0, x_2)\Big)^{n_{0h}}
\times \Big(\sum_{x_2 \in \{0,1\}} p_X(x_1 = 1, x_2)\Big)^{n_{1h}} \\
&\times \Big(\sum_{x_1 \in \{0,1\}} p_X(x_1, x_2 = 0)\Big)^{n_{h0}}
\times \Big(\sum_{x_1 \in \{0,1\}} p_X(x_1, x_2 = 1)\Big)^{n_{h1}}\Bigg]
\end{aligned}$$

Where: $\tilde{n} = n - n_{hh}$.

In forming this log-likelihood function we have 'summed over' the missing, or hidden, data, which we treat as the outcome of hidden variables.
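As a concrete reading of these terms (an editorial aside): each bracketed sum is simply a marginal of the joint distribution, for example

$$\sum_{x_2 \in \{0,1\}} p_X(x_1 = 0, x_2) = \alpha_{00} + \alpha_{01},$$

the probability that a pupil dislikes cabbage, irrespective of their unanswered walnut preference.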
Maximising this function looks like a job for the EM algorithm. Using similar reasoning to that given in the lecture slides when we encountered this algorithm (albeit in a different context), we can obtain the following lower bound on the function:
$$\begin{aligned}
\mathcal{L} \ \ge\ & n_{00}\log\alpha_{00} + n_{10}\log\alpha_{10} + n_{01}\log\alpha_{01} + n_{11}\log\alpha_{11} \\
&+ n_{0h}\sum_{x_2 \in \{0,1\}} q_{0h}(x_2)\log\!\left(\frac{p_X(x_1 = 0, x_2)}{q_{0h}(x_2)}\right)
 + n_{1h}\sum_{x_2 \in \{0,1\}} q_{1h}(x_2)\log\!\left(\frac{p_X(x_1 = 1, x_2)}{q_{1h}(x_2)}\right) \\
&+ n_{h0}\sum_{x_1 \in \{0,1\}} q_{h0}(x_1)\log\!\left(\frac{p_X(x_1, x_2 = 0)}{q_{h0}(x_1)}\right)
 + n_{h1}\sum_{x_1 \in \{0,1\}} q_{h1}(x_1)\log\!\left(\frac{p_X(x_1, x_2 = 1)}{q_{h1}(x_1)}\right)
\end{aligned}$$
Here $q_{0h}(\cdot)$, $q_{1h}(\cdot)$ are pdfs over $x_2$, and $q_{h0}(\cdot)$, $q_{h1}(\cdot)$ are pdfs over $x_1$.
The EM algorithm seeks to learn the parameters: $\alpha_{00}, \alpha_{10}, \alpha_{01}, \alpha_{11}, q_{0h}(x_2), q_{1h}(x_2), q_{h0}(x_1), q_{h1}(x_1)$ (for $x_2 \in \{0, 1\}$ and $x_1 \in \{0, 1\}$) iteratively. After the t-th iteration of the algorithm the estimates of these parameters are denoted by: $\alpha_{00}^t, \alpha_{10}^t, \alpha_{01}^t, \alpha_{11}^t, q_{0h}^t(x_2), q_{1h}^t(x_2), q_{h0}^t(x_1), q_{h1}^t(x_1)$.
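Purely as an editorial illustration of the mechanics (it is not the derivation that parts (b) and (c) ask for), one way such an iteration could look in code is sketched below, under the assumption that the E-step fills in each unanswered question with its conditional probability given the answered one, and the M-step re-normalises the resulting expected counts.

```python
# Rough sketch of one EM iteration for this missing-data contingency table.
# Illustrative only; the exam asks you to derive the E- and M-steps yourself.
# counts[(k, l)] = n_kl, counts_x1h[k] = n_kh, counts_hx2[l] = n_hl.

def em_step(alpha, counts, counts_x1h, counts_hx2):
    """One EM iteration: returns the updated alpha[(k, l)] dictionary."""
    # E-step: conditional distributions over the missing answer.
    q_kh = {(k, l): alpha[(k, l)] / (alpha[(k, 0)] + alpha[(k, 1)])
            for k in (0, 1) for l in (0, 1)}   # q_kh(x2 = l), cabbage answer k known
    q_hl = {(k, l): alpha[(k, l)] / (alpha[(0, l)] + alpha[(1, l)])
            for k in (0, 1) for l in (0, 1)}   # q_hl(x1 = k), walnut answer l known

    # M-step: expected counts, then normalise so the alphas sum to one.
    expected = {(k, l): counts[(k, l)]
                + counts_x1h[k] * q_kh[(k, l)]
                + counts_hx2[l] * q_hl[(k, l)]
                for k in (0, 1) for l in (0, 1)}
    total = sum(expected.values())
    return {kl: v / total for kl, v in expected.items()}
```

A loop would then repeat em_step, starting from some initialisation of the alphas, until successive estimates agree to the required number of decimal places.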

(b) State the t-th E-step of the EM algorithm based on the lower bound given above. In other words, state expressions for: $q_{0h}^t(x_2), q_{1h}^t(x_2), q_{h0}^t(x_1), q_{h1}^t(x_1)$ in terms of the parameter estimates for $\alpha_{11}, \alpha_{10}, \alpha_{01}, \alpha_{00}$ from the previous iteration of the algorithm, i.e. $\alpha_{11}^{t-1}, \alpha_{10}^{t-1}, \alpha_{01}^{t-1}, \alpha_{00}^{t-1}$.
[6 marks]
(c) Derive the t-th M-step of the EM algorithm. In other words, derive expressions for $\alpha_{11}^t, \alpha_{10}^t, \alpha_{01}^t, \alpha_{00}^t$ in terms of $q_{0h}^t(x_2), q_{1h}^t(x_2), q_{h0}^t(x_1), q_{h1}^t(x_1)$.
[NB: Again, you should handle parameter constraints using the following re-parameterisation trick:

$$\alpha_{kl} = \frac{e^{\eta_{kl}}}{\sum_{\tilde{k}=0,\ \tilde{l}=0}^{1,1} e^{\eta_{\tilde{k}\tilde{l}}}}, \qquad \text{where } \eta_{kl} \in \mathbb{R}.$$
]

[8 marks]
(d) You are presented with particular survey data from a school:

                                 Answers to 2nd Question:
                                 Like Walnuts   Dislike Walnuts   Not Answered
Answers to       Like Cabbage         50               72               18
1st Question:    Dislike Cabbage      35              106               43
                 Not Answered         90               25               10

Given this data, and assuming an initialisation such that $\alpha_{11}^1, \alpha_{10}^1, \alpha_{01}^1, \alpha_{00}^1$ are set equal to the estimates derived in part (a) of this question, calculate parameter estimates at convergence (to 2 decimal places) of the EM algorithm, i.e. calculate $\alpha_{11}^t, \alpha_{10}^t, \alpha_{01}^t, \alpha_{00}^t$ to two decimal places as t becomes sufficiently large. In addition, for this level of accuracy, state what value of t constitutes 'large' in this case.

[5 marks]

[Total for Question 2: 25 marks]
END OF PAPER