
IV. Problems

Probabilistic Graphical Models (PGMs) (40 points)

1. (3 points) Consider the following PGM of five random variables A, B, C, D, and E:

The joint distribution 𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) according to a PGM can be decomposed as follows:

𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = ∏_{𝑋 ∈ {𝐴, 𝐵, 𝐶, 𝐷, 𝐸}} 𝑝(𝑋 | 𝑝𝑎𝑟𝑒𝑛𝑡𝑠(𝑋))

Decompose 𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) for the PGM below in the above-mentioned form. Your answer should be an expression consisting solely of probabilities over one or more of the random variables A, B, C, D, E. For example, 𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = 𝑝(𝐶)𝑝(𝐴|𝐶)𝑝(𝐵|𝐴, 𝐶)𝑝(𝐷|𝐴, 𝐵, 𝐶)𝑝(𝐸|𝐴).
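For intuition only (not part of your answer), the sketch below shows how such a factorization evaluates the joint as a product of local terms; every probability value in it is made up for illustration.

```python
# Minimal sketch: evaluating the example factorization
# p(A,B,C,D,E) = p(C) p(A|C) p(B|A,C) p(D|A,B,C) p(E|A)
# at one assignment (A=a, B=b, C=c, D=d, E=e), with made-up values.
p_c = 0.3            # hypothetical p(C = c)
p_a_given_c = 0.6    # hypothetical p(A = a | C = c)
p_b_given_ac = 0.5   # hypothetical p(B = b | A = a, C = c)
p_d_given_abc = 0.9  # hypothetical p(D = d | A = a, B = b, C = c)
p_e_given_a = 0.2    # hypothetical p(E = e | A = a)

joint = p_c * p_a_given_c * p_b_given_ac * p_d_given_abc * p_e_given_a
print(joint)         # p(A=a, B=b, C=c, D=d, E=e) under these numbers
```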

2. (3 points) Draw the PGM of five random variables A, B, C, D, E from the joint distribution

𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = 𝑝(𝐴)𝑝(𝐵)𝑝(𝐶)𝑝(𝐷|𝐴, 𝐵)𝑝(𝐸|𝐷, 𝐶).

3. (24 points) Consider the following PGM of three binary random variables (A, B, and C)
and corresponding probability terms. Remember, each binary random variable has two
outcomes.

For parts a – f (3 points each), compute the probabilities. Your answer must be a floating point
decimal. Show your work to receive full credit by writing your derivation / calculation.

a) 𝑝(𝐵 = 𝑏)
b) 𝑝(𝐶 = ¬𝑐)


c) 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐)
d) 𝑝(𝐴 = 𝑎|𝐵 = 𝑏)
e) 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐|𝐵 = 𝑏)
f) 𝑝(𝐴 = 𝑎|𝐶 = ¬𝑐)

For parts g and h (3 points each), answer “Yes” or “No” and state your reason for full credit.

g) Is 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐) equal to 𝑝(𝐴 = 𝑎)𝑝(𝐶 = ¬𝑐)?
h) Is 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐|𝐵 = 𝑏) equal to 𝑝(𝐴 = 𝑎|𝐵 = 𝑏) 𝑝(𝐶 = ¬𝑐|𝐵 = 𝑏)?

Hint:

• 𝑝(𝐶 = ¬𝑐) = 𝑝(𝐶 = ¬𝑐, 𝐵 = 𝑏) + 𝑝(𝐶 = ¬𝑐, 𝐵 = ¬𝑏) = 𝑝(𝐶 = ¬𝑐 | 𝐵 = 𝑏)𝑝(𝐵 = 𝑏) + 𝑝(𝐶 = ¬𝑐 | 𝐵 = ¬𝑏)𝑝(𝐵 = ¬𝑏)

• 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐) = 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐, 𝐵 = 𝑏) + 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐, 𝐵 = ¬𝑏)
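These identities are the law of total probability over B. As a sanity check on your arithmetic, here is a tiny Python sketch of the first hint; the probability values are placeholders, since the actual numbers come from the PGM's probability tables in the figure.

```python
# Law-of-total-probability sketch for p(C = ¬c); values are placeholders,
# substitute the numbers from the PGM's probability tables.
p_b = 0.4                   # hypothetical p(B = b)
p_notc_given_b = 0.3        # hypothetical p(C = ¬c | B = b)
p_notc_given_notb = 0.7     # hypothetical p(C = ¬c | B = ¬b)

p_notc = p_notc_given_b * p_b + p_notc_given_notb * (1 - p_b)
print(p_notc)               # marginal p(C = ¬c)
```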

4. (10 points) The following is a PGM of five random variables A, B, C, D, and E.

For each part below (2 points each), answer “Yes” or “No” and state your reason for full credit.

a) Is 𝐴 ⊥ 𝐸 (i.e., 𝑝(𝐴, 𝐸) = 𝑝(𝐴)𝑝(𝐸))?
b) Is 𝐴 ⊥ 𝐷 | 𝐵 (i.e., 𝑝(𝐷 | 𝐴, 𝐵) = 𝑝(𝐷 | 𝐵))?
c) Is 𝐴 ⊥ 𝐸 | 𝐵 (i.e., 𝑝(𝐴, 𝐸 | 𝐵) = 𝑝(𝐴 | 𝐵)𝑝(𝐸 | 𝐵))?
d) Is 𝐴 ⊥ 𝐸 | 𝐵, 𝐶 (i.e., 𝑝(𝐴, 𝐸 | 𝐵, 𝐶) = 𝑝(𝐴 | 𝐵, 𝐶)𝑝(𝐸 | 𝐵, 𝐶))?
e) Is 𝐴 ⊥ 𝐶 | 𝐷 (i.e., 𝑝(𝐴, 𝐶 | 𝐷) = 𝑝(𝐴 | 𝐷)𝑝(𝐶 | 𝐷))?


K-means (15 points)

5. Consider the following one-dimensional dataset of 8 data instances:

Perform 10 iterations of the K-means algorithm given 𝐾 = 2 and the initializations below. 𝑐1(𝑡) and 𝑐2(𝑡) represent the 𝐾 = 2 centers at the end of the 𝑡-th iteration. Use squared Euclidean distance (as we do in the slides).

Note, if no points are assigned to a center (e.g., if no points are assigned to 𝑐1(𝑡)), then 𝑐1(𝑡+1) = 𝑐1(𝑡).

Show your work to receive full credit.

Let 𝑐1(𝑡) = 25 and 𝑐2(𝑡) = 50 at 𝑡 = 0 (i.e., initialization). Compute 𝑐1(10) and 𝑐2(10).
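If you want to double-check your hand computation, here is a minimal 1-D K-means sketch. The eight data values below are placeholders, since the actual instances come from the figure above.

```python
# Minimal 1-D K-means sketch with squared Euclidean distance.
# The data values are placeholders; use the 8 instances from the figure.
data = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical
centers = [25.0, 50.0]                    # c1(0) = 25, c2(0) = 50

for t in range(10):                       # 10 iterations
    # Assignment step: each point goes to its nearest center.
    clusters = [[], []]
    for x in data:
        k = min(range(2), key=lambda j: (x - centers[j]) ** 2)
        clusters[k].append(x)
    # Update step: move each center to the mean of its assigned points;
    # keep the old center if no points were assigned to it.
    centers = [sum(c) / len(c) if c else centers[j]
               for j, c in enumerate(clusters)]

print(centers)                            # c1(10), c2(10)
```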

Gaussian mixture models (15 points)

6. Consider the following one-dimensional dataset of 6 data instances:

You will follow steps to construct a Gaussian mixture model with two components (i.e., 1 & 2)
to model the distribution of these 6 data points.

Note that to construct a Gaussian mixture model, you are to estimate:

• 𝜇1, 𝜎1, such that 𝑝(𝑋 = 𝑥 | 𝑌 = 1) = (1 / (𝜎1√2𝜋)) exp(−(𝑥 − 𝜇1)² / (2𝜎1²))

• 𝜇2, 𝜎2, such that 𝑝(𝑋 = 𝑥 | 𝑌 = 2) = (1 / (𝜎2√2𝜋)) exp(−(𝑥 − 𝜇2)² / (2𝜎2²))

• 𝜆, such that 𝑝(𝑌 = 1) = 𝜆

so that together you have the GMM distribution

𝑝(𝑋 = 𝑥) = 𝑝(𝑌 = 1)𝑝(𝑋 = 𝑥|𝑌 = 1) + 𝑝(𝑌 = 2)𝑝(𝑋 = 𝑥|𝑌 = 2)

Let us denote by 𝜃 all the parameters {𝜇1, 𝜎1, 𝜇2, 𝜎2, 𝜆}.


Like K-means, GMM also needs initialization. Let

• 𝜇1(0) = 4, 𝜎1(0) = 4

• 𝜇2(0) = 10, 𝜎2(0) = 4

• 𝜆(0) = 0.5

be the initialization and let 𝜃(0) denote {𝜇1(0), 𝜎1(0), 𝜇2(0), 𝜎2(0), 𝜆(0)}. Here, "(0)" means the parameters at the 0-th iteration.

Show your work to receive full credit.

a) E-steps: Compute 𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) for each data point in {𝑥𝑖}, 𝑖 = 1, …, 6 (a numerical value).

𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) = 𝑝(𝑌 = 1, 𝑥𝑖; 𝜃(0)) / 𝑝(𝑥𝑖; 𝜃(0)) = [𝑝(𝑌 = 1; 𝜃(0)) 𝑝(𝑥𝑖 | 𝑌 = 1; 𝜃(0))] / [𝑝(𝑌 = 1; 𝜃(0)) 𝑝(𝑥𝑖 | 𝑌 = 1; 𝜃(0)) + 𝑝(𝑌 = 2; 𝜃(0)) 𝑝(𝑥𝑖 | 𝑌 = 2; 𝜃(0))]
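As a check on your E-step arithmetic, here is a minimal Python sketch of this computation under the initialization 𝜃(0). The six data values are placeholders, since the actual instances come from the figure.

```python
import math

# E-step sketch: responsibilities p(Y = 1 | x_i; theta(0)) under the given
# initialization. The data values are placeholders; use the instances from
# the figure.
data = [1, 3, 5, 8, 10, 12]               # hypothetical
mu1, sigma1 = 4.0, 4.0                    # mu1(0), sigma1(0)
mu2, sigma2 = 10.0, 4.0                   # mu2(0), sigma2(0)
lam = 0.5                                 # lambda(0) = p(Y = 1)

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

resp = []                                 # resp[i] = p(Y = 1 | x_i; theta(0))
for x in data:
    num = lam * normal_pdf(x, mu1, sigma1)              # p(Y=1) p(x | Y=1)
    den = num + (1 - lam) * normal_pdf(x, mu2, sigma2)  # p(x)
    resp.append(num / den)
print(resp)
```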

b) M-steps: Based on the values 𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) for 𝑖 ∈ {1, …, 6} that you computed earlier, compute the updated parameters 𝜃(1) = {𝜇1(1), 𝜎1(1), 𝜇2(1), 𝜎2(1), 𝜆(1)}.

𝜇𝑘(1) = ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0)) 𝑥𝑖 / ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0))

𝜎𝑘(1) = √[ ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0)) (𝑥𝑖 − 𝜇𝑘(1))² / ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0)) ]

𝜆(1) = ∑𝑖 𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) / 6

where 𝑘 ∈ {1, 2} is the index of each component.
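Continuing the E-step sketch above (it reuses data, resp, and the math import), this is a minimal version of the same update formulas:

```python
# M-step sketch, continuing the E-step sketch above (reuses data, resp, math).
r1 = resp                                  # p(Y = 1 | x_i; theta(0))
r2 = [1 - r for r in resp]                 # p(Y = 2 | x_i; theta(0))

def update(r):
    # Responsibility-weighted mean and standard deviation for one component.
    mu = sum(ri * xi for ri, xi in zip(r, data)) / sum(r)
    var = sum(ri * (xi - mu) ** 2 for ri, xi in zip(r, data)) / sum(r)
    return mu, math.sqrt(var)

mu1_new, sigma1_new = update(r1)           # mu1(1), sigma1(1)
mu2_new, sigma2_new = update(r2)           # mu2(1), sigma2(1)
lam_new = sum(r1) / len(data)              # lambda(1)
print(mu1_new, sigma1_new, mu2_new, sigma2_new, lam_new)
```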


Decision Trees (10 points)

7. The following is a training set with three binary attributes A, B, and C:

Example # A B C CONCEPT

1 True False False False
2 True False True False
3 False True False False
4 True True True True
5 True True False True

Determine the decision tree using the majority rule. Show each step of the algorithm and the
final tree for full credit.
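If your tree-growing procedure picks the splitting attribute by information gain (an assumption here; follow whatever criterion your lecture specifies, with the problem's majority rule for labeling), a short sketch like this can check which attribute gives the best first split on the table above:

```python
import math

# Sketch: entropy and information gain over the training table above.
# Assumes attributes are chosen by highest information gain (an assumption,
# not something the problem states).
examples = [  # (A, B, C, CONCEPT)
    (True,  False, False, False),
    (True,  False, True,  False),
    (False, True,  False, False),
    (True,  True,  True,  True),
    (True,  True,  False, True),
]

def entropy(rows):
    pos = sum(1 for r in rows if r[3])
    if pos in (0, len(rows)):          # pure (or empty) set has zero entropy
        return 0.0
    p = pos / len(rows)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(rows, attr):             # attr: 0 = A, 1 = B, 2 = C
    gain = entropy(rows)
    for value in (True, False):
        subset = [r for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for name, attr in zip("ABC", range(3)):
    print(name, round(info_gain(examples, attr), 3))
```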

K-nearest neighbor (10 points)

8. Use k-nearest neighbor classification to classify the following training set with two classes {+, -}:
Class x1 x2

+ 1 2
+ 2 1
– 2 2
+ 2 3
– 3 1
+ 3 2

a) (2 points) Plot these six training points on an x1-x2 axis.
b) (4 points) Separate the classes {+, -} with a decision boundary on your plot in part a for k = 1. Fill in the region(s) that will be classified as “–”.
c) (4 points) Separate the classes {+, -} with a decision boundary on your plot in part a for k = 3. Fill in the region(s) that will be classified as “–”.
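To sanity-check the regions you shade, here is a brute-force k-NN sketch that labels points on a coarse grid over the training data; it is illustrative only, and distance or vote ties are broken arbitrarily.

```python
from collections import Counter

# Brute-force k-NN over a coarse grid, to cross-check hand-drawn regions.
# Ties (equidistant neighbors) are broken arbitrarily by sort order.
train = [('+', 1, 2), ('+', 2, 1), ('-', 2, 2), ('+', 2, 3), ('-', 3, 1), ('+', 3, 2)]

def knn(x1, x2, k):
    nearest = sorted(train, key=lambda p: (p[1] - x1) ** 2 + (p[2] - x2) ** 2)
    votes = Counter(label for label, _, _ in nearest[:k])
    return votes.most_common(1)[0][0]

for k in (1, 3):
    print(f"k = {k}")
    for x2 in [3.0, 2.5, 2.0, 1.5, 1.0]:              # rows, top to bottom
        row = [knn(x1, x2, k) for x1 in [1.0, 1.5, 2.0, 2.5, 3.0]]
        print(' '.join(row))
```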


Support Vector Machine (10 Points)

9. Use a support vector machine to classify the following training set:

Class x1 x2
+ 1 1
+ 2 2
+ 2 0
– 0 0
– 1 0
– 0 1

a) (2 points) Plot these six training points on an x1-x2 axis.
b) (8 points) Separate the classes {+, -} with a maximum margin separator defined as the set of points {x : w ∙ x + b = 0}. Give your choice for vector w and intercept b as your answer.
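If you want to verify your hand-derived separator, the sketch below fits a (nearly) hard-margin linear SVM with scikit-learn; a very large C approximates the hard-margin case, and the returned (w, b) may be a scaled version of yours describing the same hyperplane.

```python
from sklearn.svm import SVC

# Sketch: fit an (approximately) hard-margin linear SVM on the six points
# above to cross-check the maximum margin separator found by hand.
X = [[1, 1], [2, 2], [2, 0], [0, 0], [1, 0], [0, 1]]
y = [+1, +1, +1, -1, -1, -1]

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin
print('w =', clf.coef_[0], 'b =', clf.intercept_[0])
print('support vectors:', clf.support_vectors_)
```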

V. Carmen submission

Submit this portion of Homework #4 (Problem Set) into the drop box for the problem set
portion. You will submit the programming portion into a different drop box.