
IV. Problems

Probabilistic Graphical Models (PGMs) (40 points)

1. (3 points) Consider the following PGM of five random variables A, B, C, D, and E:

The joint distribution 𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) according to a PGM can be decomposed as follows:

𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = ∏_{𝑋 ∈ {𝐴, 𝐵, 𝐶, 𝐷, 𝐸}} 𝑝(𝑋 | 𝑝𝑎𝑟𝑒𝑛𝑡𝑠(𝑋))

Decompose 𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) for the PGM below in the above-mentioned form. Your answer should be an expression consisting solely of probabilities over one or more of the random variables A, B, C, D, E. For example, 𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = 𝑝(𝐶)𝑝(𝐴|𝐶)𝑝(𝐵|𝐴, 𝐶)𝑝(𝐷|𝐴, 𝐵, 𝐶)𝑝(𝐸|𝐴).
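For intuition only (not part of your answer), the sketch below shows how such a factorization evaluates the joint as a product of local terms; every probability value in it is made up for illustration.

```python
# Minimal sketch: evaluating the example factorization
# p(A,B,C,D,E) = p(C) p(A|C) p(B|A,C) p(D|A,B,C) p(E|A)
# at one assignment (A=a, B=b, C=c, D=d, E=e), with made-up values.
p_c = 0.3            # hypothetical p(C = c)
p_a_given_c = 0.6    # hypothetical p(A = a | C = c)
p_b_given_ac = 0.5   # hypothetical p(B = b | A = a, C = c)
p_d_given_abc = 0.9  # hypothetical p(D = d | A = a, B = b, C = c)
p_e_given_a = 0.2    # hypothetical p(E = e | A = a)

joint = p_c * p_a_given_c * p_b_given_ac * p_d_given_abc * p_e_given_a
print(joint)         # p(A=a, B=b, C=c, D=d, E=e) under these numbers
```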

2. (3 points) Draw the PGM of five random variables A, B, C, D, E from the joint distribution

𝑝(𝐴, 𝐵, 𝐶, 𝐷, 𝐸) = 𝑝(𝐴)𝑝(𝐵)𝑝(𝐶)𝑝(𝐷|𝐴, 𝐵)𝑝(𝐸|𝐷, 𝐶).

3. (24 points) Consider the following PGM of three binary random variables (A, B, and C)
and corresponding probability terms. Remember, each binary random variable has two
outcomes.

For parts a – f (3 points each), compute the probabilities. Your answer must be a floating point
decimal. Show your work to receive full credit by writing your derivation / calculation.

a) 𝑝(𝐵 = 𝑏)
b) 𝑝(𝐶 = ¬𝑐)


c) 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐)
d) 𝑝(𝐴 = 𝑎|𝐵 = 𝑏)
e) 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐|𝐵 = 𝑏)
f) 𝑝(𝐴 = 𝑎|𝐶 = ¬𝑐)

For parts g and h (3 points each), answer “Yes” or “No” and state your reason for full credit.

g) Is 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐) equal to 𝑝(𝐴 = 𝑎)𝑝(𝐶 = ¬𝑐)?
h) Is 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐|𝐵 = 𝑏) equal to 𝑝(𝐴 = 𝑎|𝐵 = 𝑏) 𝑝(𝐶 = ¬𝑐|𝐵 = 𝑏)?

Hint:

• 𝑝(𝐶 = ¬𝑐) = 𝑝(𝐶 = ¬𝑐, 𝐵 = 𝑏) + 𝑝(𝐶 = ¬𝑐, 𝐵 = ¬𝑏) = 𝑝(𝐶 = ¬𝑐 | 𝐵 = 𝑏)𝑝(𝐵 = 𝑏) + 𝑝(𝐶 = ¬𝑐 | 𝐵 = ¬𝑏)𝑝(𝐵 = ¬𝑏)

• 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐) = 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐, 𝐵 = 𝑏) + 𝑝(𝐴 = 𝑎, 𝐶 = ¬𝑐, 𝐵 = ¬𝑏)
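These identities are the law of total probability over B. As a sanity check on your arithmetic, here is a tiny Python sketch of the first hint; the probability values are placeholders, since the actual numbers come from the PGM's probability tables in the figure.

```python
# Law-of-total-probability sketch for p(C = ¬c); values are placeholders,
# substitute the numbers from the PGM's probability tables.
p_b = 0.4                   # hypothetical p(B = b)
p_notc_given_b = 0.3        # hypothetical p(C = ¬c | B = b)
p_notc_given_notb = 0.7     # hypothetical p(C = ¬c | B = ¬b)

p_notc = p_notc_given_b * p_b + p_notc_given_notb * (1 - p_b)
print(p_notc)               # marginal p(C = ¬c)
```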

4. (10 points) The following is a PGM of five random variables A, B, C, D, and E.

For each part below (2 points each), answer “Yes” or “No” and state your reason for full credit.

a) Is 𝐴 ⊥ 𝐸 (i.e., 𝑝(𝐴, 𝐸) = 𝑝(𝐴)𝑝(𝐸))?
b) Is 𝐴 ⊥ 𝐷 | 𝐵 (i.e., 𝑝(𝐷 | 𝐴, 𝐵) = 𝑝(𝐷 | 𝐵))?
c) Is 𝐴 ⊥ 𝐸 | 𝐵 (i.e., 𝑝(𝐴, 𝐸 | 𝐵) = 𝑝(𝐴 | 𝐵)𝑝(𝐸 | 𝐵))?
d) Is 𝐴 ⊥ 𝐸 | 𝐵, 𝐶 (i.e., 𝑝(𝐴, 𝐸 | 𝐵, 𝐶) = 𝑝(𝐴 | 𝐵, 𝐶)𝑝(𝐸 | 𝐵, 𝐶))?
e) Is 𝐴 ⊥ 𝐶 | 𝐷 (i.e., 𝑝(𝐴, 𝐶 | 𝐷) = 𝑝(𝐴 | 𝐷)𝑝(𝐶 | 𝐷))?


K-means (15 points)

5. Consider the following one-dimensional dataset of 8 data instances:

Perform 10 iterations of the K-means algorithm given 𝐾 = 2 and the initializations below. 𝑐1(𝑡) and 𝑐2(𝑡) represent the 𝐾 = 2 centers at the end of the 𝑡-th iteration. Use squared Euclidean distance (as we do in the slides).

Note, if no points are assigned to a center (e.g., if no points are assigned to 𝑐1(𝑡)), then 𝑐1(𝑡+1) = 𝑐1(𝑡).

Show your work to receive full credit.

Let 𝑐1(𝑡) = 25 and 𝑐2(𝑡) = 50 at 𝑡 = 0 (i.e., initialization). Compute 𝑐1(10) and 𝑐2(10).
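If you want to double-check your hand computation, here is a minimal 1-D K-means sketch. The eight data values below are placeholders, since the actual instances come from the figure above.

```python
# Minimal 1-D K-means sketch with squared Euclidean distance.
# The data values are placeholders; use the 8 instances from the figure.
data = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical
centers = [25.0, 50.0]                    # c1(0) = 25, c2(0) = 50

for t in range(10):                       # 10 iterations
    # Assignment step: each point goes to its nearest center.
    clusters = [[], []]
    for x in data:
        k = min(range(2), key=lambda j: (x - centers[j]) ** 2)
        clusters[k].append(x)
    # Update step: move each center to the mean of its assigned points;
    # keep the old center if no points were assigned to it.
    centers = [sum(c) / len(c) if c else centers[j]
               for j, c in enumerate(clusters)]

print(centers)                            # c1(10), c2(10)
```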

Gaussian mixture models (15 points)

6. Consider the following one-dimensional dataset of 6 data instances:

You will follow steps to construct a Gaussian mixture model with two components (i.e., 1 & 2)
to model the distribution of these 6 data points.

Note that to construct a Gaussian mixture model, you are to estimate:

• 𝜇1, 𝜎1, such that 𝑝(𝑋 = 𝑥 | 𝑌 = 1) = (1 / (𝜎1√2𝜋)) exp(−(𝑥 − 𝜇1)² / (2𝜎1²))

• 𝜇2, 𝜎2, such that 𝑝(𝑋 = 𝑥 | 𝑌 = 2) = (1 / (𝜎2√2𝜋)) exp(−(𝑥 − 𝜇2)² / (2𝜎2²))

• 𝜆, such that 𝑝(𝑌 = 1) = 𝜆

so that together you have the GMM distribution

𝑝(𝑋 = 𝑥) = 𝑝(𝑌 = 1)𝑝(𝑋 = 𝑥|𝑌 = 1) + 𝑝(𝑌 = 2)𝑝(𝑋 = 𝑥|𝑌 = 2)

Let us denote by 𝜃 all the parameters {𝜇1, 𝜎1, 𝜇2, 𝜎2, 𝜆}.


Like K-means, GMM also needs initialization. Let

• 𝜇1(0) = 4, 𝜎1(0) = 4

• 𝜇2(0) = 10, 𝜎2(0) = 4

• 𝜆(0) = 0.5

be the initialization and let 𝜃(0) denote {𝜇1(0), 𝜎1(0), 𝜇2(0), 𝜎2(0), 𝜆(0)}. Here, "(0)" means the parameters at the 0-th iteration.

Show your work to receive full credit.

a) E-steps: Compute 𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) for each data point in {𝑥𝑖}, 𝑖 = 1, …, 6 (a numerical value).

𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) = 𝑝(𝑌 = 1, 𝑥𝑖; 𝜃(0)) / 𝑝(𝑥𝑖; 𝜃(0)) = [𝑝(𝑌 = 1; 𝜃(0)) 𝑝(𝑥𝑖 | 𝑌 = 1; 𝜃(0))] / [𝑝(𝑌 = 1; 𝜃(0)) 𝑝(𝑥𝑖 | 𝑌 = 1; 𝜃(0)) + 𝑝(𝑌 = 2; 𝜃(0)) 𝑝(𝑥𝑖 | 𝑌 = 2; 𝜃(0))]
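As a check on your E-step arithmetic, here is a minimal Python sketch of this computation under the initialization 𝜃(0). The six data values are placeholders, since the actual instances come from the figure.

```python
import math

# E-step sketch: responsibilities p(Y = 1 | x_i; theta(0)) under the given
# initialization. The data values are placeholders; use the instances from
# the figure.
data = [1, 3, 5, 8, 10, 12]               # hypothetical
mu1, sigma1 = 4.0, 4.0                    # mu1(0), sigma1(0)
mu2, sigma2 = 10.0, 4.0                   # mu2(0), sigma2(0)
lam = 0.5                                 # lambda(0) = p(Y = 1)

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

resp = []                                 # resp[i] = p(Y = 1 | x_i; theta(0))
for x in data:
    num = lam * normal_pdf(x, mu1, sigma1)              # p(Y=1) p(x | Y=1)
    den = num + (1 - lam) * normal_pdf(x, mu2, sigma2)  # p(x)
    resp.append(num / den)
print(resp)
```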

b) M-steps: Based on the values 𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) for 𝑖 ∈ {1, …, 6} that you computed earlier, compute the updated parameters 𝜃(1) = {𝜇1(1), 𝜎1(1), 𝜇2(1), 𝜎2(1), 𝜆(1)}.

𝜇𝑘(1) = ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0)) 𝑥𝑖 / ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0))

𝜎𝑘(1) = √[ ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0)) (𝑥𝑖 − 𝜇𝑘(1))² / ∑𝑖 𝑝(𝑌 = 𝑘 | 𝑥𝑖; 𝜃(0)) ]

𝜆(1) = ∑𝑖 𝑝(𝑌 = 1 | 𝑥𝑖; 𝜃(0)) / 6

where 𝑘 ∈ {1, 2} is the index of each component.
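Continuing the E-step sketch above (it reuses data, resp, and the math import), this is a minimal version of the same update formulas:

```python
# M-step sketch, continuing the E-step sketch above (reuses data, resp, math).
r1 = resp                                  # p(Y = 1 | x_i; theta(0))
r2 = [1 - r for r in resp]                 # p(Y = 2 | x_i; theta(0))

def update(r):
    # Responsibility-weighted mean and standard deviation for one component.
    mu = sum(ri * xi for ri, xi in zip(r, data)) / sum(r)
    var = sum(ri * (xi - mu) ** 2 for ri, xi in zip(r, data)) / sum(r)
    return mu, math.sqrt(var)

mu1_new, sigma1_new = update(r1)           # mu1(1), sigma1(1)
mu2_new, sigma2_new = update(r2)           # mu2(1), sigma2(1)
lam_new = sum(r1) / len(data)              # lambda(1)
print(mu1_new, sigma1_new, mu2_new, sigma2_new, lam_new)
```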


Decision Trees (10 points)

7. The following is a training set with three binary attributes A, B, and C:

Example # A B C CONCEPT

1 True False False False
2 True False True False
3 False True False False
4 True True True True
5 True True False True

Determine the decision tree using the majority rule. Show each step of the algorithm and the
final tree for full credit.
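If your tree-growing procedure picks the splitting attribute by information gain (an assumption here; follow whatever criterion your lecture specifies, with the problem's majority rule for labeling), a short sketch like this can check which attribute gives the best first split on the table above:

```python
import math

# Sketch: entropy and information gain over the training table above.
# Assumes attributes are chosen by highest information gain (an assumption,
# not something the problem states).
examples = [  # (A, B, C, CONCEPT)
    (True,  False, False, False),
    (True,  False, True,  False),
    (False, True,  False, False),
    (True,  True,  True,  True),
    (True,  True,  False, True),
]

def entropy(rows):
    pos = sum(1 for r in rows if r[3])
    if pos in (0, len(rows)):          # pure (or empty) set has zero entropy
        return 0.0
    p = pos / len(rows)
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(rows, attr):             # attr: 0 = A, 1 = B, 2 = C
    gain = entropy(rows)
    for value in (True, False):
        subset = [r for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for name, attr in zip("ABC", range(3)):
    print(name, round(info_gain(examples, attr), 3))
```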

K-nearest neighbor (10 points)

8. Use k-nearest neighbor classification to classify the following training set with two classes {+, -}:
Class x1 x2

+ 1 2
+ 2 1
– 2 2
+ 2 3
– 3 1
+ 3 2

a) (2 points) Plot these six training points on an x1-x2 axis.
b) (4 points) Separate the classes {+, -} with a decision boundary on your plot in part a for k = 1. Fill in the region(s) that will be classified as “–”.
c) (4 points) Separate the classes {+, -} with a decision boundary on your plot in part a for k = 3. Fill in the region(s) that will be classified as “–”.
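To sanity-check the regions you shade, here is a brute-force k-NN sketch that labels points on a coarse grid over the training data; it is illustrative only, and distance or vote ties are broken arbitrarily.

```python
from collections import Counter

# Brute-force k-NN over a coarse grid, to cross-check hand-drawn regions.
# Ties (equidistant neighbors) are broken arbitrarily by sort order.
train = [('+', 1, 2), ('+', 2, 1), ('-', 2, 2), ('+', 2, 3), ('-', 3, 1), ('+', 3, 2)]

def knn(x1, x2, k):
    nearest = sorted(train, key=lambda p: (p[1] - x1) ** 2 + (p[2] - x2) ** 2)
    votes = Counter(label for label, _, _ in nearest[:k])
    return votes.most_common(1)[0][0]

for k in (1, 3):
    print(f"k = {k}")
    for x2 in [3.0, 2.5, 2.0, 1.5, 1.0]:              # rows, top to bottom
        row = [knn(x1, x2, k) for x1 in [1.0, 1.5, 2.0, 2.5, 3.0]]
        print(' '.join(row))
```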


Support Vector Machine (10 Points)

9. Use a support vector machine to classify the following training set:

Class x1 x2
+ 1 1
+ 2 2
+ 2 0
– 0 0
– 1 0
– 0 1

a) (2 points) Plot these six training points on an x1-x2 axis.
b) (8 points) Separate the classes {+, -} with a maximum margin separator defined as the set of points {x : w ∙ x + b = 0}. Give your choice for vector w and intercept b as your answer.
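If you want to verify your hand-derived separator, the sketch below fits a (nearly) hard-margin linear SVM with scikit-learn; a very large C approximates the hard-margin case, and the returned (w, b) may be a scaled version of yours describing the same hyperplane.

```python
from sklearn.svm import SVC

# Sketch: fit an (approximately) hard-margin linear SVM on the six points
# above to cross-check the maximum margin separator found by hand.
X = [[1, 1], [2, 2], [2, 0], [0, 0], [1, 0], [0, 1]]
y = [+1, +1, +1, -1, -1, -1]

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin
print('w =', clf.coef_[0], 'b =', clf.intercept_[0])
print('support vectors:', clf.support_vectors_)
```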

V. Carmen submission

Submit this portion of Homework #4 (Problem Set) into the drop box for the problem set
portion. You will submit the programming portion into a different drop box.