COMP9418 Exam T3 2021 – Solution
T3-2021 Exam¶
COMP9418 – Advanced Topics in Statistical Machine Learning
The University of New South Wales
7th December, 2021
Before proceeding, please read and acknowledge the following (double-click on this cell and put an X between the brackets [X]):
[ ] I acknowledge that I will complete all of the work I submit for this exam without assistance from anyone else.
Instructions¶
This exam will last for 12 hours, starting 7/12/2021 at 09:00:00 AEDT and ending 7/12/2021 at 21:00:00 AEDT.
Questions will be answered from 9 am to 9 pm AEDT. Questions should be posted in the WebCMS forum.
You must provide all answers in this Jupyter notebook.
You must use the cells provided to answer the questions. Use markdown cells for textual answers and code cells for programming answers.
Submit this exam by give (command line or WebCMS) before the deadline. If WebCMS submission is slow, or if you are submitting in the last hour, please submit using the give command on the CSE servers (via VLAB or ssh).
The appropriate command is give cs9418 exam *.ipynb. We will not accept late submissions.
The exam has three parts: Multiple choice questions (20%); Questions that require a textual answer (50%); and, programming questions in Python (30%).
This exam is an open book exam. You are permitted to access papers and books as well as the course materials, including slides and solved tutorials. Please, in case of doubt, read the UNSW guidance on open book exams.
You are not permitted to communicate (email, phone, message, talk, etc.) with anyone during the exam, except COMP9418 staff via email or forum.
Do not communicate your exam answers after you finish your exam. Some students may have extended time to complete the exam.
Do not place your exam work in any location accessible to any other person, such as Dropbox and Github.
Ensure that no other person in your household can access your work.
Do not disclose your zpass to any other person. If you have revealed your zpass, you should change it immediately.
We will refer deliberate violations of exam conditions to Student Integrity as serious misconduct.
This exam has nine questions. The total number of marks is 100.
Type your student number and name on the next cell.
Student identification¶
Student ID:
from DiscreteFactors import Factor
from Graph import Graph
from BayesNet import BayesNet
from GaussianFactor import GaussianFactor
Part 1 [20 marks]¶
Part 1 is composed of four multiple-choice questions of five marks each. To answer each question, double-click the cell with the alternatives and write an X between the [ ] of the chosen option.
This is an example before inserting X
[ ] Alternative one
[ ] Alternative two
This is an example after inserting X
[X] Alternative one
[ ] Alternative two
For all four questions, choose only one among the alternatives.
Question 1 [5 marks]¶
Consider the directed graph $G_D$ (left) and the undirected graph $G_U$ (right). Also, consider the probability distributions $P_D$ and $P_U$ that factorise according to $G_D$ and $G_U$, respectively. Select the correct alternative regarding graph separation and probabilistic independence.
[ ] $dsep_{G_D}(B,F,D)$ and $sep_{G_U}(B,F,D)$. Therefore, $B \perp D | F$ in both $P_D$ and $P_U$.
[ ] $dsep_{G_D}(C,\emptyset,D)$ and $sep_{G_U}(C,\emptyset,D)$. Therefore, $C \perp D$ in both $P_D$ and $P_U$.
[ ] $\neg dsep_{G_D}(B,F,D)$ and $\neg sep_{G_U}(B,F,D)$. Therefore, $B \not\perp D | F$ in $P_D$ and $B \not\perp D | F$ in $P_U$.
[ ] $\neg dsep_{G_D}(B,F,D)$ and $\neg sep_{G_U}(C,\emptyset,D)$. Therefore, $B \not\perp D | F$ in $P_D$ and $C \not\perp D$ in $P_U$.
[X] None of the above alternatives.
Question 2 [5 marks]¶
Let’s suppose we apply the Expectation-Maximization (EM) approach to learning parameters from a complete (with no missing values) dataset. Choose the correct alternative:
[X] EM will converge in a single iteration, and it will return the maximum-likelihood estimates independent of the initial parameter values.
[ ] EM will converge in one or more iterations, and it will return the maximum-likelihood estimates independent of the initial parameter values.
[ ] EM will often converge to the maximum-likelihood estimates, but it will depend on the quality of the initial parameter values.
[ ] EM will not converge to the maximum-likelihood estimates.
[ ] EM is not applicable to complete data.
Question 3 [5 marks]¶
Suppose we used the Variable Elimination algorithm on a Gaussian Bayesian Network with elimination order $o$. The network has $N$ variables and the width of the network with elimination order $o$ is $w$.
What is the time complexity of this algorithm? Choose the option with the tightest upper bound.
[ ] $N e^w$
[ ] $N w^3$
[X] $N w^2$
[ ] $N \log w$
Question 4 [5 marks]¶
Consider the following three directed graphs, $G_1$, $G_2$ and $G_3$.
On a dataset $D$, $G_1$ has a log-likelihood of -32.4, $G_2$ has log-likelihood of -28.3, and $G_3$ has log-likelihood of -15.2. $D$ contains 64 data points and six binary variables. Rank the three graphs using the Minimum Description Length (MDL) score, from best to worst on the dataset $D$.
[ ] $G_1, G_2, G_3$
[X] $G_1, G_3, G_2$
[ ] $G_2, G_3, G_1$
[ ] $G_3, G_2, G_1$
[ ] $G_3, G_1, G_2$
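For reference, a minimal sketch (not part of the original solution) of how such a ranking can be computed with the MDL score $MDL(G \mid D) = \frac{\|G\|}{2}\log_2 N - LL(G \mid D)$ (smaller is better), where $\|G\|$ is the number of independent parameters. The parameter counts below are purely hypothetical, since the three graph structures are not reproduced here; with different counts the ranking would change.
import math

# Log-likelihoods from the question; parameter counts are HYPOTHETICAL
# placeholders for illustration (the actual graphs are not shown here).
log_likelihood = {'G1': -32.4, 'G2': -28.3, 'G3': -15.2}
num_params = {'G1': 6, 'G2': 8, 'G3': 12}   # assumed values only
N = 64                                       # number of data points

def mdl(g):
    # Complexity penalty plus negative log-likelihood; smaller is better.
    return (num_params[g] / 2) * math.log2(N) - log_likelihood[g]

ranking = sorted(log_likelihood, key=mdl)    # best (smallest score) first
print([(g, round(mdl(g), 1)) for g in ranking])   # G1, G3, G2 with these counts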
Part 2 [50 marks]¶
Part 2 is composed of three open questions. To answer each question, edit the markdown cell after the question statement and insert your answer.
Question 5 [20 marks]¶
Andy and Lance are two metal detectorists that like to work together. Their hobby consists of using metal detectors to find buried metallic objects (targets). Andy uses an XP metal detector, while Lance operates a Minelab. When searching for a buried target, they both measure their machines’ response before excavating the target. A target can be trash (bottle caps, cans, nails, etc.) or treasure (coins, rings, historical artefacts, etc.).
The XP correctly recognises trash with 90% accuracy but has a lower accuracy in identifying treasure correctly, 70%. The Minelab machine identifies trash and treasure correctly with 80% and 70% accuracy, respectively.
Both detectors are sensitive machines that may require calibration from time to time. The XP is a more robust machine, and the probability of being uncalibrated is only 1%. The Minelab has a higher chance of being uncalibrated, 5%. An uncalibrated device has its accuracy for detecting treasure reduced by 10% while leaving the trash accuracy unmodified.
Given that 99% of the detected objects are trash, what is the probability of a target being a treasure given that both machines read treasure?
[5 Marks] Show a Bayesian network structure (graph) for this problem.
[5 Marks] Briefly explain your network.
[5 Marks] Show the outcome space and network conditional probability tables (CPTs) for all variables. If any information is missing, assume a uniform distribution.
[5 Marks] What is the answer to this question? Solve it as a programming exercise, i.e., provide a program that computes the solution. Is the numerical solution what you expect? Briefly comment on the resulting probabilities.
# Your answer for 1 – Bayesian network structure
# Define your graph here
G = Graph({
    'T': ('X', 'M'),
    'CX': ('X',),
    'CM': ('M',),
})

# Node positions, to improve visualisation
pos = {
    'CX': '1,1!',
    'T': '3,1!',
    'CM': '5,1!',
    'X': '2,0!',
    'M': '4,0!',
}
G.show(positions=pos)
Your answer for 2
Variables:
$T$: Target, it can assume values ‘treasure’ or ‘trash’
$CX$: Calibration XP, it can assume values ‘calibrated’ or ‘uncalibrated’
$CM$: Calibration Minelab, it can assume values ‘calibrated’ or ‘uncalibrated’
$X$: XP, it can assume values ‘treasure’ or ‘trash’
$M$: Minelab, it can assume values ‘treasure’ or ‘trash’
Both XP and Minelab readings can be influenced by the target under them and their calibration state.
# Your answer for 3
outcomeSpace = dict(
    T=('treasure', 'trash'),
    CX=('calibrated', 'uncalibrated'),
    CM=('calibrated', 'uncalibrated'),
    X=('treasure', 'trash'),
    M=('treasure', 'trash'),
)

BN = BayesNet(G, outcomeSpace)
T_prob = Factor(('T',), outcomeSpace)
T_prob['treasure'] = 0.01
T_prob['trash'] = 0.99
BN.factors['T'] = T_prob

CX_prob = Factor(('CX',), outcomeSpace)
CX_prob['calibrated'] = 0.01
CX_prob['uncalibrated'] = 0.99
BN.factors['CX'] = CX_prob

CM_prob = Factor(('CM',), outcomeSpace)
CM_prob['calibrated'] = 0.05
CM_prob['uncalibrated'] = 0.95
BN.factors['CM'] = CM_prob
X_prob = Factor(('CX', 'T', 'X'), outcomeSpace)
X_prob['calibrated', 'treasure', 'treasure'] = 0.7
X_prob['calibrated', 'treasure', 'trash'] = 0.3
X_prob['calibrated', 'trash', 'treasure'] = 0.1
X_prob['calibrated', 'trash', 'trash'] = 0.9
X_prob['uncalibrated', 'treasure', 'treasure'] = 0.63
X_prob['uncalibrated', 'treasure', 'trash'] = 0.37
X_prob['uncalibrated', 'trash', 'treasure'] = 0.1
X_prob['uncalibrated', 'trash', 'trash'] = 0.9

# Students may also use these probabilities if they
# understand that the accuracy should be reduced by
# 10 percentage points
# X_prob['uncalibrated', 'treasure', 'treasure'] = 0.6
# X_prob['uncalibrated', 'treasure', 'trash'] = 0.4
# X_prob['uncalibrated', 'trash', 'treasure'] = 0.1
# X_prob['uncalibrated', 'trash', 'trash'] = 0.9
BN.factors['X'] = X_prob
M_prob = Factor(('CM', 'T', 'M'), outcomeSpace)
M_prob['calibrated', 'treasure', 'treasure'] = 0.8
M_prob['calibrated', 'treasure', 'trash'] = 0.2
M_prob['calibrated', 'trash', 'treasure'] = 0.3
M_prob['calibrated', 'trash', 'trash'] = 0.7
M_prob['uncalibrated', 'treasure', 'treasure'] = 0.72
M_prob['uncalibrated', 'treasure', 'trash'] = 0.28
M_prob['uncalibrated', 'trash', 'treasure'] = 0.1
M_prob['uncalibrated', 'trash', 'trash'] = 0.9

# Students may also use these probabilities if they
# understand that the accuracy should be reduced by
# 10 percentage points
# M_prob['uncalibrated', 'treasure', 'treasure'] = 0.7
# M_prob['uncalibrated', 'treasure', 'trash'] = 0.3
# M_prob['uncalibrated', 'trash', 'treasure'] = 0.1
# M_prob['uncalibrated', 'trash', 'trash'] = 0.9
BN.factors['M'] = M_prob
# Your answer for 4
Q = BN.query(('T',), X='treasure', M='treasure')
print(Q)
╒══════════╤══════════╕
│ T │ Pr │
╞══════════╪══════════╡
│ treasure │ 0.295431 │
├──────────┼──────────┤
│ trash │ 0.704569 │
╘══════════╧══════════╛
Briefly comment on the results of 4
$P(T=treasure|X=treasure, M=treasure)$ is still rather low despite both machines indicating that the target is a treasure.
This happens because the prior probability of a target being treasure is very low.
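As a quick sanity check (not part of the required answer), the posterior can be reproduced by summing out the two calibration variables by hand, using the same CPT values entered above and nothing from the course classes:
# Sanity check: enumerate the calibration variables and apply Bayes' rule,
# using the same numbers as the CPTs above.
p_T = {'treasure': 0.01, 'trash': 0.99}
p_CX = {'calibrated': 0.01, 'uncalibrated': 0.99}
p_CM = {'calibrated': 0.05, 'uncalibrated': 0.95}
# P(X = treasure | CX, T) and P(M = treasure | CM, T)
p_X = {('calibrated', 'treasure'): 0.7, ('uncalibrated', 'treasure'): 0.63,
       ('calibrated', 'trash'): 0.1, ('uncalibrated', 'trash'): 0.1}
p_M = {('calibrated', 'treasure'): 0.8, ('uncalibrated', 'treasure'): 0.72,
       ('calibrated', 'trash'): 0.3, ('uncalibrated', 'trash'): 0.1}

joint = {}
for t in p_T:
    px = sum(p_CX[c] * p_X[(c, t)] for c in p_CX)   # P(X = treasure | T = t)
    pm = sum(p_CM[c] * p_M[(c, t)] for c in p_CM)   # P(M = treasure | T = t)
    joint[t] = p_T[t] * px * pm

total = sum(joint.values())
print({t: joint[t] / total for t in joint})          # ~0.2954 for 'treasure'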
Question 6 [15 marks]¶
Lecture 10 discussed MAP queries. We developed the MPE VE and MAP VE algorithms based on the Variable Elimination (VE) algorithm. In lecture 11, we learned about jointrees. Suppose we want to replace the VE algorithm with the jointree algorithm for computing MAP and MPE queries. Answer the following questions:
How can we redefine the jointree messages to answer maximum a posteriori queries?
Will this new algorithm work for both MAP and MPE queries? Briefly explain.
What are the benefits and limitations of this new approach compared with variable elimination?
Your answer for 1
There are two approaches for this problem.
In the first one, we can modify the messages to use maximisation instead of summation. Therefore, the clusters will store max-marginals instead of beliefs. A message defined in the lectures as
$M_{ij} =\sum_{C_{i}\backslash S_{ij}} \Phi_{i}\prod_{k\neq j}M_{ki}$
is redefined as
$M_{ij} =\max_{C_{i}\backslash S_{ij}} \Phi_{i}\prod_{k\neq j}M_{ki}$
In the second approach, we can compute MAP queries with the original jointree messages and maximise out variables within the clusters. In this case, the approach is limited to queries in which the MAP variables all lie in the same cluster.
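A minimal, self-contained sketch of the first approach's redefined message, using plain numpy arrays (the potentials and variable order below are invented for illustration and do not use the course Factor class). The only change from the sum-product message is that the separator message maximises rather than sums over the eliminated variables.
import numpy as np

# Toy cluster potential Phi_i over variables (A, B) and one incoming message
# over A; the numbers are invented for illustration. Axis 0 = A, axis 1 = B.
phi_i = np.array([[0.3, 0.7],
                  [0.9, 0.1]])
m_ki = np.array([0.6, 0.4])              # incoming message M_ki over A

product = phi_i * m_ki[:, None]          # Phi_i * prod_{k != j} M_ki

m_ij_sum = product.sum(axis=0)           # ordinary message: sum out C_i \ S_ij (here A)
m_ij_max = product.max(axis=0)           # redefined message: maximise out A instead

print(m_ij_sum, m_ij_max)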
Your answer for 2
The first approach only works for MPE queries. MAP queries involve summing out the non-MAP variables before maximising out the MAP variables. Since we pass messages between all clusters, we do not have control over the order of the max and sum operations.
The second approach works for MAP queries, but it is limited to the case where the MAP variables are in the same cluster.
Your answer for 3
Both approaches allow us to compute answers to multiple queries. However, the first approach is limited to MPE queries, while the second is limited to MAP queries whose MAP variables are in a single cluster.
Question 7 [15 marks]¶
In lecture 15, we performed inference by creating and counting samples. We generated those samples by simulating a Bayesian Network $N$, and we learned that this simulation could be difficult in the presence of evidence. Now, suppose we have a jointree with cluster marginals $P(C_i|\textbf{e})$ and separator marginals $P(S_{ij}|\textbf{e})$ obtained with the execution of the jointree algorithm on the network $N$ and evidence $\textbf{e}$.
Provide an efficient algorithm for simulating $N$ conditioned on $\textbf{e}$ given the cluster and separator marginals.
Argue that your algorithm generates a sequence of independent network instantiations $x_1, … , x_n$ where the probability of generating an instance $x_i$ is $P(x_i|\textbf{e})$.
Discuss the complexity of your algorithm.
Your answer for 1
Input: Jointree $J$ with clusters $\textbf{C}_i$ and separators $\textbf{S}_{ij}$
$~~~~~~~~~$ Evidence $\textbf{e}$
Output: Instantiation $\Sigma$
$\Sigma \leftarrow \textbf{e}$
Start a depth-first search from a random cluster $\textbf{C}_i$
While not all non-evidence variables are sampled
$~~~~~$$\sigma \leftarrow$ random instantiation of the variables in $\textbf{C}_i\backslash\textbf{e}$ according to $P(\textbf{C}_i|\textbf{e})$
$~~~~~$$\Sigma \leftarrow \Sigma \cup \sigma$
$~~~~~$Find in DFS order a neighbouring cluster to $\textbf{C}_i$. Call this new cluster $\textbf{C}_j$
$~~~~~$Set evidence in $\textbf{C}_j$ according to $\textbf{S}_{ij} = \sigma$
$~~~~~$$\textbf{e} \leftarrow \textbf{e} \cup \textbf{S}_{ij} = \sigma$
$~~~~~$Rename $\textbf{C}_j$ as $\textbf{C}_i$
return $\Sigma$
We need to move in DFS order, or any order that lets us visit the clusters according to the jointree structure. If we pick the clusters in random order, setting evidence is insufficient to produce the correct results: we may have a long chain of influence such as $A \rightarrow B \rightarrow C \rightarrow D \ldots$.
We can generate an instantiation for all variables in $\textbf{C}_i\backslash\textbf{e}$ in a single step. The procedure generates a random number in $[0,1]$ and checks which instantiation the number falls in. Although generating the random number is $O(1)$, finding the instantiation is $O(\exp(|\textbf{C}_i|))$ with a naive linear search.
There is no need to follow a topological order, as is the case for Bayesian networks. Topological order was used for Bayesian networks because nodes store conditional probabilities of variables given their parents.
The clusters store joint marginal probabilities. We can set evidence by zeroing rows that do not conform with evidence and renormalising.
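A possible implementation sketch of this procedure, assuming a hypothetical plain-dict representation of the calibrated cluster marginals rather than the jointree classes from the tutorials:
import random

def sample_from_jointree(clusters, adj, evidence=None, rng=random):
    """Draw one instantiation with probability P(x | e), assuming the cluster
    tables are the calibrated marginals P(C_i | e) produced by the jointree
    algorithm (hypothetical representation, not the course classes).

    clusters: {cluster_id: (tuple_of_vars, {assignment_tuple: probability})}
    adj:      {cluster_id: list_of_neighbouring_cluster_ids}  (a tree)
    evidence: {var: value}, already reflected in the cluster tables
    """
    sample = dict(evidence or {})
    root = next(iter(clusters))          # start the DFS from an arbitrary cluster
    visited = {root}
    stack = [root]
    while stack:
        i = stack.pop()
        variables, table = clusters[i]
        # Condition P(C_i | e) on the variables already sampled (these include
        # the separator with the cluster we came from), then renormalise.
        weights = {a: p for a, p in table.items()
                   if all(sample.get(v, val) == val for v, val in zip(variables, a))}
        total = sum(weights.values())
        # Draw one assignment proportionally to the conditioned marginal.
        r = rng.random() * total
        acc = 0.0
        for assignment, p in weights.items():
            acc += p
            if r <= acc:
                sample.update(zip(variables, assignment))
                break
        for j in adj[i]:
            if j not in visited:
                visited.add(j)
                stack.append(j)
    return sample
Repeating the call and counting the sampled values of any variable approximates its posterior marginal $P(X|\textbf{e})$.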
Your answer for 2
This algorithm uses the marginals provided by the clusters of the jointree. All marginals are “calibrated”, i.e., computing $P(X)$ results in the same probabilities for any marginal $M$ that contains $X$.
As we know, the jointree has the correct marginals for the associated Bayesian network. Also, the marginals have already incorporated evidence. Therefore, there is no need for rejection sampling, and we do not sample the evidence.
The first cluster is sampled according to $P(\textbf{C}_i|\textbf{e})$. The remaining ones are sampled according to $P(\textbf{C}_j|\textbf{C}_i = \sigma, \textbf{e})$. We carefully add more evidence as we sample the jointree clusters, respecting the jointree structure. This step is important so that we respect long chains of influence between variables.
Your answer for 3
This analysis assumes that the simulation procedure generates only one sample. The jointree $J$ is already computed according to the evidence.
Given that we have $n$ clusters in $J$, the DFS procedure visits each cluster once. During each visit, the algorithm executes steps 4 to 9. The most expensive step is step 4, which requires searching the table that stores $P(\textbf{C}_i|\textbf{e})$. This table has size $O(\exp(|\textbf{C}_i|))$.
Therefore, the complexity is $O(n \exp(w))$, where $w$ is the number of variables in the largest cluster. We can create an indexing structure for searching the instantiation. However, this index does not improve the complexity for a single sample, although it has a better amortised complexity when generating more samples.
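For illustration, the indexing structure mentioned above could simply be the cumulative distribution over a cluster's instantiations, built once in $O(\exp(|\textbf{C}_i|))$ and then queried by binary search per sample (a sketch with invented numbers):
import bisect
import itertools
import random

# Precompute the cumulative table over a cluster's instantiations once; each
# subsequent sample is then a binary search instead of a linear scan.
probs = [0.1, 0.2, 0.4, 0.3]                      # P(C_i | e), invented numbers
cumulative = list(itertools.accumulate(probs))    # [0.1, 0.3, 0.7, 1.0]

def draw_index(rng=random):
    # Index of the instantiation the random number falls into.
    return bisect.bisect_left(cumulative, rng.random() * cumulative[-1])

print([draw_index() for _ in range(5)])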
Part 3 [30 marks]¶
Part 3 is composed of two programming questions of 15 marks each. Use the code cell after each question statement to enter your answer. You can use the code of the tutorials in your answers.
Question 8 [15 marks]¶
In Lecture 11, we learned about jointrees. These probabilistic graphical models must obey a property known as running intersection to be considered proper jointrees. Implement a function RIP(J) that returns true if the jointree J obeys the running intersection property and false otherwise. The argument J is a jointree object as defined in the tutorials.
# Write your answer for Question 8 here
def sep(v, w):
    # Key of the separator between clusters v and w (order-independent)
    if str(v) < str(w):
        return str(v) + str(w)
    return str(w) + str(v)

def RIP_dfs(J, v, var):
    colour = {node: 'white' for node in J.graph}
    return RIP_dfs_r(J, v, var, colour)

def RIP_dfs_r(J, v, var, colour, state=True):
    colour[v] = 'gray'
    if state:
        # var reached this cluster through separators that all contain it
        # (or this is the starting cluster), so it must appear here
        if var not in J.clusters[v]:
            return False
    else:
        # var was dropped from a separator on the path to this cluster,
        # so it must not reappear (otherwise its clusters are disconnected)
        if var in J.clusters[v]:
            return False
    for w in J.graph.adj_list[v]:
        if colour[w] == 'white':
            if var in J.separators.get(sep(v, w), ()):
                if not RIP_dfs_r(J, w, var, colour, state):
                    return False
            else:
                if not RIP_dfs_r(J, w, var, colour, False):
                    return False
    return True

def RIP(J):
    for v in J.clusters:
        for var in J.clusters[v]:
            if not RIP_dfs(J, v, var):
                return False
    return True
############
# Test code
class JoinTree():
    def __init__(self, graph, clusters, separators):
        self.graph = graph
        self.separators = separators
        self.clusters = clusters
G = Graph({
    '1': ('2', ),
    '2': ('1', '3', '4'),
    '3': ('2', ),
    '4': ('2', '5'),
    '5': ('4', ),
})
S = {
    '12': ('S', 'V'),
    '23': ('V', ),
    '24': ('O', ),
    '45': ('T', ),
}
C = {
    '1': ('S', 'L', 'H', 'V'),
    '2': ('O', 'S', 'V'),
    '3': ('C', 'V'),
    '4': ('B', 'O', 'T'),
    '5': ('A', 'T'),
}
jt = JoinTree(G, C, S)
print(RIP(jt))
###################
# Hidden test cases
marks = 0

G = Graph({
    '1': ('2', ),
    '2': ('1', '3', '4'),
    '3': ('2', ),
    '4': ('2', '5'),
    '5': ('4', ),
})
S = {
    '12': ('S', 'V'),
    '23': ('V', ),
    '24': ('O', ),
    '45': ('T', ),
}
C = {
    '1': ('S', 'L', 'H', 'V'),
    '2': ('O', 'S', 'V'),
    '3': ('C', 'V'),
    '4': ('B', 'O', 'T'),
    '5': ('A', 'T'),
}
jt = JoinTree(G, C, S)
print(RIP(jt))
if RIP(jt):
    marks += 1
G = Graph({
    '1': ('2', ),
    '2': ('1', '3', '4'),
    '3': ('2', ),
    '4': ('2', '5'),
    '5': ('4', ),
})
S = {
    '12': ('S', ),
    '24': ('O', ),
    '45': ('T', ),
}
C = {
    '1': ('S', 'L', 'H', 'V'),
    '2': ('O', 'S'),
    '3': ('C', 'V'),
    '4': ('B', 'O', 'T'),
    '5': ('A', 'T'),
}
jt = JoinTree(G, C, S)
print(RIP(jt))
if not RIP(jt):
    marks += 1
G = Graph({