
T3-2020 Exam¶
COMP9418 – Advanced Topics in Statistical Machine Learning


7th December, 2020

Before proceeding, please read and acknowledge the following (double-click on this cell and put an X between the brackets [X]):

[ ] I acknowledge that I will complete all of the work I submit for this exam without assistance from anyone else.

Instructions¶
This exam will last for 24 hours, starting 7/12/2020 at 00:00:01 AEST and ending 7/12/2020 at 23:59:59 AEST.
Questions will be answered from 9 am to 9 pm AEST. Questions should be posted in the [WebCMS forum].
You must provide all answers in this Jupyter notebook.
You must use the cells provided to answer the questions. Use markdown cells for textual answers and code cells for programming answers.
Submit this exam by give (command line or WebCMS) before the deadline. If WebCMS submission is slow, or if you are submitting in the last hour, please submit using the give command on the CSE servers (via VLAB or ssh).
The appropriate command is give cs9418 exam *.ipynb. We will not accept late submissions.
The exam has three parts: multiple-choice questions (20%), questions that require a textual answer (50%), and programming questions in Python (30%).
This exam is an open book exam. You are permitted to access papers and books as well as the course materials, including slides and solved tutorials.
You are not permitted to communicate (email, phone, message, talk, etc.) with anyone during the exam, except COMP9418 staff via email or forum.
Do not communicate your exam answers after you finish your exam. Some students may have extended time to complete the exam.
Do not place your exam work in any location accessible to any other person, such as Dropbox and Github.
Ensure that no other person in your household can access your work.
Do not disclose your zpass to any other person. If you have revealed your zpass, you should change it immediately.
We will refer deliberate violations of exam conditions to Student Integrity as serious misconduct.
This exam has nine questions. The total number of marks is 100.
Type your student number and name in the next cell.

Student identification¶

Student ID:

Part 1 [20 marks]¶
Part 1 is composed of four multiple-choice questions of five marks each. To answer each question, double-click the cell with the alternatives and write an X between the [ ] of the chosen option.

This is an example before inserting X

[ ] Alternative one
[ ] Alternative two

This is an example after inserting X

[X] Alternative one
[ ] Alternative two

For all four questions, choose only one among the alternatives.

Question 1 [5 marks]¶
Log-probabilities are the values $\log p$ for probabilities $p$. Their main use is to avoid underflows that may occur when we multiply a large number of probabilities together. In this case, the multiplication $\prod_i p_i$ becomes a summation, since $\sum_i \log p_i = \log \prod_i p_i$. We also note that maximisation is a valid operation on log-probabilities, since $\log \max(p_1,\ldots,p_n) = \max(\log p_1, \ldots, \log p_n)$. The use of log-probabilities is a standard technique in the implementation of several algorithms, such as Viterbi and MPE-VE. Still, it is not usually used in others, such as Variable Elimination (VE) and MAP-VE. Why?

[ ] The Viterbi and MPE-VE are algorithms that involve the multiplication of a large number of probabilities and, therefore, justify the use of log-probabilities.
[ ] The Viterbi and MPE-VE algorithms require multiplication and maximisation of probabilities. However, VE and MAP-VE may also involve sums of probabilities, which is an operation without a counterpart in the log-probability representation.
[ ] The Viterbi and MPE-VE algorithms are algorithms in which we are interested in the assignment with maximum probability, instead of the value of the probabilities. Therefore, in these algorithms, we can use log-probabilities since we do not need to report the probabilities.
[ ] The Viterbi is a particular case of MPE-VE. In turn, MPE-VE is a specific case of MAP-VE. All these algorithms are specialisations of the VE algorithm. Therefore, these are essentially the same algorithm. It makes no difference if we use log-probabilities or probabilities in all of them.
[ ] None of the above alternatives is correct.
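For illustration only (not one of the alternatives), a minimal sketch of the log-probability trick described in this question, assuming standard double-precision floats:

import math

# Multiplying tiny probabilities underflows to 0.0 ...
p = 1e-200
print(p * p)                      # 0.0: 1e-400 is below the smallest positive double

# ... while the corresponding log-probabilities simply add.
print(math.log(p) + math.log(p))  # about -921.0, i.e. log(1e-400)

# Maximisation commutes with log because log is monotonically increasing.
probs = [0.1, 0.5, 0.4]
assert math.log(max(probs)) == max(math.log(q) for q in probs)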

Question 2 [5 marks]¶

During the course, we covered a multitude of inference algorithms, from the most straightforward, Variable Elimination (VE), to more sophisticated ones such as Iterative Joingraph Propagation (IJGP) and Gibbs sampling. Select the correct alternative regarding the inference algorithms.

[ ] In the course, we covered two exact inference algorithms: variable elimination and the jointree algorithm. The efficiency of these algorithms is tightly coupled. If the best VE order has width $w$, then it is guaranteed that the best jointree will also have width $w$. Similarly, if the width of the best jointree is $w$, then the best VE order also has width $w$.
[ ] VE is a simple algorithm, but its complexity is exponential in both the best and worst cases. Therefore, this algorithm does not scale to large networks. For an extensive network with hundreds of nodes, it is guaranteed that VE will not provide answers in feasible time, and we will need to rely on more efficient algorithms, such as approximate inference with sampling.
[ ] We can use the Chebyshev and Hoeffding bounds to compute the number of samples necessary for inference with sampling algorithms. These bounds are accurate regardless of the sampling algorithm: Forward sampling, Rejection sampling, Gibbs sampling, and Likelihood Weighting.
[ ] Joingraphs are similar to Jointrees but with relaxed constraints. The inference algorithms for both structures are also very similar. While the inference algorithm for Jointrees converges in a single iteration, the IJGP is guaranteed to converge in one or more iterations.
[ ] None of the above alternatives is correct.

Question 3 [5 marks]¶
In Lecture 16, we studied approaches to learning network structures from data. Choose the incorrect alternative:

[ ] Learning tree structures is more straightforward than learning graph structures because trees have a fixed number of edges.
[ ] A limitation of the tree structure learning methods is that they can infer the presence of an edge but do not infer its direction.
[ ] Learning DAG structures is always better than learning tree structures. The addition of new edges to tree structures will transform the tree into a DAG and will never increase the log-likelihood of the network.
[ ] Model complexity is a technique to avoid overfitting in DAG structure learning. Model complexity is necessary because the addition of an edge never decreases the log-likelihood of the resulting structure.
[ ] Optimal search, as the name suggests, can identify the optimal parent set for a given total order of variables. However, its time complexity is exponential, hence the importance of relying on pruning techniques to reduce the running time.

Question 4 [5 marks]¶

In Lecture 4, we discussed the concepts of I-MAP, D-MAP and P-MAP for Bayesian networks. We can extend these concepts to Markov networks, Jointrees and Joingraphs. Also, in Lecture 9, we discussed that our graphical models can be understood as languages for representing independencies. Now, suppose we have a probability distribution $P$ over five variables $A,B,C,D$ and $E$ such that $P(A,B,C,D,E) = \phi(A,B,C)\phi(B,E)\phi(E,D)\phi(C,D)$. Which probabilistic graphical models are languages able to provide a graph that is a P-MAP for the probability distribution $P$?

[ ] A Bayesian Network.
[ ] A Markov Network and a Bayesian Network.
[ ] A Joingraph and a Markov Network.
[ ] A Joingraph, Markov Network and Bayesian Network.
[ ] A Joingraph, Jointree, Markov Network and Bayesian Network.
[ ] A Markov Network.
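For reference, the factorisation above induces a Markov network with one edge for every pair of variables that share a factor. A small sketch, in the same graphviz style used later in this notebook, that only draws this network:

from graphviz import Graph

# Undirected Markov network induced by the factorisation of P.
mn = Graph(engine="neato")
mn.attr(overlap="false", splines="true")
for u, v in [('A', 'B'), ('A', 'C'), ('B', 'C'),   # from phi(A,B,C)
             ('B', 'E'), ('E', 'D'), ('C', 'D')]:  # from phi(B,E), phi(E,D), phi(C,D)
    mn.edge(u, v)
mn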

Part 2 [50 marks]¶
Part 2 is composed of three open questions. To answer each question, edit the markdown cell after the question statement and insert your answer.

Question 5 [15 marks]¶
As a data scientist, you visit a potential client, a doctor, who poses the following problem to you. They want to use Machine Learning to improve their understanding of cancer patients. All patients in the database were diagnosed with some variation of the disease. The database has information about the patients (such as age, gender, etc.), their medical history (previous and existing conditions) and the disease (cancer type, tumour size, etc.). The doctor has three main requirements:

They want to understand and validate the model;
They want the model to incorporate their knowledge about the domain, such as “lung cancer has a high prevalence for smokers”, but the model should also learn other relationships from data;
They do not have a single query they are interested in; they want to probe the model to find “interesting things”.

[5 Marks] What model would you recommend: generative or discriminative? Briefly explain why.
[5 Marks] Briefly compare the suitability of two explainable models for this problem: decision trees and Bayesian networks.
[5 Marks] In the case of Bayesian networks, how would you deal with the requirement of incorporating existing human knowledge into the model? Answer considering both the model structure and the parameters.

Your answer for 1¶

Your answer for 2¶

Your answer for 3¶

Question 6 [20 marks]¶
The drive in golf is the first shot in playing a hole. If you drive with a 3-wood (a particular type of golf club), there is a 2% risk of a miss (hitting the ball at the wrong angle, so it goes in the wrong direction); 1/4 of the drives have a length of 180 m, 1/2 are 200 m, and 1/4 are 220 m. You may also use a driver (another type of golf club). This will increase the length by 20 m, but you will also have three times the risk of a miss. Both wind and the slope of the hole may affect the result of the drive. The presence of wind doubles the risk of a miss, and the length is affected by 20 m (longer if the wind is from behind, shorter if it is from the front). A downhill slope yields 20 m longer drives, and an uphill slope decreases the length of the drive by 20 m. What is the probability of a miss given a shot greater than or equal to 260 m?

[5 Marks] Show a Bayesian network structure (graph) for this problem. Briefly explain your network. You will have to make reasonable assumptions when constructing your model.
[5 Marks] Provide a query that solves this problem.
[10 Marks] Answer the query by solving this as a programming exercise: use the tutorial code to write a program that computes the query (a schematic sketch follows this list). If you do not have information about a certain variable, assume it has a uniform distribution.
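As a hedged illustration for item 3, the query in item 2 reduces to a ratio of two sums over the joint distribution. The sketch below assumes a hypothetical dict joint mapping full assignments (club, wind, slope, miss, length) to joint probabilities; building that dict from the numbers in the question is the exercise itself:

def p_miss_given_long(joint, threshold=260):
    # P(miss | length >= threshold) as a ratio of two marginals,
    # computed over the hypothetical enumerated joint described above.
    num = sum(p for (_, _, _, miss, length), p in joint.items()
              if miss and length >= threshold)
    den = sum(p for (_, _, _, _, length), p in joint.items()
              if length >= threshold)
    return num / den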

# Your answer for 1 – Bayesian network structure

# Define your graph here
graph = {
    'A': ['B'],
    'C': ['D'],
}

# Do not modify this cell, it simply plots the graph above

from graphviz import Digraph

dot = Digraph(engine="neato", comment='Direct graph example')
dot.attr(overlap="false", splines="true")

for v in graph.keys():
    dot.node(v)

for v in graph.keys():
    for w in graph[v]:
        dot.edge(v, w)

dot  # display the graph

Your answer for 1 – Brief explanation¶

Your answer for 2¶

# Your answer for 3

Question 7 [15 marks]¶
In Lecture 15, we discussed techniques to learn parameters from data, and it became evident that learning in the presence of missing data is significantly more expensive than with complete data. In this question, we ask you to analyse the time complexity of the EM algorithm in more detail.

[10 marks] The Expectation Maximisation (EM) algorithm requires inference on the Bayesian network. Explain how the jointree algorithm can help to improve the running time of the EM algorithm. In particular, queries of the form $P_{\theta^k}(x\textbf{u}|\textbf{d}_i)$ involve setting evidence $\textbf{d}_i$ for every training example $i$. How does it impact the performance of the jointree algorithm in terms of new messages that need to be exchanged?
[5 marks] It is common practice to use random restarts with EM, since the algorithm is sensitive to the initial parameter estimate $\theta^0$. The idea is to run the algorithm multiple times and return the best estimate. Explain how you would implement random restarts with EM. How would you initialise the algorithm, and how would you select the best estimate? (A schematic sketch follows the answer cells below.)

Your answer for 1¶

Your answer for 2¶
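For item 2, a schematic sketch of the random-restart loop. The callables random_parameters, run_em and log_likelihood are hypothetical stand-ins for the tutorial code, passed as arguments so the sketch stays self-contained:

def em_random_restarts(data, random_parameters, run_em, log_likelihood, n_restarts=10):
    best_theta, best_ll = None, float('-inf')
    for _ in range(n_restarts):
        theta0 = random_parameters()        # random (normalised) initial estimate theta^0
        theta = run_em(data, theta0)        # run EM to convergence from theta0
        ll = log_likelihood(data, theta)    # score the fit on the training data
        if ll > best_ll:                    # keep the highest-likelihood estimate
            best_theta, best_ll = theta, ll
    return best_theta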

Part 3 [30 marks]¶
Part 3 is composed of two programming questions of 15 marks each. Use the code cell after each question statement to enter your answer. You can use the code of the tutorials in your answers.

Question 8 [15 marks]¶
Lecture 9 presented the concept of separation, an independence test for Markov networks. The idea is similar to, but simpler than, d-separation, since Markov networks do not have “convergent valves”. The test can be summarised in a single sentence:

Let $\textbf{X}$, $\textbf{Y}$, and $\textbf{Z}$ be three disjoint sets of nodes in a graph $G$. We say that $\textbf{X}$ is separated from $\textbf{Y}$ given $\textbf{Z}$, written $sep_G(\textbf{X},\textbf{Z},\textbf{Y})$, if and only if every path between a node in $\textbf{X}$ and a node in $\textbf{Y}$ passes through a node in $\textbf{Z}$.

An efficient separation test can be implemented by pruning the edges incident to the nodes in $\textbf{Z}$ and then testing for connectivity between nodes in $\textbf{X}$ and $\textbf{Y}$.

Implement an efficient separation test for Markov networks. The function separation(G, X, Z, Y) returns true if $\textbf{X}$ is separated from $\textbf{Y}$ given $\textbf{Z}$ in the graph $G$.

# Write your answer for Question 8 here

def separation(G, X, Z, Y):
    return None

############
# Test code

graph = {
    'A': ['B'],
    'B': ['A', 'C', 'D', 'F'],
    'C': ['B'],
    'D': ['B', 'F'],
    'E': ['F'],
    'F': ['B', 'D', 'E', 'G'],
    'G': ['F'],
}

print(separation(graph, ['A'], ['D'], ['G']))
print(separation(graph, ['A'], ['C'], ['G']))
print(separation(graph, ['A'], ['F'], ['G']))
print(separation(graph, ['A'], ['D', 'E'], ['G']))
print(separation(graph, ['A', 'D'], ['C', 'G'], ['E', 'F']))
print(separation(graph, ['A', 'D'], ['F'], ['E', 'G']))
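One possible realisation of the prune-and-test idea described above, offered as a sketch rather than the intended solution: remove the nodes in Z and run a breadth-first search from X.

from collections import deque

def separation_sketch(G, X, Z, Y):
    # X and Y are separated by Z iff no node of Y is reachable from X
    # once every node in Z is removed from the graph.
    blocked = set(Z)
    visited = set(X) - blocked
    frontier = deque(visited)
    while frontier:                      # BFS that never enters a blocked node
        v = frontier.popleft()
        for w in G[v]:
            if w not in visited and w not in blocked:
                visited.add(w)
                frontier.append(w)
    return visited.isdisjoint(Y)         # separated iff no node of Y was reached

On the test graph above, for example, separation_sketch(graph, ['A'], ['F'], ['G']) returns True, since removing F disconnects A from G.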

Question 9 [15 marks]¶
In Lecture 16, we discussed the importance of measuring model complexity. A common way to do so is to count the number of independent parameters in a Bayesian network.

In this exercise, you will write a function dimension(G, outcomeSpace) which takes a DAG G and its associated outcomeSpace, and returns the dimension of G.

# Write your answer for Question 9 here

def dimension(G, outcomeSpace):
    return None

############
# Test code

graph = {
    'L': ['S', 'V'],
    'H': ['S', 'V'],
    'S': ['O'],
    'V': ['C', 'O'],
    'O': ['B'],
    'A': ['T'],
    'T': ['B'],
}

outcomeSpace = dict(
    C=(0, 1, 2),
    O=(0, 1, 2),
    B=(0, 1, 2),
    # entries for the remaining variables (L, H, S, V, A, T) are missing
    # from this copy and must be supplied before running this cell
)

print("The dimension of the ICU network is", dimension(graph, outcomeSpace))
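A sketch of one way to count independent parameters, assuming (as the test cell above suggests) that G maps each node to its list of children: the CPT of a node X with parents U contributes (|X| - 1) multiplied by the product of the parents' outcome-space sizes.

def dimension_sketch(G, outcomeSpace):
    # Collect every node, then invert the child lists to recover parents.
    nodes = set(G) | {w for children in G.values() for w in children}
    parents = {v: [] for v in nodes}
    for v, children in G.items():
        for w in children:
            parents[w].append(v)
    total = 0
    for v in nodes:
        rows = 1
        for u in parents[v]:
            rows *= len(outcomeSpace[u])            # one CPT row per parent configuration
        total += rows * (len(outcomeSpace[v]) - 1)  # independent entries per row
    return total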
