End of year Examinations, 2021 STAT318/STAT462-21S2 (C,D)
UNIVERSITY OF CANTERBURY
END OF YEAR EXAMINATIONS
Prescription Number(s): STAT318/STAT462-21S2 (C,D) Paper Title: Data Mining
Time allowed: TWO HOURS Number of Pages: 5
Please read the following instructions carefully:
• Answer all FOUR questions.
• Each question is worth 25 marks, for a total of 100 marks.
• Show all your working.
• Explain your reasoning.
• You may use your prepared material, course notes and/or textbooks.
• You may not communicate with anyone, in any way, for the duration of the exam.
• Upload your solutions to the Learn folder.
Declaration:
By uploading solutions to this exam to the Exam Learn Dropbox folder you declare that you have complied with all the instructions above. In particular, you declare that the submitted work is yours alone and you have not communicated with anyone in any way for the duration of the exam.
Page 1 of 5 TURN OVER
1. Suppose that the following transactions were recorded at a clothing store, "Jandals and Gumboots". Assume each transaction was recorded only once.
Transaction
{hat, jandals}
{hat, socks, jandals}
{hat, socks, pants}
{hat, socks, jandals, T-shirt}
{socks, pants}
{jandals, T-shirt}
{hat, pants, T-shirt}
(a) Compute the support of {hat},{T-shirt}, {hat, socks}, and {hat, socks, T-shirt}.
(b) Compute the confidence of the following association rules and describe the meaning of confidence in each case:
{hat, socks} → {T-shirt}; and {T-shirt} → {hat, socks}.
(c) Find the 3-itemset(s) with the largest support.
(d) Suppose that the minimum support is 4/8. Find frequent 1-itemset(s).
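The support and confidence computations in question 1 can be checked with a short script. This is a sketch, not part of the exam: it uses the seven transactions as they appear in the list above (note the minimum support in part (d) is stated as 4/8, so the original table may have contained an eighth transaction that is not recoverable here).

```python
# Transactions from question 1, as printed above.
transactions = [
    {"hat", "jandals"},
    {"hat", "socks", "jandals"},
    {"hat", "socks", "pants"},
    {"hat", "socks", "jandals", "T-shirt"},
    {"socks", "pants"},
    {"jandals", "T-shirt"},
    {"hat", "pants", "T-shirt"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule A -> B: support(A union B) / support(A)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"hat"}))                            # 5/7
print(confidence({"hat", "socks"}, {"T-shirt"}))   # (1/7) / (3/7) = 1/3
```

The same two functions answer all four parts: part (a) is four calls to `support`, and part (b) is two calls to `confidence` with the antecedent and consequent swapped.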
2. Nearest Neighbour Averaging
The picture above shows a sample of n = 1000 points (red circles) on which we have performed a k-nearest-neighbour (kNN) regression with k = 500 (blue line).
(a) Is the blue line a good fit? Justify your answer. If your answer is "no", how should we change k to get a better fit?
(b) Would we under-fit or over-fit with k = 2? Justify your answer. Why is that a problem?
(c) kNN works well if the sample size is big and the dimension p is small. Does kNN also work well for larger p with the same sample size? Justify your answer.
(d) For assessing model accuracy, we can use the mean-squared error (MSE). Why do we need a training and a testing data set, and how should we divide our data?
(e) Is it a good sign, if our training MSE is far below the irreducible error? Justify your answer.
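A minimal one-dimensional kNN regression sketch makes the role of k in parts (a) and (b) concrete. The data below are invented for illustration; they are not the n = 1000 sample from the figure.

```python
# kNN regression in one dimension: predict at x0 by averaging the
# responses of the k training points nearest to x0.
def knn_predict(x0, xs, ys, k):
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

# Invented training data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]

# Small k tracks individual points (low bias, high variance, risk of
# over-fitting); large k averages broadly (high bias, low variance,
# risk of under-fitting) -- the trade-off behind parts (a) and (b).
print(knn_predict(2.1, xs, ys, k=1))  # nearest point is x = 2.0, so 4.0
print(knn_predict(2.1, xs, ys, k=5))  # mean of all responses, 6.0
```

For part (c), note that the `abs(xs[i] - x0)` distance generalises to a p-dimensional norm, but as p grows the k nearest neighbours drift far from x0 (the curse of dimensionality), which is why the same sample size no longer suffices.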
3. (a) Explain briefly how the error rate is defined in classification.
(b) Is there an ideal classifier? If yes, describe it. How can we compute it in real-world problems?
(c) Assume we would like to predict the COVID-19 average daily vaccination rate from TV advertisements, and we build a model with the budget spent on TV ads as the predictor and the vaccination rate as the response. When assessing overall model accuracy, we often use the R-squared (R2) statistic. Assume our model gives an R2 value of 0.7. How do we interpret this result?
(d) When looking for the best fit to sample data, given different models of different complexity, does it make sense to take the model with the highest R2 value? Justify your answer.
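The R2 statistic referred to in parts (c) and (d) can be sketched in a few lines. The response and fitted values below are invented solely to exercise the formula.

```python
# R^2 = 1 - RSS/TSS: the fraction of the variance in y that the
# fitted values explain relative to predicting the mean of y.
def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return 1 - rss / tss

# Invented data: four observations and their fitted values.
y     = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]
print(r_squared(y, y_hat))  # 1 - 1.0/20.0 = 0.95
```

Because adding flexibility can only shrink the RSS on the data used to fit the model, the in-sample R2 never decreases with model complexity, which is the caveat part (d) is probing.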
4. (a) Explain briefly the advantages and disadvantages of a single decision tree model.
(b) This question uses the CART partition below. The points in each box denote the training data and the numbers next to them are their response values.
i. Sketch the corresponding decision tree.
ii. Predict the response for the input (4, −0.5).
iii. What is the MSE on the training set for this tree?
(c) Explain briefly the difference between bagging and random forests.
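The distinction asked about in part (c) comes down to which predictors are candidates at each split. The sketch below is illustrative only; `p` and `m` are made-up values, not parameters from the exam.

```python
import random

# Bagging and random forests both fit trees to bootstrap samples.
# They differ at each split: bagging considers all p predictors,
# while a random forest draws a fresh random subset of m < p
# predictors, decorrelating the trees.
p = 10                       # invented number of predictors
predictors = list(range(p))

def split_candidates(method, m=3):
    if method == "bagging":
        return predictors                       # all p predictors
    if method == "random forest":
        return random.sample(predictors, m)     # random m-subset per split
    raise ValueError(method)

print(len(split_candidates("bagging")))         # 10
print(len(split_candidates("random forest")))   # 3
```

With m = p a random forest reduces to bagging, which is a useful way to phrase the answer: random forests generalise bagging by adding split-level randomisation.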
End of Examination