CSC311 Fall 2021 Tutorial 3
Tutorial 3 Exercises
1. Bias, Variance, and Bayes Error. The purpose of this exercise is to show a simple example where you can compute the bias, variance, and Bayes error of a predictor. For this question, we assume we have $N$ scalar-valued observations $\{x^{(i)}\}_{i=1}^N$ sampled independently from a Gaussian distribution $\mathcal{N}(x; \mu, \sigma^2)$ with known variance $\sigma^2$ and unknown mean $\mu$. We'd like to estimate the mean parameter $\mu$, or equivalently, choose a $\hat{\mu}$ which minimizes the squared error risk $\mathbb{E}[(x - \hat{\mu})^2]$.
We’ll introduce the Gaussian distribution properly in a later lecture, but hopefully you’ve seen it before in a probability course. It is a bell-shaped distribution whose density is:
\[
p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).
\]
The details of the Gaussian distribution (such as the density) aren't important for this exercise. The important facts are that $\mathbb{E}[x] = \mu$ and $\mathrm{Var}(x) = \sigma^2$.
We will estimate the unknown mean parameter $\mu$ by taking the empirical mean, or average, of the observations:
\[
\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)}.
\]
This is equivalent to the maximum likelihood estimate, but you don’t need to know that yet.
(It’s covered in a later lecture.)
The squared error risk $\mathbb{E}[(x - \hat{\mu})^2]$ of this estimator can be decomposed into terms for bias, variance, and Bayes error, exactly following our proof from Lecture 5. (Here, $x$ plays the role of $t$, and $\hat{\mu}$ plays the role of $y$. The Bayes optimal prediction, corresponding to $y_\star$ from lecture, is $\mathbb{E}[x] = \mu$.) Your job is to determine each of the three terms.
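For reference, here is a sketch of how the three terms below fit together, assuming (as in lecture) that $x$ is a fresh observation drawn independently of the $x^{(i)}$ used to compute $\hat{\mu}$:
\[
\mathbb{E}[(x - \hat{\mu})^2]
= \underbrace{\mathbb{E}[(x - \mu)^2]}_{\text{Bayes error}}
+ \underbrace{(\mathbb{E}[\hat{\mu}] - \mu)^2}_{\text{bias}}
+ \underbrace{\mathrm{Var}(\hat{\mu})}_{\text{variance}}.
\]
The cross terms vanish because $\mathbb{E}[x - \mu] = 0$ and $x$ is independent of $\hat{\mu}$.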
(a) Bayes error: $\mathbb{E}[(x - \mu)^2]$
(b) Bias: $(\mathbb{E}[\hat{\mu}] - \mu)^2$
(c) Variance: $\mathrm{Var}(\hat{\mu})$
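If you want to sanity-check your answers numerically, here is a short Python sketch (assuming NumPy is available; the particular values of mu, sigma, and N are illustrative, not part of the exercise) that estimates each term by Monte Carlo simulation:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 1.5, 2.0, 10        # illustrative parameter values
num_trials = 200_000               # number of simulated datasets

# Draw num_trials datasets of N observations each, plus one fresh test point per trial.
datasets = rng.normal(mu, sigma, size=(num_trials, N))
mu_hat = datasets.mean(axis=1)                    # empirical mean of each dataset
x_test = rng.normal(mu, sigma, size=num_trials)   # independent test observations

risk = np.mean((x_test - mu_hat) ** 2)      # E[(x - mu_hat)^2]
bayes_error = np.mean((x_test - mu) ** 2)   # E[(x - mu)^2]
bias_sq = (mu_hat.mean() - mu) ** 2         # (E[mu_hat] - mu)^2
variance = mu_hat.var()                     # Var(mu_hat)

print(f"risk:               {risk:.4f}")
print(f"bayes + bias + var: {bayes_error + bias_sq + variance:.4f}")

The two printed numbers should agree up to Monte Carlo error.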
2. Information Theory. The goal of this question is to help you become more familiar with the basic equalities and inequalities of information theory. They appear in many contexts in machine learning and elsewhere, so having some experience with them is quite helpful. We review some concepts from information theory, and ask you a few questions.
Recall the definition of the entropy of a discrete random variable $X$ with probability mass function $p$:
\[
H(X) = \sum_{x \in \mathcal{X}} p(x) \log_2\!\left( \frac{1}{p(x)} \right).
\]
Here the summation is over all possible values of $x \in \mathcal{X}$, which (for simplicity) we assume is finite. For example, $\mathcal{X}$ might be $\{1, 2, \ldots, N\}$. Recall also the definition of conditional entropy:
\[
H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y \mid X = x).
\]
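For example, if $X$ is a fair coin flip then $H(X) = \tfrac{1}{2}\log_2 2 + \tfrac{1}{2}\log_2 2 = 1$ bit, and if $Y$ is independent of $X$ then $H(Y \mid X = x) = H(Y)$ for every $x$, so $H(Y \mid X) = H(Y)$.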
(a) Prove that the entropy H(X) is non-negative.
(b) Prove the Chain Rule for entropy:
\[
H(X, Y) = H(X \mid Y) + H(Y) = H(Y \mid X) + H(X).
\]
(c) Prove that $H(X, Y) \geq H(X)$. (Hint: this follows fairly directly from parts (a) and (b).)
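As a numerical sanity check (not a substitute for the proofs), here is a small Python sketch, assuming NumPy, that computes the relevant entropies for a joint distribution and checks the chain rule and the inequality from part (c); the joint table p_xy is made up purely for illustration:

import numpy as np

def entropy(p):
    # Entropy in bits of a probability vector/array p (zero entries contribute 0).
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# An arbitrary joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.20, 0.10]])

p_x = p_xy.sum(axis=1)      # marginal p(x)
H_X = entropy(p_x)
H_XY = entropy(p_xy)        # joint entropy H(X, Y)

# H(Y | X) = sum_x p(x) H(Y | X = x), with p(y | x) = p(x, y) / p(x)
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

print(f"H(X, Y)         = {H_XY:.4f}")
print(f"H(Y|X) + H(X)   = {H_Y_given_X + H_X:.4f}")   # should match (chain rule)
print(f"H(X, Y) >= H(X)?  {H_XY >= H_X}")             # part (c)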