代写 C graph 1

1
EE5434 Machine Learning for Signal Processing Application
Homework 3
Please Name your Submitted Homework as: FamilyName-FirstName.pdf. For example, Sun-Yanni.pdf. Please only submit one file containing all your work. Only pdf file will be accepted.
Problem 1 (4 pts)
In the following equation,
set δ = 0.03 and let,
􏰄1 2M
Eout(g) ≤ Ein(g) + 2N ln δ (1)
􏰄1 2M ε(M,N,δ)= 2Nln δ
(2)
(a) M = 1, how many examples do we need to make ε ≤ 0.05?
(b) M = 1, 000, how many examples do we need to make ε ≤ 0.05?
(c) M = 20, 000, how many examples do we need to make ε ≤ 0.05?
2 Problem 2 (3 pts)
Compute the maximum number of dichotomies, mH(N), for the following learning model, and consequently compute dvc, the V C dimension.
Two concentric spheres in Rd : H contains the functions which are +1 for a ≤ 􏰃x21 + . . . + x2d ≤ b.
3 Problem 3 (3 pts)
For an H with dvc = 10, what sample size do you need (as prescribed by the generalization bound) to have a 95% confidence that your generalization error is at most 0.01 ?
1

4 Problem 4 (10 pts)
There are a number of bounds on the generalization error ε, all holding with probability at least 1−δ.
(a) Original V C-bound:
􏰄 8 4mH(2N)
ε􏰆 Nln N (3)
(b) Rademacher Penalty Bound:
􏰄2ln(2NmH(N)) 􏰄2 1 1
ε≤ N + Nlnδ+N (4) (c) Parrondo and Van den Broek:
(d) Devroye:
􏰅1􏰀 6mH(2N)􏰁
ε ≤ N 2ε + ln δ (5)
􏰅1􏰀 4mH(N2)􏰁
ε ≤ 2N 4ε(1 + ε) + ln δ (6)
Note that (c) and (d) are implicit bounds in ε. Fix dvc = 50 and δ = 0.05 and plot these bounds as a function of N. Which is best?
5 Problem 5 (10 pts)
Given hypothesis set H, its break point is k. Prove that k + 1 is also a break point based on the definition of break point. (Citing any existing theorem for this proof is not allowed.)
2

6 Problem 6 (10 pts)
H consists of all hypothesis in two dimensions h : R2 → {−1, +1} that are positive inside some square boxes and negative elsewhere. e.g (Figure 1).
(a) What is mH(4)?
(b) Draw all dichotomies for the four data points you used to compute mH(4)
Figure 1: Problem 6 Graph
7 Problem 7 (10 pts)
If we are learning from +/- data to predict a noisy target P(y|x) with candidate hypothesis h, show that the maximum likelihood method (refer to note 7) reduces to the task of finding h that minimizes:
Ein(w)=
􏰂N 1 1
n=1
[yn =+1]lnh(xn)+[yn =−1]ln1−h(xn) (7)
3