# Assignment 5
## Question 1
### (a)
$$
\log p(n|N,\alpha) = n \log \alpha + (N-n) \log (1-\alpha) + \log N! - \log n! - \log (N-n)!
$$
To achieve maximum value, set the derivative with respect to $\alpha$ to 0.
$$
\frac {\partial \log p(n|N, \alpha)} {\partial \alpha} = \frac {n} {\alpha} - \frac {N-n} {1-\alpha} = 0
$$
$$
\alpha = \frac {n} {N}
$$
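As a quick numerical sanity check (not part of the derivation), the sketch below grid-maximizes the binomial log-likelihood with `scipy.stats.binom` and compares the argmax to $n/N$; the values of $N$ and $n$ are arbitrary examples.

```python
# Minimal check of the MLE above: maximize log p(n | N, alpha) over a fine grid
# of alpha and compare the argmax to the closed form n/N.
import numpy as np
from scipy.stats import binom

N, n = 50, 18                        # arbitrary example trial/success counts
alphas = np.linspace(1e-6, 1 - 1e-6, 100_000)
loglik = binom.logpmf(n, N, alphas)  # log p(n | N, alpha) on the grid
alpha_hat = alphas[np.argmax(loglik)]

print(alpha_hat, n / N)              # both approximately 0.36
```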
### (b)
$$
P(\alpha |N, n) \propto p(n|N, \alpha) p(\alpha) \\
\propto \binom{N}{n} \alpha^{n} (1-\alpha)^{N-n} \frac {1} {B(a, b)} \alpha^{a-1} (1-\alpha)^{b-1} \\
\propto \binom{N}{n} \frac {1} {B(a, b)} \alpha^{n+a-1} (1-\alpha)^{N-n+b-1} \\
\propto \alpha^{n + a-1} (1-\alpha)^{N-n + b-1}
$$
Adding the normalization factor:
$$
P(\alpha |N, n) = \frac {1} {B(n+a, N-n+b)} \alpha^{n + a-1} (1-\alpha)^{N-n + b-1}
$$
This has the same algebraic form as $p(\alpha)$: the posterior is $\mathrm{Beta}(n+a,\, N-n+b)$. Thus the Beta distribution $p(\alpha)$ is a conjugate prior.
$$
\frac {\partial \log P(\alpha |N, n)} {\partial \alpha} = \frac {n+a-1} {\alpha} - \frac {N-n+b-1} {1-\alpha} = 0 \\
\alpha = \frac {a+n-1} {a+b+N-2}
$$
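The same kind of numerical check (again not part of the derivation) confirms the MAP formula: grid-maximize the $\mathrm{Beta}(n+a, N-n+b)$ posterior density with `scipy.stats.beta`. The values of $a$, $b$, $N$, and $n$ below are arbitrary examples.

```python
# Check the MAP estimate: the mode of Beta(n+a, N-n+b) found by grid search
# should match (a+n-1)/(a+b+N-2).
import numpy as np
from scipy.stats import beta

a, b, N, n = 3.0, 3.0, 50, 18                  # arbitrary example values
alphas = np.linspace(1e-6, 1 - 1e-6, 100_000)
log_post = beta.logpdf(alphas, n + a, N - n + b)  # log P(alpha | N, n)
alpha_map = alphas[np.argmax(log_post)]

print(alpha_map, (a + n - 1) / (a + b + N - 2))   # both approximately 0.370
```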
### (c)
From (b), with $a = b$:
$$
\alpha = \frac {n + a - 1} {N + 2(a-1)}
$$
As $a$ increases, $\alpha$ moves toward $1/2$:
$$
\lim_{a \rightarrow \infty} \alpha = \lim_{a \rightarrow \infty} \frac {n + a - 1} {N + 2(a-1)} = \frac {1} {2}
$$
As $a$ decreases toward 1, $\alpha$ moves toward $\frac {n} {N}$; when $a = 1$ (a uniform prior), $\alpha = \frac {n} {N}$ exactly.
Thus a larger $a$ concentrates the prior around $\alpha = \frac {1} {2}$ and pulls the MAP estimate toward $\frac {1} {2}$, while a smaller $a$ makes the prior flatter, so the MAP estimate stays closer to the maximum likelihood estimate $\frac {n} {N}$.
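The few lines below illustrate this numerically; the values of $a$, $N$, and $n$ are arbitrary examples.

```python
# With a = b, the MAP estimate (a-1+n)/(2(a-1)+N) interpolates between the
# MLE n/N (at a = 1) and the prior mode 1/2 (as a grows).
N, n = 50, 18
for a in (1, 2, 10, 100, 10_000):
    alpha_map = (a - 1 + n) / (2 * (a - 1) + N)
    print(a, round(alpha_map, 4))
# a=1 reproduces the MLE 0.36; large a pushes the estimate toward 0.5
```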
## Question 2
### (a)
$$
P(x) = \frac {\exp(-\frac {1} {2} (x-\mu)^T \Sigma^{-1}(x-\mu))} {\sqrt{(2\pi)^n|\Sigma|}} \\
= \frac {\exp(-\frac {1} {2 \sigma^2} (x-\mu)^T (x-\mu))} {\sqrt{(2\pi \sigma^2)^n}} \quad \text{(with } \Sigma = \sigma^2 I \text{)}
$$
$$
\log L = \sum_{i=1}^{m} \log P(x^{(i)}) \\
= \sum_{i=1}^{m} \left( -\frac {1} {2 \sigma^2} (x^{(i)}-\mu)^T (x^{(i)}-\mu) - \frac {n} {2} \log (2\pi \sigma^2) \right)
$$
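As a sanity check of this expression (under the same assumption $\Sigma = \sigma^2 I$), the sketch below compares the hand-derived log-likelihood against `scipy.stats.multivariate_normal` on synthetic data; all the numbers are arbitrary examples.

```python
# The derived log L should agree with scipy's multivariate normal log-density
# summed over the m samples, for any parameter values.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, m, sigma2 = 4, 100, 2.5               # example dimension, samples, variance
mu = rng.normal(size=n)
X = rng.normal(mu, np.sqrt(sigma2), size=(m, n))

# derived form: -sum_i ||x^(i) - mu||^2 / (2 sigma^2) - (mn/2) log(2 pi sigma^2)
diffs = X - mu
logL = (-(diffs ** 2).sum() / (2 * sigma2)
        - m * n / 2 * np.log(2 * np.pi * sigma2))

logL_scipy = multivariate_normal.logpdf(X, mu, sigma2 * np.eye(n)).sum()
print(logL, logL_scipy)                  # agree to numerical precision
```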
### (b)
$$
\frac {\partial \log L} {\partial \sigma^2} = \frac {1} {2} \sum_{i=1}^{m} \left( \sigma^{-4} (x^{(i)}-\mu)^T (x^{(i)}-\mu) - n \sigma^{-2} \right) = 0
$$
$$
\sigma^2 = \frac {\sum_{i=1}^{m} (x^{(i)}-\mu)^T (x^{(i)}-\mu) } {mn}
$$
Thus, $\Sigma = \frac {\sum_{i=1}^{m} (x^{(i)}-\mu)^T (x^{(i)}-\mu)} {mn} I$, where $\mu = \frac {1} {m} \sum_{i=1}^m x^{(i)}$.
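A minimal sketch verifying this closed form on synthetic data (the true parameters below are arbitrary examples):

```python
# The MLE sigma^2 is the mean squared deviation over all m*n coordinates,
# so it should recover the true variance on a large synthetic sample.
import numpy as np

rng = np.random.default_rng(1)
n, m, true_sigma2 = 5, 50_000, 1.7
X = rng.normal(0.3, np.sqrt(true_sigma2), size=(m, n))

mu_hat = X.mean(axis=0)                         # mu = (1/m) sum_i x^(i)
sigma2_hat = ((X - mu_hat) ** 2).sum() / (m * n)
print(sigma2_hat)                               # close to true_sigma2 = 1.7
```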
### (c)
$$
P(x) = \frac {\exp(-\frac {1} {2} (x-\mu)^T \Sigma^{-1}(x-\mu))} {\sqrt{(2\pi)^n|\Sigma|}} \\
= \frac {\exp(-\frac {1} {2} \sum_{i=1}^n \frac {1} {\lambda_i} (x_i - \mu_i)^2)} {\sqrt{(2\pi)^n \prod_{i=1}^n \lambda_i}} \quad \text{(with } \Sigma = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \text{)}
$$
$$
\log L = \sum_{k=1}^{m} \log P(x^{(k)}) \\
= \sum_{k=1}^{m} \left( -\frac {1} {2} \sum_{i=1}^n \frac {1} {\lambda_i} (x_i^{(k)} - \mu_i)^2 - \frac {1} {2} \left( n \log 2\pi + \sum_{i=1}^n \log \lambda_i \right) \right) \\
= -\frac {1} {2} \sum_{k=1}^{m} \left( \sum_{i=1}^n \frac {1} {\lambda_i} (x_i^{(k)} - \mu_i)^2 + \sum_{i=1}^n \log \lambda_i + n \log 2\pi \right)
$$
$$
\frac {\partial \log L} {\partial \lambda_i} = -\frac {1} {2} \sum_{k=1}^{m} \left( -\lambda_i^{-2} (x_i^{(k)} - \mu_i)^2 + \frac {1} {\lambda_i} \right) = 0
$$
$$
\lambda_i = \frac {1} {m} \sum_{k=1}^m (x_i^{(k)} - \mu_i)^2
$$
Thus, the maximum likelihood estimate is $\lambda_i = \frac {1} {m} \sum_{k=1}^m (x_i^{(k)} - \mu_i)^2$, where $\mu = \frac {1} {m} \sum_{k=1}^m x^{(k)}$.
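A similar sketch for the diagonal case (the diagonal entries below are arbitrary examples):

```python
# The MLE lambda_i is the per-coordinate mean squared deviation, so it should
# recover the true diagonal of Sigma on a large synthetic sample.
import numpy as np

rng = np.random.default_rng(2)
m = 100_000
true_lams = np.array([0.5, 1.0, 4.0])           # example diagonal of Sigma
X = rng.normal(0.0, np.sqrt(true_lams), size=(m, len(true_lams)))

mu_hat = X.mean(axis=0)
lam_hat = ((X - mu_hat) ** 2).mean(axis=0)      # lambda_i = (1/m) sum_k (x_i - mu_i)^2
print(lam_hat)                                  # close to [0.5, 1.0, 4.0]
```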
## Question 3
A has lower bias and higher variance. B has higher bias and lower variance.
If the guns don't have adjustable sights, I will choose A, because its lower bias gives better expected performance than B.
If the guns have adjustable sights, I will choose B. Adjustable sights can remove B's bias, but cannot reduce A's variance. Thus, with adjustable sights, B is expected to have both lower bias and lower variance.
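A small simulation illustrating the argument (all offsets and spreads below are made-up example numbers): score each gun by mean squared distance from the target, before and after centering the mean point of impact to mimic adjusting the sights.

```python
# Gun A: low bias, high variance. Gun B: high bias, low variance.
# "Adjusting the sights" is modeled as subtracting the mean point of impact,
# which removes bias but leaves variance unchanged.
import numpy as np

rng = np.random.default_rng(3)
shots = 10_000
A = rng.normal(loc=0.1, scale=2.0, size=shots)   # low bias, high variance
B = rng.normal(loc=3.0, scale=0.5, size=shots)   # high bias, low variance

mse = lambda x: np.mean(x ** 2)                  # target is at 0
print("before:", mse(A), mse(B))                 # A wins: B's bias dominates
print("after: ", mse(A - A.mean()), mse(B - B.mean()))  # B wins: only variance left
```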