
Biologically Inspired Methods

Nature-Inspired Learning Algorithms (7CCSMBIM)
Tutorial 2: Solutions

Q1. What are the advantages and disadvantages of gradient descent method?

https://www.cs.toronto.edu/~frossard/post/linear_regression/

Advantages:
Simple to implement
Robust and quick convergence
Tractable solution

Disadvantages:
Requires the gradient, so the cost function must be differentiable
Does not guarantee finding the global minimum
Does not work well with discrete variables
Sensitive to the initial guess
Can become trapped in a local minimum

Q2. Show how gradient descent method works using pseudo code
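
The lecture-note pseudo code is not reproduced in this extract. As a minimal sketch, assuming a fixed step size `h`, a gradient-norm stopping test `tol` and an iteration cap `max_iter` (all three are illustrative choices, not taken from the notes), the method could be written as:

```python
import math

def gradient_descent(grad, z0, h=0.1, tol=1e-8, max_iter=1000):
    """Minimise a function given its gradient `grad`, starting from `z0`.

    h, tol and max_iter are illustrative defaults, not values from the notes.
    """
    z = list(z0)
    for _ in range(max_iter):
        g = grad(z)
        # Stop once the gradient is numerically zero.
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        # Update rule: z_{k+1} = z_k - h * grad f(z_k)
        z = [zi - h * gi for zi, gi in zip(z, g)]
    return z

# Cost function used throughout this tutorial: f(x, y) = (x - 1)x + (y + 1)y
def grad_f(z):
    x, y = z
    return [2 * x - 1, 2 * y + 1]

z_min = gradient_descent(grad_f, [7.0, 8.0])
print(z_min)
```

For this convex quadratic the iterates approach the minimiser $(0.5, -0.5)$. On a non-convex cost the same loop would only be expected to find a local minimum that depends on `z0`.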

\noindent $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$ \\~\\~\newline
%
\textbf{Verification:} $\mathbf{A}\mathbf{x}^{*}-\mathbf{B} = \mathbf{0}$?\\
$\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* - \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$
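
The same computation can be checked numerically. The sketch below evaluates $(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{B}$ with plain Python lists; the helper names `transpose`, `matmul` and `inverse_2x2` are introduced here for illustration only, and the inverse helper handles only the $2 \times 2$ case.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0], [6.0]]

At = transpose(A)
# Normal-equation solution x* = (A^T A)^{-1} A^T B
x_star = matmul(inverse_2x2(matmul(At, A)), matmul(At, B))
# Residual A x* - B, which should vanish for this square, invertible A
residual = [ax[0] - b[0] for ax, b in zip(matmul(A, x_star), B)]
print(x_star, residual)
```

Because $\mathbf{A}$ is square and invertible here, the least-squares solution coincides with the exact solution of $\mathbf{A}\mathbf{x} = \mathbf{B}$, which is why the residual is zero.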

[Figure: initial simplex with vertices B, G and W, and midpoint M.]

x = [0, 1, 3]; y = [0, 2, 4];
plot(x, y, 'b', 'MarkerSize', 12, 'LineWidth', 1);      % edges B-G and G-W
hold on;
plot(x, y, 'mo', 'MarkerSize', 12, 'LineWidth', 3);     % vertices B, G and W
plot(0.5, 1, 'rx', 'MarkerSize', 12, 'LineWidth', 3);   % midpoint M
x = [0, 3]; y = [0, 4];
plot(x, y, 'b-', 'LineWidth', 1);                       % edge B-W
xlabel('\itx');
ylabel('\ity');

plot(-2, -2, 'gs', 'MarkerSize', 12, 'LineWidth', 3);   % reflection R
[Figure: simplex with vertices B, G and W, midpoint M and reflection R.]

\begin{align*}
&\mathbf{B}: f(0, 0) = 0\\
&\mathbf{G}: f(1, 2) = 6\\
&\mathbf{W}: f(3, 4) = 26~(\text{before replacement})\\
&\mathbf{R}: f(-2, -2) = 8\\
&\mathbf{C}: f(-0.75, -0.5) = 1.0625
\end{align*}
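
The values above can be reproduced with a short script. This is a sketch of one iteration that follows only the steps used in the worked answer to Q4: reflection, then Case (ii) with $\mathbf{C} = (\mathbf{W}+\mathbf{M})/2$. Case (i) and any shrink step of the full Nelder-Mead pseudo code are not covered, and treating $f(R) = f(G)$ as Case (ii) is an assumption.

```python
def f(p):
    x, y = p
    return (x - 1) * x + (y + 1) * y

def midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))

# Order the three starting vertices: best (B), good (G), worst (W).
B, G, W = sorted([(0.0, 0.0), (1.0, 2.0), (3.0, 4.0)], key=f)

M = midpoint(B, G)                            # midpoint of the side B-G
R = tuple(2 * m - w for m, w in zip(M, W))    # reflection of W through M

if f(R) < f(G):
    raise NotImplementedError("Case (i) is not covered by this sketch")

# Case (ii): f(R) >= f(G)
if f(R) < f(W):
    W = R                                     # W is replaced with R
C = midpoint(W, M)                            # C = (W + M) / 2
if f(C) < f(W):
    W = C                                     # W is replaced with C

# Re-label the vertices for the next iteration.
B, G, W = sorted([B, G, W], key=f)
print(B, G, W)
```

After re-labelling, the script gives B: $(0, 0)$, G: $(-0.75, -0.5)$ and W: $(1, 2)$, matching the suggested solution.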

\noindent \textbf{Update rule: } $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, i.e.,

$\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] - h_k \left[ \begin{array}{c} 2 x_k - 1 \\ 2 y_k + 1 \end{array} \right]$\\~\\

\noindent 1$^{st}$ iteration: $\mathbf{z}_{1} = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 7 - 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$;

\noindent $f(x, y)$ = 72.7800.
\\~\\

\noindent 2$^{nd}$ iteration: $\mathbf{z}_{2} = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 5.7 - 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$;

\noindent $f(x, y)$ = 46.3992. \\~\\
\noindent 3$^{rd}$ iteration: $\mathbf{z}_{3} = \left[ \begin{array}{c} 4.66\\ 4.94 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 4.66 - 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828\\ 3.852 \end{array} \right]$;

\noindent $f(x, y)$ = 29.5155.
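
The three iterations can be reproduced with a few lines of code. The sketch below uses the step size $h_k = 0.1$ and the initial guess $(7, 8)$ from Q5; the list `history` is introduced here only to record the iterates.

```python
def f(x, y):
    return (x - 1) * x + (y + 1) * y

def grad_f(x, y):
    return (2 * x - 1, 2 * y + 1)

h = 0.1            # step size from Q5
x, y = 7.0, 8.0    # initial guess from Q5
history = []
for k in range(3):
    gx, gy = grad_f(x, y)
    x, y = x - h * gx, y - h * gy          # z_{k+1} = z_k - h * grad f(z_k)
    history.append((x, y, f(x, y)))
    print(f"iteration {k + 1}: x = {x:.4f}, y = {y:.4f}, f = {f(x, y):.4f}")
```

Each step scales the distance to the minimiser $(0.5, -0.5)$ by $1 - 2h_k = 0.8$, which is why the cost falls steadily but does not reach the minimum in three iterations.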


[X, Y] = meshgrid(1:0.5:10, 1:0.5:10);
Z = (X-1).*X + (Y+1).*Y;
mesh(X, Y, Z);
alpha(0.5);
hold on;
% Initial guess (circle) followed by the iterates for k = 0, 1, 2 (crosses)
plot3(7, 8, 114, 'ro', 'MarkerSize', 12, 'LineWidth', 3);
plot3(5.7, 6.3, 72.78, 'rx', 'MarkerSize', 12, 'LineWidth', 3);
plot3(4.66, 4.94, 46.3992, 'rx', 'MarkerSize', 12, 'LineWidth', 3);
plot3(3.828, 3.852, 29.5155, 'rx', 'MarkerSize', 12, 'LineWidth', 3);
xlabel('\itx');
ylabel('\ity');
zlabel('{\itf}(\itx,\ity)');

$f(x, y) = \frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 2&0 \\ 0&2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} 1&-1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$, i.e., $f(\mathbf{z}) = \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} - \mathbf{b}^T\mathbf{z}$ with $\mathbf{Q} = \begin{bmatrix} 2&0 \\ 0&2 \end{bmatrix}$ and $\mathbf{b}^T = \begin{bmatrix} 1&-1 \end{bmatrix}$.

\textbf{Update rule: } $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$\\~\\

\fbox{$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)}$}

\textbf{Update rule with the optimal step size:} $\mathbf{z}_{k+1} = \mathbf{z}_k - \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \triangledown f(\mathbf{z}_k)$

Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2x_k-1 \\ 2y_k+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2\times 1-1 \\ 2\times(-2)+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}

Run another iteration (e.g., $k+1 \rightarrow k+2$) by taking $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+2} &= \mathbf{z}_{k+1} - \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2x_{k+1}-1 \\ 2y_{k+1}+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2\times 0.5-1 \\ 2\times(-0.5)+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
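
The one-step convergence can also be checked numerically. The sketch below evaluates $h_k = \frac{\triangledown f^T \triangledown f}{\triangledown f^T \mathbf{Q} \triangledown f}$ at the initial condition $(1, -2)$ and applies a single update; the function name `optimal_step` is introduced here for illustration.

```python
Q = [[2.0, 0.0], [0.0, 2.0]]   # matrix of the quadratic term of f

def grad_f(z):
    x, y = z
    return [2 * x - 1, 2 * y + 1]

def optimal_step(g):
    """h = (g^T g) / (g^T Q g) for a quadratic cost with matrix Q."""
    Qg = [sum(q * gj for q, gj in zip(row, g)) for row in Q]
    return sum(gi * gi for gi in g) / sum(gi * qi for gi, qi in zip(g, Qg))

z = [1.0, -2.0]
g = grad_f(z)
h = optimal_step(g)
z_next = [zi - h * gi for zi, gi in zip(z, g)]   # z_{k+1} = z_k - h * grad f(z_k)
print(h, z_next, grad_f(z_next))
```

Since $\mathbf{Q} = 2\mathbf{I}$, the ratio equals $1/2$ for any non-zero gradient, so the result does not depend on the starting point. For a general $\mathbf{Q}$ the step size would change from iteration to iteration and one step would not normally suffice.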



$\mathbf{x}_{k+1} = \left[ \begin{array}{c} -1 \\ -2 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]\left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right] = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right]$

$\mathbf{x}_{k+1} = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]\left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right] = \left[ \begin{array}{c} 0 \\ -1 \end{array} \right]$

Approximate $y = g(x_1, x_2)$ by the linear model $\hat{y} = w_1 x_1 + w_2 x_2 + b$.

Decision variables: $w_1$, $w_2$ and $b$.

Ideal case: $y = \hat{y}$, i.e., $y - \hat{y} = 0$

Minimise the MSE with respect to $w_1$, $w_2$ and $b$.

Simplified notation: write $y(m)$ as $y_m$ and $\hat{y}(m)$ as $\hat{y}_m$.

\begin{align*}
\text{MSE} &= \frac{(\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_M - y_M)^2 }{M}\\
&= \frac{\big( (w_1x_1(1) +w_2x_2(1) + b) - y_1 \big)^2 + \big((w_1x_1(2) +w_2x_2(2) + b) - y_2 \big)^2 + \cdots + \big((w_1x_1(M) +w_2x_2(M) + b) - y_M \big)^2 }{M}
\end{align*}

$$\min_{w_1, w_2, b} \frac{1}{M} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$$

Remark: The ideal set of $w_1$, $w_2$ and $b$ will lead to the cost $f(w_1, w_2, b) = 0$.

$$f(w_1, w_2, b) = f(\mathbf{z}) = \frac{1}{M} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$$

\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right]$
\begin{align*}
\triangledown f(\mathbf{z}_k) &= \left[ \begin{array}{c} \frac{\partial f(\mathbf{z}_k)}{\partial w_1 } \\ \\ \frac{\partial f(\mathbf{z}_k)}{\partial w_2} \\ \\ \frac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)x_1(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)x_2(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]
\end{align*}

\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)\\
\Rightarrow \left[ \begin{array}{c} w_{1_{k+1}} \\ w_{2_{k+1}} \\ b_{k+1} \end{array} \right] &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] - \frac{2h_k}{M} \left[ \begin{array}{c} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)x_1(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)x_2(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]
\end{align*}

Dataset $(x_1, x_2, y)$: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $\Rightarrow M = 3$.

$\mathbf{z}_0 = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \Rightarrow
w_1 = w_2 = b = 1$
\begin{align*}
\hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 \times 1) + (1 \times -2) + 1 = 0\\
\hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 \times 2) + (1 \times 4) + 1 = 7\\
\hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 \times 3) + (1 \times 0) + 1 = 4
\end{align*}

\begin{align*}
MSE &= \frac{1}{M} \big( (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + (\hat{y}_3 - y_3)^2 \big)\\
&= \frac{1}{3} \big( (0 - 3)^2 + (7 - (-1))^2 + (4 - 5)^2 \big)\\
&= 24.6667
\end{align*}
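
The MSE above, together with the gradient formula applied at $\mathbf{z}_0$, can be evaluated with a short script. This is a sketch under the assumption that each data triple is ordered as $(x_1, x_2, y)$, which is the ordering the worked predictions imply; the gradient values are computed by the code and are not stated in the original slides.

```python
data = [(1, -2, 3), (2, 4, -1), (3, 0, 5)]   # samples, assumed ordered (x1, x2, y)
w1, w2, b = 1.0, 1.0, 1.0                    # initial guess z_0
M = len(data)

# Prediction errors y_hat_i - y_i for every sample
errors = [(w1 * x1 + w2 * x2 + b) - y for x1, x2, y in data]
mse = sum(e * e for e in errors) / M

# Gradient of the MSE with respect to (w1, w2, b)
grad = [
    2 / M * sum(e * x1 for e, (x1, _, _) in zip(errors, data)),
    2 / M * sum(e * x2 for e, (_, x2, _) in zip(errors, data)),
    2 / M * sum(errors),
]
print(round(mse, 4), [round(g, 4) for g in grad])
```

One gradient descent step would then subtract $h_k$ times `grad` from $(w_1, w_2, b)$, with $h_k$ chosen by the user.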

[Figure: diagram labelled Ingredient 1, Ingredient 2, Others and Food Product.]

Q16. Explain how “Recursive least-squares” works.

A least-squares problem is described as $\displaystyle \min_{\mathbf{x}} f(\mathbf{x}) = \| \mathbf{A}\mathbf{x} - \mathbf{B} \|_2^2$. Its solution is given as $\mathbf{x} = \big( \mathbf{A}^T\mathbf{A} \big)^{-1} \mathbf{A}^T \mathbf{B}$.\\

Denote the $i$-th row of $\mathbf{A}$ as $\mathbf{a}_i^T$ and the $i$-th entry of $\mathbf{B}$ as $b_i$.\\ The solution $\mathbf{x}$ can be represented in row form as follows: $$\mathbf{x} = \Big( \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T \Big)^{-1} \sum_{i=1}^m b_i \mathbf{a}_i.$$

For example, $$\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \\ \end{array} \right],$$ $$\mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array} \right].$$

$$\mathbf{A}^T\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{array} \right] \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \mathbf{a}_1\mathbf{a}_1^T + \mathbf{a}_2\mathbf{a}_2^T + \cdots + \mathbf{a}_m\mathbf{a}_m^T = \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T$$

$$\mathbf{A}^T\mathbf{B} = \left[ \begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{array} \right] \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array} \right] = \displaystyle \sum_{i=1}^m b_i \mathbf{a}_i$$

What happens when a new sample comes in?

$$\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \\ \end{array} \right], \qquad \mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array} \right].$$

New sample: $\mathbf{a}_{m+1}^T = \left[ \begin{array}{cccc} a_{m+1,1} & a_{m+1,2} & \cdots & a_{m+1,n} \end{array} \right]$ with the corresponding entry $b_{m+1}$.

The new solution can be written as $$\mathbf{x}_{new} = \Big( \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \Big)^{-1} \big( \sum_{i=1}^m b_i \mathbf{a}_i + b_{m+1} \mathbf{a}_{m+1} \big).$$

At the time that you have $m$ samples, recall that the solution is: $$\mathbf{x} = \mathbf{P}(m)^{-1} \mathbf{q}(m)$$
where $$\mathbf{P}(m) = \mathbf{A}^T\mathbf{A} = \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T $$ and $$\mathbf{q}(m) = \mathbf{A}^T \mathbf{B} = \sum_{i=1}^m b_i \mathbf{a}_i.$$

With the new sample $\mathbf{a}_{m+1}^T$ and $b_{m+1}$, the solution is: $$\mathbf{x}_{new} = \mathbf{P}(m+1)^{-1} \mathbf{q}(m+1)$$ where $$\mathbf{P}(m+1) = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T $$ and $$\mathbf{q}(m+1) = \sum_{i=1}^m b_i \mathbf{a}_i + b_{m+1} \mathbf{a}_{m+1} = \mathbf{q}(m) + b_{m+1} \mathbf{a}_{m+1}.$$

When a new sample comes in, do we need to compute the inverse of $\mathbf{P}(m+1)$ from scratch?

Can we use the inverse of $\mathbf{P}(m)$ to obtain the inverse of $\mathbf{P}(m+1)$?

Is there a mathematical trick?

\noindent \textit{Rank one update formula:} $$\big( \mathbf{P} + \mathbf{a}\mathbf{a}^T \big)^{-1} = \mathbf{P}^{-1} - \frac{1}{1+\mathbf{a}^T \mathbf{P}^{-1} \mathbf{a}} \big( \mathbf{P}^{-1} \mathbf{a} \big) \big( \mathbf{P}^{-1} \mathbf{a} \big)^T$$

\noindent\textit{Remark:} Rank one update formula is valid when $\mathbf{P} = \mathbf{P}^T$, and $\mathbf{P}$ and $\mathbf{P} + \mathbf{a}\mathbf{a}^T$ are both invertible.

We have:
$$\mathbf{P}(m+1) = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T$$

$\mathbf{P}(m+1)^{-1} \equiv \Big( \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \Big)^{-1}$.
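
The rank one update formula turns this into a recursion that never inverts a matrix from scratch. The sketch below starts from the batch quantities of the $2 \times 2$ problem in Q3 and adds one extra sample. The sample `a_new = [1, 1]`, `b_new = 2` is hypothetical and chosen only for illustration, and the helper names `matvec` and `rank_one_update` are introduced here.

```python
def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def rank_one_update(P_inv, a):
    """Return (P + a a^T)^{-1} given P^{-1}, assuming P is symmetric."""
    Pa = matvec(P_inv, a)                                  # P^{-1} a
    denom = 1.0 + sum(ai * pi for ai, pi in zip(a, Pa))    # 1 + a^T P^{-1} a
    n = len(a)
    return [[P_inv[i][j] - Pa[i] * Pa[j] / denom for j in range(n)]
            for i in range(n)]

# Batch quantities for A = [[1, 2], [3, 4]], B = [5, 6] (the data of Q3).
P_inv = [[5.0, -3.5], [-3.5, 2.5]]   # P(m)^{-1} = (A^T A)^{-1}
q = [23.0, 34.0]                     # q(m) = A^T B

# Hypothetical new sample (a_{m+1}, b_{m+1}); not taken from the tutorial.
a_new, b_new = [1.0, 1.0], 2.0

P_inv = rank_one_update(P_inv, a_new)                # P(m+1)^{-1}
q = [qi + b_new * ai for qi, ai in zip(q, a_new)]    # q(m+1)
x_new = matvec(P_inv, q)                             # x_new = P(m+1)^{-1} q(m+1)
print(x_new)
```

Each new sample costs only matrix-vector products, so the update scales with $n^2$ rather than with the cost of a fresh inversion. The result can be checked against the normal equations $\mathbf{P}(m+1)\,\mathbf{x}_{new} = \mathbf{q}(m+1)$.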

Department of Informatics, King's College London

Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)

Tutorial 2

Q1. What are the advantages and disadvantages of gradient descent method?

Q2. Show how gradient descent method works using pseudo code.

Q3. Consider a least-squares problem, $\displaystyle \min_{\mathbf{x}} f(\mathbf{x}) = \| \mathbf{A}\mathbf{x} - \mathbf{B} \|_2^2$ where $\mathbf{A} = \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]$ and $\mathbf{B} = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$. Find the optimal solution $\mathbf{x}$.

Q4. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using Nelder-Mead Downhill Simplex Method.

a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W.

b. Find the midpoint M and f(M).

c. Find the reflection R and f(R).

d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of Nelder-Mead downhill simplex method, $\mathbf{C} = \frac{\mathbf{W}+\mathbf{M}}{2}$ is used.

Q5. Considering the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method, find $x$ and $y$, and $f(x, y)$ for the first 3 iterations with step size $h_k = 0.1$ and initial guess (7, 8).

Q6. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method.

a. Determine the optimal step size $h_k$.

b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size $h_k$ obtained in Q6a.

Q7. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of $\{0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6\}$ (the left-most number is the first number generated). The diagonal entries of $\mathbf{D}_k$ are all 1 and all the entries of $\mathbf{h}_k$ are 0.5. Fill in the content of the following table.

\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
$k$ & $\mathbf{x}_k^T$ & $f(\mathbf{x}_k)$ & $\mathbf{x}_{k+1}^T$ & $f(\mathbf{x}_{k+1})$ & rand() & $\mathbf{x}_{best}$ & $f(\mathbf{x}_{best})$ \\ \hline
0 & $[-1 \;\; -2]$ & & & & & & \\ \hline
1 & & & & & & & \\ \hline
2 & & & & & & & \\ \hline
3 & & & & & & & \\ \hline
4 & & & & & & & \\ \hline
5 & & & & & & & \\ \hline
\end{tabular}

Tutorial 2 (Suggested Solutions)

Q1. Answer can be found in lecture notes.

Q2. Answer can be found in lecture notes.

Q3. $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$

Verification: $\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* - \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

Q4. a. $f(0, 0) = (0 - 1)0 + (0 + 1)0 = 0$;
$f(1, 2) = (1 - 1)1 + (2 + 1)2 = 6$;
$f(3, 4) = (3 - 1)3 + (4 + 1)4 = 26$.
B: (0, 0); G: (1, 2); W: (3, 4)

b. $\mathbf{M} = \frac{\mathbf{B}+\mathbf{G}}{2} = \frac{(0,0)+(1,2)}{2} = (0.5, 1)$; $f(0.5, 1) = 1.75$.

c. $\mathbf{R} = 2\mathbf{M} - \mathbf{W} = 2(0.5, 1) - (3, 4) = (-2, -2)$; $f(-2, -2) = 8$.

d. As $f(\mathbf{R}) > f(\mathbf{G})$, Case (ii) is performed.

As $f(\mathbf{R}) < f(\mathbf{W})$, W is replaced with R. $\mathbf{C} = \frac{\mathbf{W}+\mathbf{M}}{2} = \frac{(-2,-2)+(0.5,1)}{2} = (-0.75, -0.5)$ and $f(\mathbf{C}) = 1.0625$. As $f(\mathbf{C}) < f(\mathbf{W})$, W is replaced with C. The new 3 vertices are B: (0, 0), G: $(-0.75, -0.5)$ and W: (1, 2).

Q5. Let $\mathbf{z} = \left[ \begin{array}{c} x \\ y \end{array} \right]$. $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$. According to the update rule, $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, we have

1$^{st}$ iteration: $\mathbf{z}_{1} = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 7 - 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$; $f(x, y) = 72.7800$.

2$^{nd}$ iteration: $\mathbf{z}_{2} = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 5.7 - 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$; $f(x, y) = 46.3992$.

3$^{rd}$ iteration: $\mathbf{z}_{3} = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 4.66 - 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828 \\ 3.852 \end{array} \right]$; $f(x, y) = 29.5155$.

Q6. a. $\triangledown f(\mathbf{z}_k) = \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]$ where $\mathbf{z}_k = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right]$. $f(\mathbf{z}) = \frac{1}{2} \mathbf{z}^T \mathbf{Q} \mathbf{z} - \mathbf{b}^T \mathbf{z}$ where $\mathbf{Q} = \left[ \begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array} \right]$ and $\mathbf{b}^T = \left[ \begin{array}{cc} 1 & -1 \end{array} \right]$.



Traditional Numerical Methods: Random Walk

Algorithm: Random Walk Optimisation

input: $f(\mathbf{x}): \mathbb{R}^n \rightarrow \mathbb{R}$; $\mathbf{x}_0$: an initial solution
output: $\mathbf{x}^*$, a local minimum of the cost function $f(\mathbf{x})$
$k \leftarrow 0$; $\mathbf{x}_{best} = \mathbf{x}_0$; Threshold;
while STOP-CRIT and $k$ [...]
    [...] $f(\mathbf{x}_k)$
        If rand $<$ Threshold Then $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k$;
    Else If $f(\mathbf{x}_{best}) > f(\mathbf{x}_{k+1})$ Then $\mathbf{x}_{best} \leftarrow \mathbf{x}_{k+1}$;
    End
    $k \leftarrow k+1$;
end
return $\mathbf{x}_k$ and $\mathbf{x}_{best}$;

Table 3: Pseudo Code of Random Walk Optimisation.

Dr H.K. Lam (KCL) Optimisation Biologically Inspired Methods 2018-19 46 / 68


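$$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \quad \text{where } \triangledown f(\mathbf{z}_k) = \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right].$$

\begin{align*}
h_k &= \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \\
&= \frac{\left[ \begin{array}{cc} 2x_k - 1 & 2y_k + 1 \end{array} \right] \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]}{\left[ \begin{array}{cc} 2x_k - 1 & 2y_k + 1 \end{array} \right] \left[ \begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array} \right] \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]} \\
&= \frac{4x_k^2 - 4x_k + 2 + 4y_k^2 + 4y_k}{8x_k^2 - 8x_k + 4 + 8y_k^2 + 8y_k} \\
&= \frac{1}{2}
\end{align*}

b. Update rule: $\mathbf{z}_{k+1} = \mathbf{z}_k - \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \triangledown f(\mathbf{z}_k)$

Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2\times 1-1 \\ 2\times(-2)+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}

Run another iteration (e.g., $k+1 \rightarrow k+2$) by taking $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+2} &= \mathbf{z}_{k+1} - \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2\times 0.5-1 \\ 2\times(-0.5)+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}

Q7. Update rule: $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{D}_k \mathbf{h}_k$ where $\mathbf{x}_k = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right]$, $\mathbf{D}_k = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]$ and $\mathbf{h}_k = \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right]$.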


$f(-0.5, -1.5) = 1.5$


Department of Informatics, King’s College London

Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)

Tutorial 2

Q1. What are the advantages and disadvantages of the gradient descent method?

Q2. Show how the gradient descent method works using pseudocode.

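Q3. Consider a least-squares problem, $\min\limits_{\mathbf{x}} f(\mathbf{x}) = \| \mathbf{A}\mathbf{x} - \mathbf{B} \|_2^2$ where $\mathbf{A} = \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]$ and $\mathbf{B} = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$. Find the optimal solution $\mathbf{x}$.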

Q4. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the Nelder-Mead Downhill Simplex Method.

a. Starting with 3 vertices $(0, 0)$, $(1, 2)$, $(3, 4)$, determine the points $\mathbf{B}$, $\mathbf{G}$ and $\mathbf{W}$.

b. Find the midpoint $\mathbf{M}$ and $f(\mathbf{M})$.

c. Find the reflection $\mathbf{R}$ and $f(\mathbf{R})$.

d. Determine the 3 vertices ($\mathbf{B}$, $\mathbf{G}$ and $\mathbf{W}$) in the next iteration. If Case (ii) is performed in the pseudo code of the Nelder-Mead downhill simplex method, $\mathbf{C} = \dfrac{\mathbf{W} + \mathbf{M}}{2}$ is used.

Q5. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method. Find $x$, $y$ and $f(x, y)$ for the first 3 iterations with step size $h_k = 0.1$ and initial guess $(7, 8)$.

Q6. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method.

a. Determine the optimal step size $h_k$.

b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size $h_k$ obtained in Q6a.

Q7. Consider the minimisation problem of the function: $f(x, y) = (x - 1)x + (y + 1)y$ using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of $\{0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6\}$ (the left-most number is the first number generated). The diagonal entries of $\mathbf{D}_k$ are all 1 and all the entries of $\mathbf{h}_k$ are 0.5. Fill in the content of the following table.

\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
$k$ & $\mathbf{x}_k^T$ & $f(\mathbf{x}_k)$ & $\mathbf{x}_{k+1}^T$ & $f(\mathbf{x}_{k+1})$ & rand() & $\mathbf{x}_{best}$ & $f(\mathbf{x}_{best})$ \\ \hline
0 & $[-1 \;\; -2]$ & & & & & & \\ \hline
1 & & & & & & & \\ \hline
\end{tabular}

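\begin{tabular}{|c|c|c|c|c|c|c|c|}
\hline
$k$ & $\mathbf{x}_k^T$ & $f(\mathbf{x}_k)$ & $\mathbf{x}_{k+1}^T$ & $f(\mathbf{x}_{k+1})$ & rand() & $\mathbf{x}_{best}$ & $f(\mathbf{x}_{best})$ \\ \hline
0 & $[-1 \;\; -2]$ & $4$ & $[-0.5 \;\; -1.5]$ & $1.5$ & -- & $[-0.5 \;\; -1.5]$ & $1.5$ \\ \hline
1 & $[-0.5 \;\; -1.5]$ & $1.5$ & $[0 \;\; -1]$ & $0$ & -- & $[0 \;\; -1]$ & $0$ \\ \hline
2 & $[0 \;\; -1]$ & $0$ & $[0.5 \;\; -0.5]$ & $-0.5$ & -- & $[0.5 \;\; -0.5]$ & $-0.5$ \\ \hline
3 & $[0.5 \;\; -0.5]$ & $-0.5$ & $[1 \;\; 0]$ & $0$ & $0.8$ & $[0.5 \;\; -0.5]$ & $-0.5$ \\ \hline
4 & $[1 \;\; 0]$ & $0$ & $[1.5 \;\; 0.5]$ & $1.5$ & $0.7$ & $[0.5 \;\; -0.5]$ & $-0.5$ \\ \hline
5 & $[1 \;\; 0]$ & $0$ & $[1.5 \;\; 0.5]$ & $1.5$ & $0.2$ & $[0.5 \;\; -0.5]$ & $-0.5$ \\ \hline
\end{tabular}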
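The rows of this table can be reproduced with a short simulation. The sketch below is illustrative Python, not the original solution. It assumes that a random number is drawn only when the candidate is not better than the current point, and that the worse candidate is accepted when that number is greater than the Threshold, which is the reading consistent with rows 3 to 5. The function name `random_walk` and the row layout are hypothetical.

```python
from itertools import cycle


def f(x, y):
    """Objective f(x, y) = (x - 1)x + (y + 1)y."""
    return (x - 1) * x + (y + 1) * y


def random_walk(x0, step=(0.5, 0.5), threshold=0.75,
                rand_seq=(0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6), iters=6):
    """Return one tuple per iteration:
    (k, x_k, f(x_k), x_{k+1}, f(x_{k+1}), rand or None, x_best, f(x_best))."""
    rand = cycle(rand_seq)          # repeating "random" sequence from the question
    x, best = x0, x0
    rows = []
    for k in range(iters):
        # D_k is the identity, so the candidate is x_k + h_k.
        cand = (x[0] + step[0], x[1] + step[1])
        r = None
        if f(*cand) < f(*x):
            accept = True           # better candidate: always accepted
        else:
            r = next(rand)          # assumption: draw only for worse candidates
            accept = r > threshold  # assumption: accept when rand() > Threshold
        if f(*cand) < f(*best):
            best = cand
        rows.append((k, x, f(*x), cand, f(*cand), r, best, f(*best)))
        if accept:
            x = cand
    return rows


for row in random_walk((-1.0, -2.0)):
    print(row)
```

Changing `threshold` or `rand_seq` shows how the acceptance rule decides whether the walk leaves the best point found so far.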

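Q8. a. Mean squared error: $\dfrac{1}{M} \sum\limits_{i=1}^{M} (\hat{y}_i - y_i)^2$

Minimisation problem: $\min\limits_{w_1, w_2, b} \dfrac{1}{2M} \sum\limits_{i=1}^{M} (\hat{y}_i - y_i)^2$

$f(w_1, w_2, b) = \dfrac{1}{2M} \sum\limits_{i=1}^{M} (\hat{y}_i - y_i)^2 = \dfrac{1}{2M} \sum\limits_{i=1}^{M} \left( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \right)^2$

The factor $\frac{1}{2}$ does not change the minimiser; it cancels the 2 produced by differentiation, which gives the coefficient $\frac{1}{M}$ in the gradient.

b. Update rule: $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \nabla f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1k} \\ w_{2k} \\ b_k \end{array} \right]$

$\nabla f(\mathbf{z}_k) = \left[ \begin{array}{c} \dfrac{\partial f(\mathbf{z}_k)}{\partial w_1} \\[2mm] \dfrac{\partial f(\mathbf{z}_k)}{\partial w_2} \\[2mm] \dfrac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \dfrac{1}{M} \sum\limits_{i=1}^{M} \left( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \right) x_1(i) \\[2mm] \dfrac{1}{M} \sum\limits_{i=1}^{M} \left( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \right) x_2(i) \\[2mm] \dfrac{1}{M} \sum\limits_{i=1}^{M} \left( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \right) \end{array} \right]$

Update rule:

\begin{align*}
w_{1,k+1} &= w_{1k} - \frac{h_k}{M} \sum_{i=1}^{M} \left( (w_{1k} x_1(i) + w_{2k} x_2(i) + b_k) - y_i \right) x_1(i)\\
w_{2,k+1} &= w_{2k} - \frac{h_k}{M} \sum_{i=1}^{M} \left( (w_{1k} x_1(i) + w_{2k} x_2(i) + b_k) - y_i \right) x_2(i)\\
b_{k+1} &= b_k - \frac{h_k}{M} \sum_{i=1}^{M} \left( (w_{1k} x_1(i) + w_{2k} x_2(i) + b_k) - y_i \right)
\end{align*}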
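The update rule can be tried on the dataset of Q8c with a few lines of code. The following Python sketch is an illustration under stated assumptions, not the original solution: it assumes the cost is scaled so that the gradient carries the coefficient $\frac{1}{M}$ (the form shown in the gradient vector above), and the names `cost`, `gradient` and `fit` are hypothetical.

```python
# Dataset from Q8c as (x1, x2, y) triples.
DATA = [(1.0, -2.0, 3.0), (2.0, 4.0, -1.0), (3.0, 0.0, 5.0)]


def cost(w1, w2, b, data=DATA):
    """Assumed cost: sum of squared errors divided by 2M."""
    m = len(data)
    return sum((w1 * x1 + w2 * x2 + b - y) ** 2 for x1, x2, y in data) / (2 * m)


def gradient(w1, w2, b, data=DATA):
    """Partial derivatives of the assumed cost with respect to w1, w2 and b."""
    m = len(data)
    g1 = g2 = gb = 0.0
    for x1, x2, y in data:
        err = w1 * x1 + w2 * x2 + b - y   # prediction error for this sample
        g1 += err * x1
        g2 += err * x2
        gb += err
    return g1 / m, g2 / m, gb / m


def fit(h=0.1, iters=200, start=(1.0, 1.0, 1.0)):
    """Run gradient descent and return the parameters and the cost history."""
    w1, w2, b = start
    history = [cost(w1, w2, b)]
    for _ in range(iters):
        g1, g2, gb = gradient(w1, w2, b)
        w1, w2, b = w1 - h * g1, w2 - h * g2, b - h * gb
        history.append(cost(w1, w2, b))
    return (w1, w2, b), history


params, history = fit()
print(params, history[-1])
```

The three data points determine the three parameters exactly: solving the three linear equations by hand gives $w_1 = 2$, $w_2 = -1$, $b = -1$ with zero error. The sketch moves towards these values as `iters` grows, and plotting `history` against the iteration index gives the curve asked for in Q8d.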

Q8. Consider a system taking two input variables $x_1$ and $x_2$ and generating an output $y$. A set of $M$ input-output data is collected in an experiment, i.e., the dataset is $(x_1(1), x_2(1), y(1)), (x_1(2), x_2(2), y(2)), \cdots, (x_1(M), x_2(M), y(M))$. Design a linear regressor in the form of $\hat{y} = w_1 x_1 + w_2 x_2 + b$ to best fit the data in terms of Mean Squared Error using the gradient descent method, where $w_1$, $w_2$ and $b$ are the parameters to be determined.

a. Formulate the data fitting problem as a minimisation problem.

b. Denote the step size as $h_k$. Derive the update rule for each parameter.

c. Use $h_k = 0.1$ and an initial guess of 1 for all variables. Considering the dataset $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$, obtain the best set of parameters for the linear regressor.

d. Plot ``MSE'' against ``iteration $k$'' for 200 iterations. Is the choice of $h_k = 0.1$ right or wrong?

[Figure: 3D plot with axes $x_1$, $x_2$ and $y$.]

