Biologically Inspired Methods
Nature-Inspired Learning Algorithms (7CCSMBIM)
Tutorial 2: Solutions
1
Q1. What are the advantages and disadvantages of gradient descent method?
2
Q1. What are the advantages and disadvantages of gradient descent method?
3
Q1. What are the advantages and disadvantages of gradient descent method?
4
https://www.cs.toronto.edu/~frossard/post/linear_regression/
4
Q1. What are the advantages and disadvantages of gradient descent method?
5
Q1. What are the advantages and disadvantages of gradient descent method?
Advantages:
Simple
Robust and quick convergence
Tractable solution
Disadvantages:
Works with gradient
Does not guarantee global minimum
Does not work well with discrete variables
Sensitive to initial guess
Trapped in local minimum
6
Q2. Show how gradient descent method works using pseudo code
7
8
\noindent $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$ \\~\\~\newline
%
\textbf{Verification:} $\mathbf{A}\mathbf{x}^{*}-\mathbf{B} = \mathbf{0}$?\\
$\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* – \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$
8
9
B
G
W
M
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,’b’,’MarkerSize’, 12,’linewidth’, 1);
holdon;
plot(x,y,’mo’,’MarkerSize’, 12,’linewidth’, 3);
plot(0.5, 1,’rx’,’MarkerSize’, 12,’linewidth’, 3);
x = [0, 3], y = [0, 4];
plot(x,y,’b-‘,’linewidth’, 1);
xlabel(‘\itx’);
ylabel(‘\ity’);
plot(-2, -2,’gs’,’MarkerSize’, 12,’linewidth’, 3);
9
10
B
G
W
M
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,’b’,’MarkerSize’, 12,’linewidth’, 1);
holdon;
plot(x,y,’mo’,’MarkerSize’, 12,’linewidth’, 3);
plot(0.5, 1,’rx’,’MarkerSize’, 12,’linewidth’, 3);
x = [0, 3], y = [0, 4];
plot(x,y,’b-‘,’linewidth’, 1);
xlabel(‘\itx’);
ylabel(‘\ity’);
plot(-2, -2,’gs’,’MarkerSize’, 12,’linewidth’, 3);
10
11
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,’b’,’MarkerSize’, 12,’linewidth’, 1);
holdon;
plot(x,y,’mo’,’MarkerSize’, 12,’linewidth’, 3);
plot(0.5, 1,’rx’,’MarkerSize’, 12,’linewidth’, 3);
x = [0, 3], y = [0, 4];
plot(x,y,’b-‘,’linewidth’, 1);
xlabel(‘\itx’);
ylabel(‘\ity’);
plot(-2, -2,’gs’,’MarkerSize’, 12,’linewidth’, 3);
11
12
B
G
W
M
R
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,’b’,’MarkerSize’, 12,’linewidth’, 1);
holdon;
plot(x,y,’mo’,’MarkerSize’, 12,’linewidth’, 3);
plot(0.5, 1,’rx’,’MarkerSize’, 12,’linewidth’, 3);
x = [0, 3], y = [0, 4];
plot(x,y,’b-‘,’linewidth’, 1);
xlabel(‘\itx’);
ylabel(‘\ity’);
plot(-2, -2,’gs’,’MarkerSize’, 12,’linewidth’, 3);
\begin{align*}
&\mathbf{B}: f(0, 0) = 0\\
&\mathbf{G}: f(1, 2) = 6\\
&\mathbf{W}: f(3, 4) = 26~(\text{before replacement})\\
&\mathbf{R}: f(-2, -2) = 8\\
&\mathbf{C}: f(-0.75, -0.5) = 1.0625
\end{align*}
12
13
k = 0:
k = 1:
k = 2:
a
b
c
d
e
f
\noindent \textbf{Update rule: } $\mathbf{z}_{k+1} = \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)$\\~\\
\noindent 1$^{st}$ iteration: $\mathbf{z}_{1} = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] – 0.1 \left[ \begin{array}{c} 2 \times 7 – 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$;
$\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] – h_k \left[ \begin{array}{c} 2 x_k – 1 \\ 2 y_k + 1 \end{array} \right]$
$\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] – h_k \left[ \begin{array}{c} 2 x_k – 1 \\ 2 y_2 + 1 \end{array} \right]$
\noindent $f(x, y)$ = 72.7800.
\\~\\
\noindent 2$^{nd}$ iteration: $\mathbf{z}_{2} = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] – 0.1 \left[ \begin{array}{c} 2 \times 5.7 – 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$;
\noindent $f(x, y)$ = 46.3992. \\~\\
\noindent 3$^{rd}$ iteration: $\mathbf{z}_{3} = \left[ \begin{array}{c} 4.66\\ 4.94 \end{array} \right] – 0.1 \left[ \begin{array}{c} 2 \times 4.66 – 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828\\ 3.852 \end{array} \right]$;
\noindent $f(x, y)$ = 29.5155.
\frac{df(x,y)}{dx}\\
\frac{df(x,y)}{dy}
13
14
k = 0:
k = 1:
k = 2:
\noindent \textbf{Update rule: } $\mathbf{z}_{k+1} = \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)$\\~\\
\noindent 1$^{st}$ iteration: $\mathbf{z}_{1} = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] – 0.1 \left[ \begin{array}{c} 2 \times 7 – 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$;
$\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] – h_k \left[ \begin{array}{c} 2 x_k – 1 \\ 2 y_k + 1 \end{array} \right]$
$\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] – h_k \left[ \begin{array}{c} 2 x_k – 1 \\ 2 y_2 + 1 \end{array} \right]$
\noindent $f(x, y)$ = 72.7800.
\\~\\
\noindent 2$^{nd}$ iteration: $\mathbf{z}_{2} = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] – 0.1 \left[ \begin{array}{c} 2 \times 5.7 – 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$;
\noindent $f(x, y)$ = 46.3992. \\~\\
\noindent 3$^{rd}$ iteration: $\mathbf{z}_{3} = \left[ \begin{array}{c} 4.66\\ 4.94 \end{array} \right] – 0.1 \left[ \begin{array}{c} 2 \times 4.66 – 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828\\ 3.852 \end{array} \right]$;
\noindent $f(x, y)$ = 29.5155.
\frac{df(x,y)}{dx}\\
\frac{df(x,y)}{dy}
14
15
Initial guess
k = 0
k = 1
k = 2
[X,Y] = meshgrid(1:0.5:10,1:0.5:10);
Z = (X-1).*X + (Y+1).*Y;
mesh(X,Y,Z);
alpha0.5;
holdon;
plot3(7, 8, 114,’ro’,’MarkerSize’, 12,’linewidth’, 3);
plot3(5.7, 6.3, 72.78,’rx’,’MarkerSize’, 12,’linewidth’, 3);
plot3(4.66, 4.94, 46.3992,’rx’,’MarkerSize’, 12,’linewidth’, 3);
plot3(3.828, 3.852, 29.5155,’rx’,’MarkerSize’, 12,’linewidth’, 3);
xlabel(‘\itx’);
ylabel(‘\ity’);
zlabel(‘{\itf}(\itx,\ity)’);
15
16
\noindent $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$ \\~\\~\newline
%
\textbf{Verification:} $\mathbf{A}\mathbf{x}^{*}-\mathbf{B} = \mathbf{0}$?\\
$\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* – \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$
16
17
\fbox{$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)}$}
$f(x, y) = \frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 2&0 \\ 0&2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} – \begin{bmatrix} 1&-1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
\textbf{Update rule: } $\mathbf{z}_{k+1} = \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)$\\~\\
17
18
\fbox{$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)}$}
18
19
\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k – \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \triangledown f(\mathbf{z}_k)$
\textbf{Update rule: } $\mathbf{z}_{k+1} = \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)$\\~\\
Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k + \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2x_k-1 \\ 2y_k+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2\times 1-1 \\ 2\times-2+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
Run another iteration (e.g., $k+1 \rightarrow k+2$) by taking $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_{k+1} + \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2x_{k+1}-1 \\ 2y_{k+1}+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2\times 0.5-1 \\ 2\times-0.5+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
19
20
\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k – \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \triangledown f(\mathbf{z}_k)$
Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k + \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2x_k-1 \\ 2y_k+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2\times 1-1 \\ 2\times-2+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
Run another iteration (e.g., $k+1 \rightarrow k+2$) by taking $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_{k+1} + \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2x_{k+1}-1 \\ 2y_{k+1}+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2\times 0.5-1 \\ 2\times-0.5+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
20
21
[a, b]
[c, d]
e
[f, g]
h
[i, j]
Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k + \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2x_k-1 \\ 2y_k+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2\times 1-1 \\ 2\times-2+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
Run another iteration (e.g., $k+1 \rightarrow k+2$) by taking $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_{k+1} + \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2x_{k+1}-1 \\ 2y_{k+1}+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 2\times 0.5-1 \\ 2\times-0.5+1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] – \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
$\mathbf{x}_{k+1} = \left[ \begin{array}{c} -1 \\ -2 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]\left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right]$
$\mathbf{x}_{k+1} = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]\left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right]
21
22
Approximate
Decision variables
y = g(x_1, x_2)
\hat{y} = w_1 x_1 + w_2 x_2 + b
Ideal case: $y = \hat{y}$, i.e., $y – \hat{y} = 0$
22
23
\noindent $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$ \\~\\~\newline
%
\textbf{Verification:} $\mathbf{A}\mathbf{x}^{*}-\mathbf{B} = \mathbf{0}$?\\
$\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* – \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$
23
24
Minimise MSE subject to w1, w2 and b
Simplified notations:
\frac{(\hat{y}_1 – y_1)^2 + (\hat{y}_2 – y_2)^2 + \cdots + (\hat{y}_M – y_M)^2 }{M}
y(M) \leftarrow y_m, \hat{y}(M) \leftarrow \hat{y}_m
24
25
= \frac{(\hat{y}_1 – y_1)^2 + (\hat{y}_2 – y_2)^2 + \cdots + (\hat{y}_M – y_M)^2 }{M}
= \frac{\big( (w_1x_1(1) +w_2x_2(1) + b) – y_1 \big)^2 + \big((w_1x_1(2) +w_2x_2(2) + b) – y_2 \big)^2 + \cdots + \big((w_1x_1(M) +w_2x_2(M) + b) – y_M \big)^2 }{M}
\min_{w_1, w_2, b} \frac{1}{M} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)^2
y(M) \leftarrow y_m, \hat{y}(M) \leftarrow \hat{y}_m
Remark: The ideal set of $w_1$, $w_2$ and $b$ will lead to the cost $f(w_1, w_2, b) = 0$.
25
26
\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right]$
\begin{align*}
\triangledown f(\mathbf{z}_k) &= \left[ \begin{array}{c} \frac{\partial f(\mathbf{z}_k)}{\partial w_1 } \\ \\ \frac{\partial f(\mathbf{z}_k)}{\partial w_2} \\ \\ \frac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right]
\end{align*}
f(w_1, w_2, b) = f(\mathbf{z}) = \frac{1}{M} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)^2
26
27
\textbf{Update rule:}\\ \begin{align*} \mathbf{z}_{k+1} &= \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)\\ \Rightarrow \left[ \begin{array}{c} w_{1_{k+1}} \\ w_{2_{k+1}} \\ b_{k+1} \end{array} \right] &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] – h_k \left[ \begin{array}{c} \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right] \\ &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] – \frac{2h_k}{M} \left[ \begin{array}{c} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right] \end{align*}
27
28
\textbf{Update rule:}\\ \begin{align*} \mathbf{z}_{k+1} &= \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)\\ \Rightarrow \left[ \begin{array}{c} w_{1_{k+1}} \\ w_{2_{k+1}} \\ b_{k+1} \end{array} \right] &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] – h_k \left[ \begin{array}{c} \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right] \\ &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] – \frac{2h_k}{M} \left[ \begin{array}{c} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right] \end{align*}
28
29
\textbf{Update rule:}\\ \begin{align*} \mathbf{z}_{k+1} &= \mathbf{z}_k – h_k \triangledown f(\mathbf{z}_k)\\ \Rightarrow \left[ \begin{array}{c} w_{1_{k+1}} \\ w_{2_{k+1}} \\ b_{k+1} \end{array} \right] &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] – h_k \left[ \begin{array}{c} \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \frac{1}{M} \displaystyle \sum_{i=1}^M 2\big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right] \\ &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] – \frac{2h_k}{M} \left[ \begin{array}{c} \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_1(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big)x_2(i) \\\\ \displaystyle \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) – y_i \big) \end{array} \right] \end{align*}
29
30
Dateset: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $\Rightarrow M = 3$.
$\mathbf{z}_0 = \mathbf{z}_0 = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \Rightarrow
w_1 = w_2 = b = 1$
\begin{align*}
\hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 \times 1) + (1 \times -2) + 1 = 0\\
\hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 \times 2) + (1 \times 4) + 1 = 7\\
\hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 \times 3) + (1 \times 0) + 1 = 4
\end{align*}
\begin{align*}
MSE &= \frac{1}{M} \big( (\hat{y}_1 – y_1)^2 + (\hat{y}_2 – y_2)^2 + (\hat{y}_3 – y_3)^2 ) \big)\\
&= \frac{1}{3} \big( (0 – 3)^2 + (7 – (-1))^2 + (4 – 5)^2 ) \big)\\
&= 24.6667
\end{align*}
30
31
Dateset: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $\Rightarrow M = 3$.
$\mathbf{z}_0 = \mathbf{z}_0 = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \Rightarrow
w_1 = w_2 = b = 1$
\begin{align*}
\hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 \times 1) + (1 \times -2) + 1 = 0\\
\hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 \times 2) + (1 \times 4) + 1 = 7\\
\hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 \times 3) + (1 \times 0) + 1 = 4
\end{align*}
\begin{align*}
MSE &= \frac{1}{M} \big( (\hat{y}_1 – y_1)^2 + (\hat{y}_2 – y_2)^2 + (\hat{y}_3 – y_3)^2 ) \big)\\
&= \frac{1}{3} \big( (0 – 3)^2 + (7 – (-1))^2 + (4 – 5)^2 ) \big)\\
&= 24.6667
\end{align*}
31
32
33
34
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
54
55
Ingredient 1
Ingredient 2
Others
Food Product
56
57
58
Q16. Explain how “Recursive least-squares” works.
A least-squares problem is described as $\displaystyle \min_{\mathbf{x}} f(\mathbf{x}) = \mid\mid \mathbf{A}\mathbf{x} – \mathbf{B} \mid\mid_2^2$. Its solution is given as $\mathbf{x} = \big( \mathbf{A}^T\mathbf{A} \big)^{-1} \mathbf{A}^T \mathbf{B}$.\\
Denote the $i$-th row of $\mathbf{A}$ as $\mathbf{a}_i^T$ and the $i$-th row of $\mathbf{B}_i$ as $b_i$.\\ The solution $\mathbf{x}$ can be represented in row form as follows: $$\mathbf{x} = \Big( \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T \Big)^{-1} \sum_{i=1}^m b_i \mathbf{a}_i.$$
For example, $$\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \\ \end{array} \right],$$ $$\mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array} \right].$$
58
59
\mathbf{A}^T\mathbf{A}
\displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T
$\left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right]$
$\left[ \begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{array} \right]$
\left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array} \right]
$\mathbf{a}_1\mathbf{a}_1^T + \mathbf{a}_2\mathbf{a}_2^T + \cdots + \mathbf{a}_m\mathbf{a}_m^T$
59
60
What happen if a new sample is coming in?
$$\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \\ \end{array} \right], \qquad \mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array} \right].$$
$\left[ \begin{array}{cccc} a_{m+1,1} & a_{m+1,2} & \vdots & a_{m+1,n} \end{array} \right] \quad = \quad \mathbf{a}_{m+1}^T$
$b_{m+1}$
The new solution can be written as $$\mathbf{x}_{new} = \Big( \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \Big)^{-1} \big( \sum_{i=1}^m b_i \mathbf{a}_i + b_{m+1} \mathbf{a}_{m+1} \big).$$
60
61
At the time that you have $m$ samples, recall that the solution is: $$\mathbf{x} = \mathbf{P}(m)^{-1} \mathbf{q}(m)$$
where $$\mathbf{P}(m) = \mathbf{A}^T\mathbf{A} = \displaystyle \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T $$ and $$\mathbf{q}(m) = \mathbf{A}^T \mathbf{B} = \sum_{i=1}^m b_i \mathbf{a}_i.$$
With the new sample $\mathbf{a}_{m+1}^T$ and and $b_{m+1}$, the solution is: $$\mathbf{x}_{new} = \mathbf{P}(m+1)^{-1} \mathbf{q}(m+1)$$ where $$\mathbf{P}(m+1) = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T $$ and $$\mathbf{q}(m+1) = \sum_{i=1}^m b_i \mathbf{a}_i + b_{m+1} \mathbf{a}_{m+1}.$$
61
62
When new sample comes in, do we need to compute the inverse of P(m+1)?
Can we use the inverse of P(m) to obtain P(m+1)?
Any mathematical trick?
\noindent \textit{Rank one update formula:} $$\big( \mathbf{P} + \mathbf{a}\mathbf{a}^T \big)^{-1} = \mathbf{P}^{-1} – \frac{1}{1+\mathbf{a}^T \mathbf{P}^{-1} \mathbf{a}} \big( \mathbf{P^{-1} \mathbf{a}} \big) \big( \mathbf{P^{-1} \mathbf{a}} \big)^T$$
\noindent\textit{Remark:} Rank one update formula is valid when $\mathbf{P} = \mathbf{P}^T$, and $\mathbf{P}$ and $\mathbf{P} + \mathbf{a}\mathbf{a}^T$ are both invertible.
62
63
\noindent \textit{Rank one update formula:} $$\big( \mathbf{P} + \mathbf{a}\mathbf{a}^T \big)^{-1} = \mathbf{P}^{-1} – \frac{1}{1+\mathbf{a}^T \mathbf{P}^{-1} \mathbf{a}} \big( \mathbf{P}^{-1} \mathbf{a} \big) \big( \mathbf{P}^{-1} \mathbf{a} \big)^T$$
We have:
$$\mathbf{P}(m+1) = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T$$
$\mathbf{P}(m+1)^{-1} \equiv \Big( \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \Big)^{-1}$.
63
64
65
66
67
68
69
70
Department of Informatics, King’s College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2
Q1. What are the advantages and disadvantages of gradient descent method?
Q2. Show how gradient descent method works using pseudo code.
Q3. Consider a least-squares problem, min
x
f(x) =|| Ax�B ||22 where A =
1 2
3 4
�
and
B =
5
6
�
. Find the optimal solution x.
Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using Nelder-Mead Downhill Simplex Method.
a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and
W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)
is performed in the pseudo code of Nelder-Mead downhill simplex method,
C =
W+M
2
is used.
Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y
using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations
with step size hk = 0.1 and initial guess (7,8).
Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using the gradient descent method.
a. Determine the optimal step size hk.
b. Show that it requires one iteration for the gradient descent method to converge
with the optimal step size hk obtained in Q6a.
Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using the random walk optimisation algorithm. The Threshold (for accepting the
worse solution) is 0.75 and the random number generator will generate a repeating
sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first
number generated). The diagonal entries of Dk are all 1 and all the entries of hk
are 0.5. Fill in the content of the following table.
k x
T
k f(xk) x
T
k+1 f(xk+1) rand() xbest f(xbest)
0 [�1 � 2]
1
2
3
4
5
1
0 0.5 1 1.5 2 2.5 3
x
0
0.5
1
1.5
2
2.5
3
3.5
4
y
00.5 11.5 22.5 3
x
0
0.5
1
1.5
2
2.5
3
3.5
4
y
Department of Informatics, King’s College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2 (Suggested Solutions)
Q1. Answer can be found in lecture notes.
Q2. Answer can be found in lecture notes.
Q3. x
⇤
=
1 2
3 4
�T
1 2
3 4
�!�1
1 2
3 4
�T
5
6
�
=
�4
4.5
�
Verification:
1 2
3 4
�
x
⇤ �
5
6
�
=
0
0
�
Q4. a. f(0, 0) = (0� 1)0 + (0 + 1)0 = 0;
f(1, 2) = (1� 1)1 + (2 + 1)2 = 6;
f(3, 4) = (3� 1)3 + (4 + 1)4 = 26.
B: (0, 0); G: (1, 2); W: (3, 4)
b. M =
B+G
2
=
(0,0)+(1,2)
2
= (0.5, 1); f(0.5, 1) = 1.75.
c. R = 2M�W = 2(0.5, 1)� (3, 4) = (�2,�2); f(�2,�2) = 8.
d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R. C = W+M 2 = (�2,�2)+(0.5,1) 2 = (�0.75,�0.5) and f(C) = 1.0625. As f(C) < f(W), W is replaced with C. The new 3 vertices are B: (0, 0), G: (�0.75,�0.5) and W: (1, 2) Q5. Let z = x y � . Of(z) = 2x� 1 2y + 1 � . According to the update rule, zk+1 = zk � hkOf(zk), we have 1 st iteration: z1 = 7 8 � � 0.1 2⇥ 7� 1 2⇥ 8 + 1 � = 5.7 6.3 � ; f(x, y) = 72.7800. 2 nd iteration: z2 = 5.7 6.3 � � 0.1 2⇥ 5.7� 1 2⇥ 6.3 + 1 � = 4.66 4.94 � ; f(x, y) = 46.3992. 3 rd iteration: z3 = 4.66 4.94 � � 0.1 2⇥ 4.66� 1 2⇥ 4.94 + 1 � = 3.828 3.852 � ; f(x, y) = 29.5155. Q6. a. Update rule: Of(z) = 2x� 1 2y + 1 � where zk = xk yk � . f(z) = 1 2 z T Qz� bTz where Q = 2 0 0 2 � and b T = ⇥ �1 1 ⇤ . 1 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 (Suggested Solutions) Q1. Answer can be found in lecture notes. Q2. Answer can be found in lecture notes. Q3. x ⇤ = 1 2 3 4 �T 1 2 3 4 �!�1 1 2 3 4 �T 5 6 � = �4 4.5 � Verification: 1 2 3 4 � x ⇤ � 5 6 � = 0 0 � Q4. a. f(0, 0) = (0� 1)0 + (0 + 1)0 = 0; f(1, 2) = (1� 1)1 + (2 + 1)2 = 6; f(3, 4) = (3� 1)3 + (4 + 1)4 = 26. B: (0, 0); G: (1, 2); W: (3, 4) b. M = B+G 2 = (0,0)+(1,2) 2 = (0.5, 1); f(0.5, 1) = 1.75. c. R = 2M�W = 2(0.5, 1)� (3, 4) = (�2,�2); f(�2,�2) = 8. d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R.
C =
W+M
2
=
(�2,�2)+(0.5,1)
2
= (�0.75,�0.5) and f(C) = 1.0625.
As f(C) < f(W), W is replaced with C.
The new 3 vertices are B: (0, 0), G: (�0.75,�0.5) and W: (1, 2)
Q5. Let z =
x
y
�
. Of(z) =
2x� 1
2y + 1
�
. According to the update rule, zk+1 =
zk � hkOf(zk), we have
1
st
iteration: z1 =
7
8
�
� 0.1
2⇥ 7� 1
2⇥ 8 + 1
�
=
5.7
6.3
�
; f(x, y) = 72.7800.
2
nd
iteration: z2 =
5.7
6.3
�
� 0.1
2⇥ 5.7� 1
2⇥ 6.3 + 1
�
=
4.66
4.94
�
; f(x, y) = 46.3992.
3
rd
iteration: z3 =
4.66
4.94
�
� 0.1
2⇥ 4.66� 1
2⇥ 4.94 + 1
�
=
3.828
3.852
�
; f(x, y) = 29.5155.
Q6. a. Update rule: Of(z) =
2x� 1
2y + 1
�
where zk =
xk
yk
�
.
f(z) =
1
2
z
T
Qz� bTz where Q =
2 0
0 2
�
and b
T
=
⇥
�1 1
⇤
.
1
Traditional Numerical Methods: Nelder-Mead Downhill
Simplex Method
Initialise start-
ing points
Evaluation of
starting points
Determine
B, G, and W
f (R) < f (G)?
f (B) < f (R)?
Case (i): Reflextion/Expansion
Replace
W with R
Reflection
Compute E
with f (E)
f (E) < f (B)?
Replace
W with E
Expansion
Replace
W with R
Reflection
f (R) < f (W)?
Case (ii): Contraction/Shrinking
Replace
W with R
Compute C
with f (C)
f (C) < f (W)?
Replace
W with C
Compute
S and f (S)
Replace
W with S
Replace
G with M
NoYes
Yes
No f (B) � f (R)
Yes
No f (E) � f (B)
Note: The new W is used from this point on
Yes
No f (R) � f (W)
Yes
Contraction
No f (C) � f (W)
Shrinking
Figure 6: Flowchart of Nelder-Mead Downhill Simplex Method.
Dr H.K. Lam (KCL) Optimisation Biologically Inspired Methods 2017-18 24 / 68
TraditionalNumericalMethods:Nelder-MeadDownhillSimplexMethodInitialisestart-
ingpoints
Evaluationof
startingpoints
Determine
B,G,andW
f(R)
As f(R) < f(W), W is replaced with R. C = W+M 2 = (�2,�2)+(0.5,1) 2 = (�0.75,�0.5) and f(C) = 1.0625. As f(C) < f(W), W is replaced with C. The new 3 vertices are B: (0, 0), G: (�0.75,�0.5) and W: (1, 2) Q5. Let z = x y � . Of(z) = 2x� 1 2y + 1 � . According to the update rule, zk+1 = zk � hkOf(zk), we have 1 st iteration: z1 = 7 8 � � 0.1 2⇥ 7� 1 2⇥ 8 + 1 � = 5.7 6.3 � ; f(x, y) = 72.7800. 2 nd iteration: z2 = 5.7 6.3 � � 0.1 2⇥ 5.7� 1 2⇥ 6.3 + 1 � = 4.66 4.94 � ; f(x, y) = 46.3992. 3 rd iteration: z3 = 4.66 4.94 � � 0.1 2⇥ 4.66� 1 2⇥ 4.94 + 1 � = 3.828 3.852 � ; f(x, y) = 29.5155. Q6. a. Update rule: Of(z) = 2x� 1 2y + 1 � where zk = xk yk � . f(z) = 1 2 z T Qz� bTz where Q = 2 0 0 2 � and b T = ⇥ �1 1 ⇤ . 1 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 x -2 -1 0 1 2 3 4 y -2-1.5-1-0.500.511.522.53 x -2 -1 0 1 2 3 4 y Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 Q1. What are the advantages and disadvantages of gradient descent method? Q2. Show how gradient descent method works using pseudo code. Q3. Consider a least-squares problem, min x f(x) =|| Ax�B ||22 where A = 1 2 3 4 � and B = 5 6 � . Find the optimal solution x. Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using Nelder-Mead Downhill Simplex Method. a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W. b. Find the midpoint M and f(M). c. Find the reflection R and f(R). d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of Nelder-Mead downhill simplex method, C = W+M 2 is used. Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations with step size hk = 0.1 and initial guess (7,8). Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the gradient descent method. a. Determine the optimal step size hk. b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size hk obtained in Q6a. Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first number generated). The diagonal entries of Dk are all 1 and all the entries of hk are 0.5. Fill in the content of the following table. k x T k f(xk) x T k+1 f(xk+1) rand() xbest f(xbest) 0 [�1 � 2] 1 2 3 4 5 1 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 (Suggested Solutions) Q1. Answer can be found in lecture notes. Q2. Answer can be found in lecture notes. Q3. x ⇤ = 1 2 3 4 �T 1 2 3 4 �!�1 1 2 3 4 �T 5 6 � = �4 4.5 � Verification: 1 2 3 4 � x ⇤ � 5 6 � = 0 0 � Q4. a. f(0, 0) = (0� 1)0 + (0 + 1)0 = 0; f(1, 2) = (1� 1)1 + (2 + 1)2 = 6; f(3, 4) = (3� 1)3 + (4 + 1)4 = 26. B: (0, 0); G: (1, 2); W: (3, 4) b. M = B+G 2 = (0,0)+(1,2) 2 = (0.5, 1); f(0.5, 1) = 1.75. c. R = 2M�W = 2(0.5, 1)� (3, 4) = (�2,�2); f(�2,�2) = 8. d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R. C = W+M 2 = (�2,�2)+(0.5,1) 2 = (�0.75,�0.5) and f(C) = 1.0625. As f(C) < f(W), W is replaced with C. The new 3 vertices are B: (0, 0), G: (�0.75,�0.5) and W: (1, 2) Q5. Let z = x y � . Of(z) = 2x� 1 2y + 1 � . According to the update rule, zk+1 = zk � hkOf(zk), we have 1 st iteration: z1 = 7 8 � � 0.1 2⇥ 7� 1 2⇥ 8 + 1 � = 5.7 6.3 � ; f(x, y) = 72.7800. 2 nd iteration: z2 = 5.7 6.3 � � 0.1 2⇥ 5.7� 1 2⇥ 6.3 + 1 � = 4.66 4.94 � ; f(x, y) = 46.3992. 3 rd iteration: z3 = 4.66 4.94 � � 0.1 2⇥ 4.66� 1 2⇥ 4.94 + 1 � = 3.828 3.852 � ; f(x, y) = 29.5155. Q6. a. Update rule: Of(z) = 2x� 1 2y + 1 � where zk = xk yk � . f(z) = 1 2 z T Qz� bTz where Q = 2 0 0 2 � and b T = ⇥ �1 1 ⇤ . 1 df(x, y) dx df(x, y) dy 0 10 50 8 10 100 f( x, y) 6 8 150 y 6 x 4 200 4 2 2 0 0 0 10 50 8 10 100 f ( x , y ) 6 8 150 y 6 x 4 200 4 2 2 0 0 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 Q1. What are the advantages and disadvantages of gradient descent method? Q2. Show how gradient descent method works using pseudo code. Q3. Consider a least-squares problem, min x f(x) =|| Ax�B ||22 where A = 1 2 3 4 � and B = 5 6 � . Find the optimal solution x. Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using Nelder-Mead Downhill Simplex Method. a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W. b. Find the midpoint M and f(M). c. Find the reflection R and f(R). d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of Nelder-Mead downhill simplex method, C = W+M 2 is used. Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations with step size hk = 0.1 and initial guess (7,8). Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the gradient descent method. a. Determine the optimal step size hk. b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size hk obtained in Q6a. Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first number generated). The diagonal entries of Dk are all 1 and all the entries of hk are 0.5. Fill in the content of the following table. k x T k f(xk) x T k+1 f(xk+1) rand() xbest f(xbest) 0 [�1 � 2] 1 2 3 4 5 1 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 Q1. What are the advantages and disadvantages of gradient descent method? Q2. Show how gradient descent method works using pseudo code. Q3. Consider a least-squares problem, min x f(x) =|| Ax�B ||22 where A = 1 2 3 4 � and B = 5 6 � . Find the optimal solution x. Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using Nelder-Mead Downhill Simplex Method. a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W. b. Find the midpoint M and f(M). c. Find the reflection R and f(R). d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of Nelder-Mead downhill simplex method, C = W+M 2 is used. Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations with step size hk = 0.1 and initial guess (7,8). Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the gradient descent method. a. Determine the optimal step size hk. b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size hk obtained in Q6a. Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first number generated). The diagonal entries of Dk are all 1 and all the entries of hk are 0.5. Fill in the content of the following table. k x T k f(xk) x T k+1 f(xk+1) rand() xbest f(xbest) 0 [�1 � 2] 1 2 3 4 5 1 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 Q1. What are the advantages and disadvantages of gradient descent method? Q2. Show how gradient descent method works using pseudo code. Q3. Consider a least-squares problem, min x f(x) =|| Ax�B ||22 where A = 1 2 3 4 � and B = 5 6 � . Find the optimal solution x. Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using Nelder-Mead Downhill Simplex Method. a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W. b. Find the midpoint M and f(M). c. Find the reflection R and f(R). d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of Nelder-Mead downhill simplex method, C = W+M 2 is used. Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations with step size hk = 0.1 and initial guess (7,8). Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the gradient descent method. a. Determine the optimal step size hk. b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size hk obtained in Q6a. Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first number generated). The diagonal entries of Dk are all 1 and all the entries of hk are 0.5. Fill in the content of the following table. k x T k f(xk) x T k+1 f(xk+1) rand() xbest f(xbest) 0 [�1 � 2] 1 2 3 4 5 1 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 (Suggested Solutions) Q1. Answer can be found in lecture notes. Q2. Answer can be found in lecture notes. Q3. x ⇤ = 1 2 3 4 �T 1 2 3 4 �!�1 1 2 3 4 �T 5 6 � = �4 4.5 � Verification: 1 2 3 4 � x ⇤ � 5 6 � = 0 0 � Q4. a. f(0, 0) = (0� 1)0 + (0 + 1)0 = 0; f(1, 2) = (1� 1)1 + (2 + 1)2 = 6; f(3, 4) = (3� 1)3 + (4 + 1)4 = 26. B: (0, 0); G: (1, 2); W: (3, 4) b. M = B+G 2 = (0,0)+(1,2) 2 = (0.5, 1); f(0.5, 1) = 1.75. c. R = 2M�W = 2(0.5, 1)� (3, 4) = (�2,�2); f(�2,�2) = 8. d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R. C = W+M 2 = (�2,�2)+(0.5,1) 2 = (�0.75,�0.5) and f(C) = 1.0625. As f(C) < f(W), W is replaced with C. The new 3 vertices are B: (0, 0), G: (�0.75,�0.5) and W: (1, 2) Q5. Let z = x y � . Of(z) = 2x� 1 2y + 1 � . According to the update rule, zk+1 = zk � hkOf(zk), we have 1 st iteration: z1 = 7 8 � � 0.1 2⇥ 7� 1 2⇥ 8 + 1 � = 5.7 6.3 � ; f(x, y) = 72.7800. 2 nd iteration: z2 = 5.7 6.3 � � 0.1 2⇥ 5.7� 1 2⇥ 6.3 + 1 � = 4.66 4.94 � ; f(x, y) = 46.3992. 3 rd iteration: z3 = 4.66 4.94 � � 0.1 2⇥ 4.66� 1 2⇥ 4.94 + 1 � = 3.828 3.852 � ; f(x, y) = 29.5155. Q6. Update rule: Of(z) = 2x� 1 2y + 1 � where zk = xk yk � . f(z) = 1 2 z T Qz� bTz where Q = 2 0 0 2 � and b T = ⇥ �1 1 ⇤ . 1 f(x, y) = 1 2 x y �T 2 0 0 2 � x y � � ⇥ 1 �1 ⇤ x y � < l a t e x i t s h a 1 _ b a s e 6 4 = " O C z H I t B f b x 0 k Y Q O e r e E J C F Z 7 M f w = " >
A
A
A
C
w
H
i
c
n
V
F
d
S
+
N
A
F
J
1
E
X
b
W
6
a
9
V
H
X
y
5
b
t
7
i
w
l
i
Q
I
+
i
K
I
v
v
i
o
Y
F
V
o
u
m
U
y
v
W
k
H
J
5
M
4
M
5
H
G
k
D
/
p
i
+
y
/
2
U
k
t
s
t
p
9
E
A
8
M
H
M
4
9
9
2
P
u
j
T
L
B
t
f
G
8
P
4
6
7
s
L
j
0
Z
X
l
l
t
b
G
2
/
v
X
b
R
n
N
z
6
1
q
n
u
W
L
Y
Z
a
l
I
1
W
1
E
N
Q
o
u
s
W
u
4
E
X
i
b
K
a
R
J
J
P
A
m
u
j
u
r
4
z
c
P
q
D
R
P
5
Z
U
p
M
u
w
n
d
C
R
5
z
B
k
1
V
h
o
0
n
3
f
j
v
c
k
v
K
H
7
C
M
Y
S
x
o
q
z
0
q
z
K
o
I
I
x
w
x
G
U
Z
J
d
Q
o
P
q
l
g
A
m
E
I
B
Y
Q
o
h
6
/
i
7
6
s
5
W
9
D
2
a
q
P
X
D
t
5
a
P
1
Q
P
9
u
d
s
f
n
v
f
/
0
S
l
3
c
a
g
2
f
I
6
3
h
Q
w
T
/
w
Z
a
Z
E
Z
L
g
b
N
p
3
C
Y
s
j
x
B
a
Z
i
g
W
v
e
O
M
t
M
v
q
T
K
c
C
a
w
a
Y
a
4
x
o
+
y
O
j
r
B
n
q
a
Q
J
6
n
4
5
3
X
8
F
P
6
w
y
h
D
h
V
9
k
k
D
U
/
X
f
j
J
I
m
W
h
d
J
Z
J
1
2
w
L
F
+
H
6
v
F
/
8
V
6
u
Y
m
P
+
i
W
X
W
W
5
Q
s
p
d
G
c
S
7
A
p
F
A
f
E
4
Z
c
I
T
O
i
s
I
Q
y
x
e
2
s
w
M
b
U
X
t
H
Y
k
z
f
q
J
f
j
v
v
z
x
P
r
o
O
O
f
9
A
5
u
A
x
a
J
6
e
z
d
a
y
Q
H
f
K
d
7
B
G
f
H
J
I
T
c
k
4
u
S
J
c
w
5
9
h
h
j
n
A
S
9
9
Q
d
u
6
l
7
/
2
J
1
n
V
n
O
N
n
k
D
9
/
E
v
j
G
T
Z
f
w
=
=
< / l a t e x i t >
Department of Informatics, King’s College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2 (Suggested Solutions)
Q1. Answer can be found in lecture notes.
Q2. Answer can be found in lecture notes.
Q3. x⇤ =
1 2
3 4
�T
1 2
3 4
�!�1
1 2
3 4
�T
5
6
�
=
�4
4.5
�
Verification:
1 2
3 4
�
x⇤ �
5
6
�
=
0
0
�
Q4. a. f(0, 0) = (0� 1)0 + (0 + 1)0 = 0;
f(1, 2) = (1� 1)1 + (2 + 1)2 = 6;
f(3, 4) = (3� 1)3 + (4 + 1)4 = 26.
B: (0, 0); G: (1, 2); W: (3, 4)
b. M = B+G
2
= (0,0)+(1,2)
2
= (0.5, 1); f(0.5, 1) = 1.75.
c. R = 2M�W = 2(0.5, 1)� (3, 4) = (�2,�2); f(�2,�2) = 8.
d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R. C = W+M 2 = (�2,�2)+(0.5,1) 2 = (�0.75,�0.5) and f(C) = 1.0625. As f(C) < f(W), W is replaced with C. The new 3 vertices are B: (0, 0), G: (�0.75,�0.5) and W: (1, 2) Q5. Let z = x y � . Of(z) = 2x� 1 2y + 1 � . According to the update rule, zk+1 = zk � hkOf(zk), we have 1st iteration: z1 = 7 8 � � 0.1 2⇥ 7� 1 2⇥ 8 + 1 � = 5.7 6.3 � ; f(x, y) = 72.7800. 2nd iteration: z2 = 5.7 6.3 � � 0.1 2⇥ 5.7� 1 2⇥ 6.3 + 1 � = 4.66 4.94 � ; f(x, y) = 46.3992. 3rd iteration: z3 = 4.66 4.94 � � 0.1 2⇥ 4.66� 1 2⇥ 4.94 + 1 � = 3.828 3.852 � ; f(x, y) = 29.5155. Q6. a. Update rule: Of(z) = 2x� 1 2y + 1 � where zk = xk yk � . f(z) = 1 2 zTQz� bTz where Q = 2 0 0 2 � and bT = ⇥ 1 �1 ⇤ . 1 hk = Of(zk)TOf(zk) Of(zk)TQOf(zk) where Of(zk) = 2xk � 1 2yk + 1 � . hk = Of(zk)TOf(zk) Of(zk)TQOf(zk) = ⇥ 2xk � 1 2yk + 1 ⇤ 2xk � 1 2yk + 1 � ⇥ 2xk � 1 2yk + 1 ⇤ 2 0 0 2 � 2xk � 1 2yk + 1 � = 4x 2 k � 4xk + 2 + 4y 2 k + 4yk 8x 2 k � 8xk + 4 + 8y 2 k + 8yk = 1 2 . Q7. Update rule: xk+1 = xk + Dkhk where xk = xk yk � , Dk = 1 0 0 1 � and hk = 0.5 0.5 � . k x T k f(xk) x T k+1 f(xk+1) rand() xbest f(xbest) 0 [�1 � 2] 4 [�0.5 � 1.5] 1.5 – [�0.5 � 1.5] 1.5 1 [�0.5 � 1.5] 1.5 [0 � 1] 0 – [0 � 1] 0 2 [0 � 1] 0 [0.5 � 0.5] �0.5 – [0.5 � 0.5] �0.5 3 [0.5 � 0.5] �0.5 [1 0] 0 0.8 [0.5 � 0.5] �0.5 4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5 � 0.5] �0.5 5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5 � 0.5] �0.5 2 Department of Informatics, King’s College London Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) Tutorial 2 Q1. What are the advantages and disadvantages of gradient descent method? Q2. Show how gradient descent method works using pseudo code. Q3. Consider a least-squares problem, min x f(x) =|| Ax�B ||22 where A = 1 2 3 4 � and B = 5 6 � . Find the optimal solution x. Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using Nelder-Mead Downhill Simplex Method. a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W. b. Find the midpoint M and f(M). c. Find the reflection R and f(R). d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of Nelder-Mead downhill simplex method, C = W+M 2 is used. Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations with step size hk = 0.1 and initial guess (7,8). Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the gradient descent method. a. Determine the optimal step size hk. b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size hk obtained in Q6a. Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first number generated). The diagonal entries of Dk are all 1 and all the entries of hk are 0.5. Fill in the content of the following table. k x T k f(xk) x T k+1 f(xk+1) rand() xbest f(xbest) 0 [�1 � 2] 1 2 3 4 5 1 Traditional Numerical Methods: Random Walk Algorithm: Random Walk Optimisation input: f (x) : ¬n!¬; x0: an initial solution output: x ⇤, a local minimum of the cost function f (x) k 0; xbest = x0; Threshold ; while STOP-CRIT and k < kmax do Generate Dk and hk; xk+1 xk +Dkhk; If f (xk+1) > f (xk)
If rand < Threshold Then xk+1 xk;
Else
If f (xbest) > f (xk+1) Then xbest xk+1;
End
k k+1;
end
return xk and xbest;
Table 3: Pseudo Code of Random Walk Optimisation.
Dr H.K. Lam (KCL) Optimisation Biologically Inspired Methods 2018-19 46 / 68
TraditionalNumericalMethods:RandomWalkAlgorithm:RandomWalkOptimisation
input:f(x):¬
n
!¬;x
0
:aninitialsolution
output:x
⇤
,alocalminimumofthecostfunctionf(x)
k 0;x
best
=x
0
;Threshold;
whileSTOP-CRITandk
k
)
Ifrand
k+1
)Thenx
best
x
k+1
;
End
k k+1;
end
returnx
k
andx
best
;
Table3:PseudoCodeofRandomWalkOptimisation.
DrH.K.Lam(KCL) Optimisation BiologicallyInspiredMethods2018-1946/68
Department of Informatics, King’s College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2
Q1. What are the advantages and disadvantages of gradient descent method?
Q2. Show how gradient descent method works using pseudo code.
Q3. Consider a least-squares problem, min
x
f(x) =|| Ax�B ||22 where A =
1 2
3 4
�
and
B =
5
6
�
. Find the optimal solution x.
Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using Nelder-Mead Downhill Simplex Method.
a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and
W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)
is performed in the pseudo code of Nelder-Mead downhill simplex method,
C =
W+M
2
is used.
Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y
using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations
with step size hk = 0.1 and initial guess (7,8).
Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using the gradient descent method.
a. Determine the optimal step size hk.
b. Show that it requires one iteration for the gradient descent method to converge
with the optimal step size hk obtained in Q6a.
Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using the random walk optimisation algorithm. The Threshold (for accepting the
worse solution) is 0.75 and the random number generator will generate a repeating
sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first
number generated). The diagonal entries of Dk are all 1 and all the entries of hk
are 0.5. Fill in the content of the following table.
k x
T
k f(xk) x
T
k+1 f(xk+1) rand() xbest f(xbest)
0 [�1 � 2]
1
2
3
4
5
1
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk)
where Of(zk) =
2xk � 1
2yk + 1
�
.
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk)
=
⇥
2xk � 1 2yk + 1
⇤
2xk � 1
2yk + 1
�
⇥
2xk � 1 2yk + 1
⇤
2 0
0 2
�
2xk � 1
2yk + 1
�
=
4x
2
k � 4xk + 2 + 4y
2
k + 4yk
8x
2
k � 8xk + 4 + 8y
2
k + 8yk
=
1
2
.
b. Update rule: zk+1 = zk �
Of(zk)TOf(zk)
Of(zk)TQOf(zk)
Of(zk)
Randomly pick an initial condition as zk =
1
�2
�
.
zk+1 = zk +
1
2
Of(zk)
=
1
�2
�
�
1
2
2xk � 1
2yk + 1
�
=
1
�2
�
�
1
2
2⇥ 1� 1
2⇥�2 + 1
�
=
1
�2
�
�
1
2
1
�3
�
=
0.5
�0.5
�
Run another iteration (e.g., k + 1 ! k + 2) by taking zk+1 =
0.5
�0.5
�
.
zk+1 = zk+1 +
1
2
Of(zk+1)
=
0.5
�0.5
�
�
1
2
2xk+1 � 1
2yk+1 + 1
�
=
0.5
�0.5
�
�
1
2
2⇥ 0.5� 1
2⇥�0.5 + 1
�
=
0.5
�0.5
�
�
1
2
0
0
�
=
0.5
�0.5
�
Q7. Update rule: xk+1 = xk + Dkhk where xk =
xk
yk
�
, Dk =
1 0
0 1
�
and hk =
0.5
0.5
�
.
2
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk)
where Of(zk) =
2xk � 1
2yk + 1
�
.
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk)
=
⇥
2xk � 1 2yk + 1
⇤
2xk � 1
2yk + 1
�
⇥
2xk � 1 2yk + 1
⇤
2 0
0 2
�
2xk � 1
2yk + 1
�
=
4x
2
k � 4xk + 2 + 4y
2
k + 4yk
8x
2
k � 8xk + 4 + 8y
2
k + 8yk
=
1
2
.
b. Update rule: zk+1 = zk �
Of(zk)TOf(zk)
Of(zk)TQOf(zk)
Of(zk)
Randomly pick an initial condition as zk =
1
�2
�
.
zk+1 = zk +
1
2
Of(zk)
=
1
�2
�
�
1
2
2xk � 1
2yk + 1
�
=
1
�2
�
�
1
2
2⇥ 1� 1
2⇥�2 + 1
�
=
1
�2
�
�
1
2
1
�3
�
=
0.5
�0.5
�
Run another iteration (e.g., k + 1 ! k + 2) by taking zk+1 =
0.5
�0.5
�
.
zk+1 = zk+1 +
1
2
Of(zk+1)
=
0.5
�0.5
�
�
1
2
2xk+1 � 1
2yk+1 + 1
�
=
0.5
�0.5
�
�
1
2
2⇥ 0.5� 1
2⇥�0.5 + 1
�
=
0.5
�0.5
�
�
1
2
0
0
�
=
0.5
�0.5
�
Q7. Update rule: xk+1 = xk + Dkhk where xk =
xk
yk
�
, Dk =
1 0
0 1
�
and hk =
0.5
0.5
�
.
2
f(�0.5,�1.5) = 1.5
< l a t e x i t s h a 1 _ b a s e 6 4 = " 5 B n V 9 j H Z b 8 M Q G I h U r q Y 2 q C s O W U 8 = " >
A
A
A
B
/
H
i
c
b
V
D
L
S
g
M
x
F
M
3
U
V
x
1
f
o
1
2
6
C
R
a
l
g
h
1
m
p
M
V
u
h
K
I
b
l
x
X
s
A
9
q
h
Z
N
J
M
G
5
p
5
k
G
S
E
Y
a
i
/
4
C
e
4
c
a
G
I
W
z
/
A
T
3
A
n
+
B
v
u
T
R
8
L
b
T
1
w
u
Y
d
z
7
i
U
3
x
4
0
Y
F
d
K
y
P
r
X
M
0
v
L
K
6
l
p
2
X
d
/
Y
3
N
r
e
M
X
b
3
G
i
K
M
O
S
Z
1
H
L
K
Q
t
1
w
k
C
K
M
B
q
U
s
q
G
W
l
F
n
C
D
f
Z
a
T
p
D
i
/
H
f
v
O
W
c
E
H
D
4
E
Y
m
E
X
F
8
1
A
+
o
R
z
G
S
S
u
o
a
O
a
9
Q
t
M
z
y
C
S
z
a
Z
v
k
Y
n
k
P
V
u
k
b
e
M
q
0
J
4
C
K
x
Z
y
R
f
P
f
o
q
f
L
/
f
d
2
p
d
4
6
P
T
C
3
H
s
k
0
B
i
h
o
R
o
V
y
L
p
p
I
h
L
i
h
k
Z
6
Z
1
Y
k
A
j
h
I
e
q
T
t
q
I
B
8
o
l
w
0
s
n
t
I
3
i
o
l
B
7
0
Q
q
4
q
k
H
C
i
/
t
5
I
k
S
9
E
4
r
t
q
0
k
d
y
I
O
a
9
s
f
i
f
1
4
6
l
V
3
F
S
G
k
S
x
J
A
G
e
P
u
T
F
D
M
o
Q
j
o
O
A
P
c
o
J
l
i
x
R
B
G
F
O
1
a
0
Q
D
x
B
H
W
K
q
4
d
F
2
F
Y
M
9
/
e
Z
E
0
T
k
2
7
Z
J
a
u
7
X
z
1
A
k
y
R
B
f
v
g
A
B
S
A
D
c
5
A
F
V
y
B
G
q
g
D
D
B
L
w
A
J
7
A
s
3
a
n
P
W
o
v
2
u
t
0
N
K
P
N
d
n
L
g
D
7
S
3
H
2
A
0
l
O
8
=
< / l a t e x i t >
Department of Informatics, King’s College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2
Q1. What are the advantages and disadvantages of gradient descent method?
Q2. Show how gradient descent method works using pseudo code.
Q3. Consider a least-squares problem, min
x
f(x) =|| Ax�B ||22 where A =
1 2
3 4
�
and
B =
5
6
�
. Find the optimal solution x.
Q4. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using Nelder-Mead Downhill Simplex Method.
a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and
W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)
is performed in the pseudo code of Nelder-Mead downhill simplex method,
C =
W+M
2
is used.
Q5. Considering the minimisation problem of the function: f(x, y) = (x�1)x+(y+1)y
using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations
with step size hk = 0.1 and initial guess (7,8).
Q6. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using the gradient descent method.
a. Determine the optimal step size hk.
b. Show that it requires one iteration for the gradient descent method to converge
with the optimal step size hk obtained in Q6a.
Q7. Consider the minimisation problem of the function: f(x, y) = (x � 1)x + (y + 1)y
using the random walk optimisation algorithm. The Threshold (for accepting the
worse solution) is 0.75 and the random number generator will generate a repeating
sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first
number generated). The diagonal entries of Dk are all 1 and all the entries of hk
are 0.5. Fill in the content of the following table.
k x
T
k f(xk) x
T
k+1 f(xk+1) rand() xbest f(xbest)
0 [�1 � 2]
1
2
3
4
5
1
k x
T
k f(xk) x
T
k+1 f(xk+1) rand() xbest f(xbest)
0 [�1 � 2] 4 [�0.5 � 1.5] 1.5 – [�0.5 � 1.5] 1.5
1 [�0.5 � 1.5] 1.5 [0 � 1] 0 – [0 � 1] 0
2 [0 � 1] 0 [0.5 � 0.5] �0.5 – [0.5 � 0.5] �0.5
3 [0.5 � 0.5] �0.5 [1 0] 0 0.8 [0.5 � 0.5] �0.5
4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5 � 0.5] �0.5
5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5 � 0.5] �0.5
Q8. a. Mean squared error:
1
M
MX
i=1
(ŷi � yi)2
Minimisation problem: min
w1,w2,b
1
M
MX
i=1
(ŷi � yi)2
b.
f(w1, w2, b) =
1
M
MX
i=1
(ŷi � yi)2
=
1
M
MX
i=1
�
(w1x1(i) + w2x2(i) + b)� yi
�2
Update rule: zk+1 = zk � hkOf(zk) where zk =
2
4
w1k
w2k
bk
3
5
Of(zk) =
2
66666
4
@f(zk)
@w1
@f(zk)
@w2
@f(zk)
@b
3
77777
5
=
2
6666666666666
4
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x1(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x2(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
3
7777777777777
5
Update rule:
3
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)), · · · , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
d. Plot “iteration k” against “MSE” for 200 iterations. Is the choice of hk = 0.1
right or wrong?
2
3
2.5
2
x1
1.5
1-2
-1
0
x2
1
2
3
5
4
3
2
1
0
-1
4
y
3
2.5
2
x
1
1.5
1
-2
-1
0
x
2
1
2
3
5
4
3
2
1
0
-1
4
y
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2), · · · , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
2
k x
T
k f(xk) x
T
k+1 f(xk+1) rand() xbest f(xbest)
0 [�1 � 2] 4 [�0.5 � 1.5] 1.5 – [�0.5 � 1.5] 1.5
1 [�0.5 � 1.5] 1.5 [0 � 1] 0 – [0 � 1] 0
2 [0 � 1] 0 [0.5 � 0.5] �0.5 – [0.5 � 0.5] �0.5
3 [0.5 � 0.5] �0.5 [1 0] 0 0.8 [0.5 � 0.5] �0.5
4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5 � 0.5] �0.5
5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5 � 0.5] �0.5
Q8. a. Mean squared error:
1
M
MX
i=1
(ŷi � yi)2
Minimisation problem: min
w1,w2,b
1
M
MX
i=1
(ŷi � yi)2
b.
f(w1, w2, b) =
1
M
MX
i=1
(ŷi � yi)2
=
1
M
MX
i=1
�
(w1x1(i) + w2x2(i) + b)� yi
�2
Update rule: zk+1 = zk � hkOf(zk) where zk =
2
4
w1k
w2k
bk
3
5
Of(xk) =
2
66666
4
@f(xk)
@w1
@f(xk)
@w2
@f(xk)
@b
3
77777
5
=
2
6666666666666
4
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x1(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x2(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
3
7777777777777
5
Update rule:
3
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)), · · · , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
d. Plot “iteration k” against “MSE” for 200 iterations. Is the choice of hk = 0.1
right or wrong?
2
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2), · · · , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
2
k x
T
k f(xk) x
T
k+1 f(xk+1) rand() xbest f(xbest)
0 [�1 � 2] 4 [�0.5 � 1.5] 1.5 – [�0.5 � 1.5] 1.5
1 [�0.5 � 1.5] 1.5 [0 � 1] 0 – [0 � 1] 0
2 [0 � 1] 0 [0.5 � 0.5] �0.5 – [0.5 � 0.5] �0.5
3 [0.5 � 0.5] �0.5 [1 0] 0 0.8 [0.5 � 0.5] �0.5
4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5 � 0.5] �0.5
5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5 � 0.5] �0.5
Q8. a. Mean squared error:
1
M
MX
i=1
(ŷi � yi)2
Minimisation problem: min
w1,w2,b
1
M
MX
i=1
(ŷi � yi)2
b.
f(w1, w2, b) =
1
M
MX
i=1
(ŷi � yi)2
=
1
M
MX
i=1
�
(w1x1(i) + w2x2(i) + b)� yi
�2
Update rule: zk+1 = zk � hkOf(zk) where zk =
2
4
w1k
w2k
bk
3
5
Of(xk) =
2
66666
4
@f(xk)
@w1
@f(xk)
@w2
@f(xk)
@b
3
77777
5
=
2
6666666666666
4
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x1(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x2(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
3
7777777777777
5
Update rule:
3
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2), · · · , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
2
zk+1 = zk � hkOf(zk)
)
2
4
w1k+1
w2k+1
bk+1
3
5
=
2
4
w1k
w2k
bk
3
5� hk
2
6666666666666
4
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x1(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
x2(i)
1
M
MX
i=1
2
�
(w1x1(i) + w2x2(i) + b)� yi
�
3
7777777777777
5
=
2
4
w1k
w2k
bk
3
5�
2hk
M
2
6666666666666
4
MX
i=1
�
(w1x1(i) + w2x2(i) + b)� yi
�
x1(i)
MX
i=1
�
(w1x1(i) + w2x2(i) + b)� yi
�
x2(i)
MX
i=1
�
(w1x1(i) + w2x2(i) + b)� yi
�
3
7777777777777
5
c. Step size: hk = 0.1
Dateset: (1,�2, 3), (2, 4,�1), (3, 0, 5) ) M = 3.
Initial guess: z0 =
2
4
w10
w20
b0
3
5
=
2
4
1
1
1
3
5
4
First iteration: k = with z0 =
2
4
1
1
1
3
5
2
4
w11
w21
b1
3
5
=
2
4
1
1
1
3
5�
2⇥ 0.1
3
2
6666666666666666
4
�
(w1x1(1) + w2x2(1) + b)� y1
�
x1(1)
+
�
(w1x1(2) + w2x2(2) + b)� y2
�
x1(2)
+
�
(w1x1(3) + w2x2(3) + b)� y3
�
x1(3)
�
(w1x1(1) + w2x2(1) + b)� y1
�
x2(1)
+
�
(w1x1(2) + w2x2(2) + b)� y2
�
x2(2)
+
�
(w1x1(3) + w2x2(3) + b)� y3
�
x2(3)
�
(w1x1(1) + w2x2(1) + b)� y1
�
+
�
(w1x1(2) + w2x2(2) + b)� y2
�
+
�
(w1x1(3) + w2x2(3) + b)� y3
�
3
7777777777777777
5
=
2
4
1
1
1
3
5�
2⇥ 0.1
3
2
6666666666666666
4
�
(1⇥ w1 � 2⇥ w2 + b)� 3
�
⇥ 1
+
�
(2⇥ w1 + 4⇥ w2 + b) + 1
�
⇥ 2
+
�
(3⇥ w1 + 0⇥ w2 + b)� 5
�
⇥ 3
�
(1⇥ w1 � 2⇥ w2 + b)� 3
�
⇥�2
+
�
(2⇥ w1 + 4⇥ w2 + b) + 1
�
⇥ 4
+
�
(3⇥ w1 + 0⇥ w2 + b)� 5
�
⇥ 0
�
(1⇥ w1 � 2⇥ w2 + b)� 3
�
+
�
(2⇥ w1 + 4⇥ w2 + b) + 1
�
+
�
(3⇥ w1 + 0⇥ w2 + b)� 5
�
3
7777777777777777
5
=
2
4
1
1
1
3
5�
2
3
⇥ 0.1⇥
2
4
10
38
4
3
5
=
2
4
0.3333
�1.5333
0.7333
3
5
Second iteration: k = 1 with z1 =
2
4
0.3333
�1.5333
0.7333
3
5
z2 =
2
4
1.4089
�0.3867
1.1244
3
5
Third iteration: k = 2 with z2 =
2
4
1.4089
�0.3867
1.1244
3
5
z3 =
2
4
0.8655
�1.2513
0.8542
3
5
Fourth iteration: k = 3 with z3 =
2
4
0.8655
�1.2513
0.8542
3
5
5
First iteration: k = with z0 =
2
4
1
1
1
3
5
2
4
w11
w21
b1
3
5
=
2
4
1
1
1
3
5�
2⇥ 0.1
3
2
6666666666666666
4
�
(w1x1(1) + w2x2(1) + b)� y1
�
x1(1)
+
�
(w1x1(2) + w2x2(2) + b)� y2
�
x1(2)
+
�
(w1x1(3) + w2x2(3) + b)� y3
�
x1(3)
�
(w1x1(1) + w2x2(1) + b)� y1
�
x2(1)
+
�
(w1x1(2) + w2x2(2) + b)� y2
�
x2(2)
+
�
(w1x1(3) + w2x2(3) + b)� y3
�
x2(3)
�
(w1x1(1) + w2x2(1) + b)� y1
�
+
�
(w1x1(2) + w2x2(2) + b)� y2
�
+
�
(w1x1(3) + w2x2(3) + b)� y3
�
3
7777777777777777
5
=
2
4
1
1
1
3
5�
2⇥ 0.1
3
2
6666666666666666
4
�
(1⇥ w1 � 2⇥ w2 + b)� 3
�
⇥ 1
+
�
(2⇥ w1 + 4⇥ w2 + b) + 1
�
⇥ 2
+
�
(3⇥ w1 + 0⇥ w2 + b)� 5
�
⇥ 3
�
(1⇥ w1 � 2⇥ w2 + b)� 3
�
⇥�2
+
�
(2⇥ w1 + 4⇥ w2 + b) + 1
�
⇥ 4
+
�
(3⇥ w1 + 0⇥ w2 + b)� 5
�
⇥ 0
�
(1⇥ w1 � 2⇥ w2 + b)� 3
�
+
�
(2⇥ w1 + 4⇥ w2 + b) + 1
�
+
�
(3⇥ w1 + 0⇥ w2 + b)� 5
�
3
7777777777777777
5
=
2
4
1
1
1
3
5�
2
3
⇥ 0.1⇥
2
4
10
38
4
3
5
=
2
4
0.3333
�1.5333
0.7333
3
5
Second iteration: k = 1 with z1 =
2
4
0.3333
�1.5333
0.7333
3
5
z2 =
2
4
1.4089
�0.3867
1.1244
3
5
Third iteration: k = 2 with z2 =
2
4
1.4089
�0.3867
1.1244
3
5
z3 =
2
4
0.8655
�1.2513
0.8542
3
5
Fourth iteration: k = 3 with z3 =
2
4
0.8655
�1.2513
0.8542
3
5
5
z4 =
2
4
1.2832
�0.7097
0.9707
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
�0.7097
0.9707
3
5
z5 =
2
4
1.0478
�1.0728
0.8246
3
5
When increasing k, z converges at
2
4
2.0000
�1.0000
�1.0000
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
6
z4 =
2
4
1.2832
�0.7097
0.9707
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
�0.7097
0.9707
3
5
z5 =
2
4
1.0478
�1.0728
0.8246
3
5
When increasing k, z converges at
2
4
2.0000
�1.0000
�1.0000
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
6
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2), · · · , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
d. Plot “iteration k” against “MSE” for 200 iterations. Is the choice of hk = 0.1
right or wrong?
2
z4 =
2
4
1.2832
�0.7097
0.9707
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
�0.7097
0.9707
3
5
z5 =
2
4
1.0478
�1.0728
0.8246
3
5
When increasing k, z converges at
2
4
2.0000
�1.0000
�1.0000
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
6
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)), · · · , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
d. Plot “iteration k” against “MSE” for 200 iterations. Is the choice of hk = 0.1
right or wrong?
Q9. Find the least-squares solution of Ax = B where:
A =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5 and B =
2
66
4
0
1
3
4
3
77
5 .
Q10. Consider the three data points (x, y) given as follows:
(0, 6), (1, 0), (2, 0),
which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares? Explain
the quantity that is being minimized.
(1,0)
(0,6)
(2,0)
2
z4 =
2
4
1.2832
�0.7097
0.9707
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
�0.7097
0.9707
3
5
z5 =
2
4
1.0478
�1.0728
0.8246
3
5
When increasing k, z converges at
2
4
2.0000
�1.0000
�1.0000
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
Q9. Represent a1, a2, a3 as the columns of A. It is found that it is an orthogonal set,
i.e., ai · aj = 0 for all i 6= j.
The solution can thus be calculated as
x =
2
6666666
4
B · a1
a1 · a1
B · a2
a2 · a2
B · a3
a3 · a3
3
7777777
5
=
2
666666
4
�
3
2
�
3
2
2
3
777777
5
.
Remark: The solution is the same as x = (ATA)�1ATB.
Q10. Given by the question, the three data points, in the form (x, y), are from a single
line. However, due to measurement errors, these three points, (0, 6), (1, 0) and (2,
0), obtained from measurement may not lie on a line. When you plot the data, it
can be seen that they are not lying in a straight line.
The general form of a straight line is
y = mx+ c.
where m and c are constants to be determined.
We are going to approximate the three data points using a straight line so it is
expected that the values of y from this straight line produces the following values:
6 = m · 0 + c from the data point (0, 6)
0 = m · 1 + c from the data point (1, 0)
0 = m · 2 + c from the data point (2, 0)
6
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)), · · · , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
d. Plot “iteration k” against “MSE” for 200 iterations. Is the choice of hk = 0.1
right or wrong?
Q9. Find the least-squares solution of Ax = B where:
A =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5 and B =
2
66
4
0
1
3
4
3
77
5 .
Q10. Consider the three data points (x, y) given as follows:
(0, 6), (1, 0), (2, 0),
which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares? Explain
the quantity that is being minimized.
(1,0)
(0,6)
(2,0)
2
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)), · · · , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of ŷ = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,�2, 3), (2, 4,�1), (3, 0, 5), obtain the best set of parameters for the linear
regressor.
d. Plot “iteration k” against “MSE” for 200 iterations. Is the choice of hk = 0.1
right or wrong?
Q9. Find the least-squares solution of Ax = B where:
A =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5 and B =
2
66
4
0
1
3
4
3
77
5 .
Q10. Consider the three data points (x, y) given as follows:
(0, 6), (1, 0), (2, 0),
which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares? Explain
the quantity that is being minimized.
(1,0)
(0,6)
(2,0)
2
z4 =
2
4
1.2832
�0.7097
0.9707
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
�0.7097
0.9707
3
5
z5 =
2
4
1.0478
�1.0728
0.8246
3
5
When increasing k, z converges at
2
4
2.0000
�1.0000
�1.0000
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
Q9. Represent a1, a2, a3 as the columns of A. It is found that it is an orthogonal set,
i.e., ai · aj = 0 for all i 6= j.
The solution can thus be calculated as
x =
2
6666666
4
B · a1
a1 · a1
B · a2
a2 · a2
B · a3
a3 · a3
3
7777777
5
=
2
666666
4
�
3
2
�
3
2
2
3
777777
5
.
Remark: The solution is the same as x = (ATA)�1ATB.
Q10. Given by the question, the three data points, in the form (x, y), are from a single
line. However, due to measurement errors, these three points, (0, 6), (1, 0) and (2,
0), obtained from measurement may not lie on a line. When you plot the data, it
can be seen that they are not lying in a straight line.
The general form of a straight line is
y = mx+ c.
where m and c are constants to be determined.
We are going to approximate the three data points using a straight line so it is
expected that the values of y from this straight line produces the following values:
6 = m · 0 + c from the data point (0, 6)
0 = m · 1 + c from the data point (1, 0)
0 = m · 2 + c from the data point (2, 0)
6
In the equations given above, m and c are the unknowns. If we put these equations
into matrix form, we have
Ax = B
where
A =
2
4
0 1
1 1
2 1
3
5 , x =
m
c
�
, B =
2
4
6
0
0
3
5 .
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
The least-squares solution is found as
x =
m
c
�
= (ATA)�1ATB =
�3
5
�
.
Therefore the best-fit line is
y = mx+ c = �3x+ 5.
(1,0)
(0,6)
(2,0)
y = �3x+ 5
What exactly is the line y = f(x) = �3x+ 5 minimizing?
The least-squares solution x minimizes kAx�Bk22, which is the sum of the squares
of the entries of the vector Ax�B. More specifically,
Ax =
2
4
0 1
1 1
2 1
3
5
�3
5
�
=
2
4
0⇥�3 + 5
1⇥�3 + 5
2⇥�3 + 5
3
5 =
2
4
f(0)
f(1)
f(2)
3
5 =
2
4
5
2
�1
3
5 .
7
In the equations given above, m and c are the unknowns. If we put these equations
into matrix form, we have
Ax = B
where
A =
2
4
0 1
1 1
2 1
3
5 , x =
m
c
�
, B =
2
4
6
0
0
3
5 .
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
The least-squares solution is found as
x =
m
c
�
= (ATA)�1ATB =
�3
5
�
.
Therefore the best-fit line is
y = mx+ c = �3x+ 5.
(1,0)
(0,6)
(2,0)
y = �3x+ 5
What exactly is the line y = f(x) = �3x+ 5 minimizing?
The least-squares solution x minimizes kAx�Bk22, which is the sum of the squares
of the entries of the vector Ax�B. More specifically,
Ax =
2
4
0 1
1 1
2 1
3
5
�3
5
�
=
2
4
0⇥�3 + 5
1⇥�3 + 5
2⇥�3 + 5
3
5 =
2
4
f(0)
f(1)
f(2)
3
5 =
2
4
5
2
�1
3
5 .
7
In the equations given above, m and c are the unknowns. If we put these equations
into matrix form, we have
Ax = B
where
A =
2
4
0 1
1 1
2 1
3
5 , x =
m
c
�
, B =
2
4
6
0
0
3
5 .
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
The least-squares solution is found as
x =
m
c
�
= (ATA)�1ATB =
�3
5
�
.
Therefore the best-fit line is
y = mx+ c = �3x+ 5.
(1,0)
(0,6)
(2,0)
y = �3x+ 5
What exactly is the line y = f(x) = �3x+ 5 minimizing?
The least-squares solution x minimizes kAx�Bk22, which is the sum of the squares
of the entries of the vector Ax�B. More specifically,
Ax =
2
4
0 1
1 1
2 1
3
5
�3
5
�
=
2
4
0⇥�3 + 5
1⇥�3 + 5
2⇥�3 + 5
3
5 =
2
4
f(0)
f(1)
f(2)
3
5 =
2
4
5
2
�1
3
5 .
7
The entries of Ax are the y-coordinates of the graph of the line y = �3x+5 with x
chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. Ax̂�B is the vertical distance of the graph from the data
points. y = f(x) = �3x+5 is the line that minimizes the sum of the squares of the
these vertical distances, i.e., that is to minimise the sum of the squared di↵erences
between the approximate y value given by the straight line and the desired value of
y given in B.
(1,0)
(0,6)
(2,0)
y = �3x+ 5
�1
�1
2
Q11. The general equation for a parabola is
y = Bx2 + Cx+D (1)
where B, C and D are constants to be determined and the data point is in the form
of (x, y).
Since the four data points, (�1, 1
2
), (1,�1), (2,�1
2
), (3, 2), are approximated by this
parabola equation, it is expected that the value of y from the parabola equation
should produce:
Data point (�1,
1
2
) :
1
2
= B ⇥ (�1)2 + C ⇥ (�1) +D
Data point (1,�1) : �1 = B ⇥ (1)2 + C ⇥ (1) +D
Data point (2,�
1
2
) : �
1
2
= B ⇥ (2)2 + C ⇥ (2) +D
Data point (3, 2) : 2 = B ⇥ (3)2 + C ⇥ (3) +D.
8
The entries of Ax are the y-coordinates of the graph of the line y = �3x+5 with x
chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. Ax̂�B is the vertical distance of the graph from the data
points. y = f(x) = �3x+5 is the line that minimizes the sum of the squares of the
these vertical distances, i.e., that is to minimise the sum of the squared di↵erences
between the approximate y value given by the straight line and the desired value of
y given in B.
(1,0)
(0,6)
(2,0)
y = �3x+ 5
�1
�1
2
Q11. The general equation for a parabola is
y = Bx2 + Cx+D (1)
where B, C and D are constants to be determined and the data point is in the form
of (x, y).
Since the four data points, (�1, 1
2
), (1,�1), (2,�1
2
), (3, 2), are approximated by this
parabola equation, it is expected that the value of y from the parabola equation
should produce:
Data point (�1,
1
2
) :
1
2
= B ⇥ (�1)2 + C ⇥ (�1) +D
Data point (1,�1) : �1 = B ⇥ (1)2 + C ⇥ (1) +D
Data point (2,�
1
2
) : �
1
2
= B ⇥ (2)2 + C ⇥ (2) +D
Data point (3, 2) : 2 = B ⇥ (3)2 + C ⇥ (3) +D.
8
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,1
2
)
(1,-1)
(2,-1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. Here are several data points
✓
x
y
◆
given as follows:
✓
�4
�1
◆
,
✓
�3
0
◆
,
✓
�2
�1.5
◆
,
✓
�1
0.5
◆
,
✓
0
1
◆
,
✓
1
�1
◆
,
✓
2
�0.5
◆
,
✓
3
2
◆
,
✓
4
�1
◆
.
Find a function in the form
y = B + C cos(x) +D sin(x) + E cos(2x) + F sin(2x) +G cos(3x) +H sin(3x)
that best approximates the data points in the sense of least-squares?
Q14. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
3
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,1
2
)
(1,-1)
(2,-1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. Here are several data points
✓
x
y
◆
given as follows:
✓
�4
�1
◆
,
✓
�3
0
◆
,
✓
�2
�1.5
◆
,
✓
�1
0.5
◆
,
✓
0
1
◆
,
✓
1
�1
◆
,
✓
2
�0.5
◆
,
✓
3
2
◆
,
✓
4
�1
◆
.
Find a function in the form
y = B + C cos(x) +D sin(x) + E cos(2x) + F sin(2x) +G cos(3x) +H sin(3x)
that best approximates the data points in the sense of least-squares?
Q14. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
3
The entries of Ax are the y-coordinates of the graph of the line y = �3x+5 with x
chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. Ax̂�B is the vertical distance of the graph from the data
points. y = f(x) = �3x+5 is the line that minimizes the sum of the squares of the
these vertical distances, i.e., that is to minimise the sum of the squared di↵erences
between the approximate y value given by the straight line and the desired value of
y given in B.
(1,0)
(0,6)
(2,0)
y = �3x+ 5
�1
�1
2
Q11. The general equation for a parabola is
y = Bx2 + Cx+D
where B, C and D are constants to be determined and the data point is in the form
of (x, y).
Since the four data points, (�1, 1
2
), (1,�1), (2,�1
2
), (3, 2), are approximated by this
parabola equation, it is expected that the value of y from the parabola equation
should produce:
Data point (�1,
1
2
) :
1
2
= B ⇥ (�1)2 + C ⇥ (�1) +D
Data point (1,�1) : �1 = B ⇥ (1)2 + C ⇥ (1) +D
Data point (2,�
1
2
) : �
1
2
= B ⇥ (2)2 + C ⇥ (2) +D
Data point (3, 2) : 2 = B ⇥ (3)2 + C ⇥ (3) +D.
8
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4
1 �1 1
1 1 1
4 2 1
9 3 1
3
77
5 , x =
2
4
B
C
D
3
5 , B =
2
66
4
1
2
�1
�1
2
2
3
77
5 .
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
The least-squares solution is found as
x =
2
4
B
C
D
3
5 = (ATA)�1ATB =
2
6666666
4
53
88
�
379
440
�
41
44
3
7777777
5
.
The best-fit parabola is
y = Bx2 + Cx+D =
53
88
x2 �
379
440
x�
41
44
,
which is equivalent to
88y = 53×2 �
379
5
x� 82.
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
88y = 53×2 � 379
5
x� 82
What exactly the parabola y = f(x) is minimizing?
9
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4
1 �1 1
1 1 1
4 2 1
9 3 1
3
77
5 , x =
2
4
B
C
D
3
5 , B =
2
66
4
1
2
�1
�1
2
2
3
77
5 .
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
The least-squares solution is found as
x =
2
4
B
C
D
3
5 = (ATA)�1ATB =
2
6666666
4
53
88
�
379
440
�
41
44
3
7777777
5
.
The best-fit parabola is
y = Bx2 + Cx+D =
53
88
x2 �
379
440
x�
41
44
,
which is equivalent to
88y = 53×2 �
379
5
x� 82.
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
88y = 53×2 � 379
5
x� 82
What exactly the parabola y = f(x) is minimizing?
9
The least-squares solution x minimizes kAx� bk22. In this case, Ax is
Ax =
2
66
4
1 �1 1
1 1 1
4 2 1
9 3 1
3
77
5
2
6666666
4
53
88
�
379
440
�
41
44
3
7777777
5
=
2
6666
4
53
88
(�1)2 � 379
440
(�1)� 41
44
53
88
(1)
2 � 379
440
(1)� 41
44
53
88
(2)
2 � 379
440
(2)� 41
44
53
88
(3)
2 � 379
440
(3)� 41
44
3
7777
5
=
2
6666
4
f(�1)
f(1)
f(2)
f(3)
3
7777
5
=
2
6666
4
117
220
�131
110
� 27
110
419
220
3
7777
5
.
The entries of Ax are the y-coordinates of the parabola 88y = 53×2� 379
5
x�82 with
x chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. Ax�B is the vertical distance of the graph from the data
points. 88y = 53×2� 379
5
x�82 minimizes the sum of the squares of the these vertical
distances, i.e., that is to minimise the sum of the squared di↵erences between the
approximate y value and the desired value of y given in B.
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
88y = 53×2 � 379
5
x� 82
7
220
� 21
110
�14
55
21
220
Q12. The general equation for a linear function with two variables is
f(x, y) = Bx+ Cy +D. (1)
where B, C and D are constants to be determined.
If we feed the provided data into the general equation given above:
Data point (1, 0) : B ⇥ 1 + C ⇥ 0 +D = 0
Data point (0, 1) : B ⇥ 0 + C ⇥ 1 +D = 1
Data point (�1, 0) : B ⇥�1 + C ⇥ 0 +D = 3
Data point (0,�1) : B ⇥ 0 + C ⇥�1 +D = 4.
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5 , x =
2
4
B
C
D
3
5 , B =
2
66
4
0
1
3
4
3
77
5 .
10
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are £600 and £400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
The least-squares solution x minimizes kAx� bk22. In this case, Ax is
Ax =
2
66
4
1 �1 1
1 1 1
4 2 1
9 3 1
3
77
5
2
6666666
4
53
88
�
379
440
�
41
44
3
7777777
5
=
2
6666
4
53
88
(�1)2 � 379
440
(�1)� 41
44
53
88
(1)
2 � 379
440
(1)� 41
44
53
88
(2)
2 � 379
440
(2)� 41
44
53
88
(3)
2 � 379
440
(3)� 41
44
3
7777
5
=
2
6666
4
f(�1)
f(1)
f(2)
f(3)
3
7777
5
=
2
6666
4
117
220
�131
110
� 27
110
419
220
3
7777
5
.
The entries of Ax are the y-coordinates of the parabola 88y = 53×2� 379
5
x�82 with
x chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. Ax�B is the vertical distance of the graph from the data
points. 88y = 53×2� 379
5
x�82 minimizes the sum of the squares of the these vertical
distances, i.e., that is to minimise the sum of the squared di↵erences between the
approximate y value and the desired value of y given in B.
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
88y = 53×2 � 379
5
x� 82
7
220
� 21
110
�14
55
21
220
Q12. The general equation for a linear function with two variables is
f(x, y) = Bx+ Cy +D.
where B, C and D are constants to be determined.
If we feed the provided data into the general equation given above:
Data point (1, 0) : B ⇥ 1 + C ⇥ 0 +D = 0
Data point (0, 1) : B ⇥ 0 + C ⇥ 1 +D = 1
Data point (�1, 0) : B ⇥�1 + C ⇥ 0 +D = 3
Data point (0,�1) : B ⇥ 0 + C ⇥�1 +D = 4.
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5 , x =
2
4
B
C
D
3
5 , B =
2
66
4
0
1
3
4
3
77
5 .
10
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are £600 and £400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise
the profit of the factory? What would be the maximum profit in this case?
3
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
It is noticed that the columns a1, a2, a3 of matrix A are orthogonal. Therefore, the
solution can thus be calculated as
x =
2
6666666
4
B · a1
a1 · a1
B · a2
a2 · a2
B · a3
a3 · a3
3
7777777
5
=
2
666666
4
�
3
2
�
3
2
2
3
777777
5
.
Remark: The solution is the same as x = (ATA)�1ATB.
So the linear equation that approximates the data in the sense of least-squares is
f(x, y) = �
3
2
x�
3
2
y + 2.
Let’s move one step further to investigate the quantity that is being minimized.
Function f(x, y) forms a graph of surface as shown below. The vector Ax are:
11
Theaboveisformulatedasaleast-squaresproblem,i.e.,
min
x
||Ax−B||
2
2
.
Itisnoticedthatthecolumnsa
1
,a
2
,a
3
ofmatrixAareorthogonal.Therefore,the
solutioncanthusbecalculatedas
x=
2
6
6
6
6
6
6
6
4
B·a
1
a
1
·a
1
B·a
2
a
2
·a
2
B·a
3
a
3
·a
3
3
7
7
7
7
7
7
7
5
=
2
6
6
6
6
6
6
4
−
3
2
−
3
2
2
3
7
7
7
7
7
7
5
.
Remark:Thesolutionisthesameasx=(A
T
A)
−1
A
T
B.
Sothelinearequationthatapproximatesthedatainthesenseofleast-squaresis
f(x,y)=−
3
2
x−
3
2
y+2.
Let’smoveonestepfurthertoinvestigatethequantitythatisbeingminimized.
Functionf(x,y)formsagraphofsurfaceasshownbelow.ThevectorAxare:
11
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
It is noticed that the columns a1, a2, a3 of matrix A are orthogonal. Therefore, the
solution can thus be calculated as
x =
2
6666666
4
B · a1
a1 · a1
B · a2
a2 · a2
B · a3
a3 · a3
3
7777777
5
=
2
666666
4
�
3
2
�
3
2
2
3
777777
5
.
Remark: The solution is the same as x = (ATA)�1ATB.
So the linear equation that approximates the data in the sense of least-squares is
f(x, y) = �
3
2
x�
3
2
y + 2.
Let’s move one step further to investigate the quantity that is being minimized.
Function f(x, y) forms a graph of surface as shown below. The vector Ax are:
11
Theaboveisformulatedasaleast-squaresproblem,i.e.,minx||Ax−B||22.Itisnoticedthatthecolumnsa1,a2,a3ofmatrixAareorthogonal.Therefore,thesolutioncanthusbecalculatedasx=266666664B·a1a1·a1B·a2a2·a2B·a3a3·a3377777775=26666664−32−32237777775.Remark:Thesolutionisthesameasx=(ATA)−1ATB.Sothelinearequationthatapproximatesthedatainthesenseofleast-squaresisf(x,y)=−32x−32y+2.Let’smoveonestepfurthertoinvestigatethequantitythatisbeingminimized.Functionf(x,y)formsagraphofsurfaceasshownbelow.ThevectorAxare:11
Ax =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5
2
666666
4
�
3
2
�
3
2
2
3
777777
5
==
2
66666666
4
�3
2
⇥ 1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥ 1 + 2
�3
2
⇥�1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥�1 + 2
3
77777777
5
=
2
66
4
f(1, 0)
f(0, 1)
f(�1, 0)
f(0,�1)
3
77
5 =
2
666666666666
4
1
2
1
2
7
2
7
2
3
777777777777
5
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The di↵erence Ax�B refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared di↵erences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B.
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production.
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem:
max
x,y
Z = 600x+ 400y
subject to
x+ 2y 12 (constraint on Machine I)
2x+ y 12 (constraint on Machine II)
x+
5
4
y � 5 (constraint on Machine III)
x � 0 (quantiy of item M is non-negative)
y � 0 (quantiy of item N is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
The above is formulated as a least-squares problem, i.e.,
min
x
||Ax�B||22.
It is noticed that the columns a1, a2, a3 of matrix A are orthogonal. Therefore, the
solution can thus be calculated as
x =
2
6666666
4
B · a1
a1 · a1
B · a2
a2 · a2
B · a3
a3 · a3
3
7777777
5
=
2
666666
4
�
3
2
�
3
2
2
3
777777
5
.
Remark: The solution is the same as x = (ATA)�1ATB.
So the linear equation that approximates the data in the sense of least-squares is
f(x, y) = �
3
2
x�
3
2
y + 2.
Let’s move one step further to investigate the quantity that is being minimized.
Function f(x, y) forms a graph of surface as shown below. The vector Ax are:
11
Theaboveisformulatedasaleast-squaresproblem,i.e.,minx||Ax−B||22.Itisnoticedthatthecolumnsa1,a2,a3ofmatrixAareorthogonal.Therefore,thesolutioncanthusbecalculatedasx=266666664B·a1a1·a1B·a2a2·a2B·a3a3·a3377777775=26666664−32−32237777775.Remark:Thesolutionisthesameasx=(ATA)−1ATB.Sothelinearequationthatapproximatesthedatainthesenseofleast-squaresisf(x,y)=−32x−32y+2.Let’smoveonestepfurthertoinvestigatethequantitythatisbeingminimized.
Functionf(x,y)formsagraphofsurfaceasshownbelow.ThevectorAxare:
11
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are £600 and £400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are £600 and £400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares.
(�1,
1
2
), (1,�1), (2,�
1
2
), (3, 2).
What quantity is being minimized?
(-1,
1
2
)
(1,-1)
(2,-
1
2
)
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized.
x y f(x, y)
1 0 0
0 1 1
�1 0 3
0 �1 4
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have di↵erent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are £600 and £400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
Ax =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5
2
666666
4
�
3
2
�
3
2
2
3
777777
5
==
2
66666666
4
�3
2
⇥ 1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥ 1 + 2
�3
2
⇥�1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥�1 + 2
3
77777777
5
=
2
66
4
f(1, 0)
f(0, 1)
f(�1, 0)
f(0,�1)
3
77
5 =
2
666666666666
4
1
2
1
2
7
2
7
2
3
777777777777
5
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The di↵erence Ax�B refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared di↵erences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B.
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production.
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem:
max
x,y
Z = 600x+ 400y
subject to
x+ 2y 12 (constraint on Machine I)
2x+ y 12 (constraint on Machine II)
x+
5
4
y � 5 (constraint on Machine III)
x � 0 (quantiy of item M is non-negative)
y � 0 (quantiy of item N is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
Ax =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5
2
666666
4
�
3
2
�
3
2
2
3
777777
5
==
2
66666666
4
�3
2
⇥ 1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥ 1 + 2
�3
2
⇥�1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥�1 + 2
3
77777777
5
=
2
66
4
f(1, 0)
f(0, 1)
f(�1, 0)
f(0,�1)
3
77
5 =
2
666666666666
4
1
2
1
2
7
2
7
2
3
777777777777
5
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The di↵erence Ax�B refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared di↵erences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B.
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production.
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem:
max
x,y
Z = 600x+ 400y
subject to
x+ 2y 12 (constraint on Machine I)
2x+ y 12 (constraint on Machine II)
x+
5
4
y � 5 (constraint on Machine III)
x � 0 (quantiy of item M is non-negative)
y � 0 (quantiy of item N is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
Ax =
2
66
4
1 0 1
0 1 1
�1 0 1
0 �1 1
3
77
5
2
666666
4
�
3
2
�
3
2
2
3
777777
5
==
2
66666666
4
�3
2
⇥ 1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥ 1 + 2
�3
2
⇥�1� 3
2
⇥ 0 + 2
�3
2
⇥ 0� 3
2
⇥�1 + 2
3
77777777
5
=
2
66
4
f(1, 0)
f(0, 1)
f(�1, 0)
f(0,�1)
3
77
5 =
2
666666666666
4
1
2
1
2
7
2
7
2
3
777777777777
5
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The di↵erence Ax�B refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared di↵erences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B.
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production.
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem:
max
x,y
Z = 600x+ 400y
subject to
x+ 2y 12 (constraint on Machine I)
2x+ y 12 (constraint on Machine II)
x+
5
4
y � 5 (constraint on Machine III)
x � 0 (quantiy of item M is non-negative)
y � 0 (quantiy of item N is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
x – axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4
y – axis
1 4
1 3
1 2
1 1
1 0
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com
x – axis
-4-3-2-11234567891011121314
y – axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com
The feasible regions is the shaded area and A, B, C, D, E are corner-point feasible
solutions (CPF solutions). Evaluate Z = 600x+ 400y at these corner points.
Corner point (x, y) Z = 600x+ 400y
A: (5,0) 3000
B: (6,0) 3600
C: (4,4) 4000 (maximum)
D: (0,6) 2400
E: (0,4) 1600
Therefore, producing 4 units of each item (x = 4 and y = 4) can produce the
maximum profit of £4000.
Q14. Denote x and y as the units of commodity transported from factory at P to the
depots at A and B, respectively.
As all produced commodities from both factories will be transported and factory at
P can produce 8 units, (8� x� y) units will be transported from P to C.
As the number of units to be transported must be non-negative, we have the fol-
lowing constraints: x � 0, y � 0 and 8� x� y � 0.
Depot A requires 5 units of commodity weekly and x units have been provided by
factory P. Thus, the remaining (5 � x) units need to be transported to deport A
from the factory at Q.
13
ThefeasibleregionsistheshadedareaandA,B,C,D,Earecorner-pointfeasible
solutions(CPFsolutions).EvaluateZ=600x+400yatthesecornerpoints.
Cornerpoint(x,y)Z=600x+400y
A:(5,0) 3000
B:(6,0) 3600
C:(4,4)4000(maximum)
D:(0,6) 2400
E:(0,4) 1600
Therefore,producing4unitsofeachitem(x=4andy=4)canproducethe
maximumprofitof£4000.
Q14.DenotexandyastheunitsofcommoditytransportedfromfactoryatPtothe
depotsatAandB,respectively.
Asallproducedcommoditiesfrombothfactorieswillbetransportedandfactoryat
Pcanproduce8units,(8−x−y)unitswillbetransportedfromPtoC.
Asthenumberofunitstobetransportedmustbenon-negative,wehavethefol-
lowingconstraints:x≥0,y≥0and8−x−y≥0.
DepotArequires5unitsofcommodityweeklyandxunitshavebeenprovidedby
factoryP.Thus,theremaining(5−x)unitsneedtobetransportedtodeportA
fromthefactoryatQ.
13
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from di↵erent factories to di↵erent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum?
Q16. Explain how “Recursive least-squares” works.
4
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from di↵erent factories to di↵erent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum?
Q16. Explain how “Recursive least-squares” works.
4
The feasible regions is the shaded area and A, B, C, D, E are corner-point feasible
solutions (CPF solutions). Evaluate Z = 600x+ 400y at these corner points.
Corner point (x, y) Z = 600x+ 400y
A: (5,0) 3000
B: (6,0) 3600
C: (4,4) 4000 (maximum)
D: (0,6) 2400
E: (0,4) 1600
Therefore, producing 4 units of each item (x = 4 and y = 4) can produce the
maximum profit of £4000.
Q14. Denote x and y as the units of commodity transported from factory at P to the
depots at A and B, respectively.
As all produced commodities from both factories will be transported and factory at
P can produce 8 units, (8� x� y) units will be transported from P to C.
As the number of units to be transported must be non-negative, we have the fol-
lowing constraints: x � 0, y � 0 and 8� x� y � 0.
Depot A requires 5 units of commodity weekly and x units have been provided by
factory P. Thus, the remaining (5 � x) units need to be transported to deport A
from the factory at Q.
13
ThefeasibleregionsistheshadedareaandA,B,C,D,Earecorner-pointfeasiblesolutions(CPFsolutions).EvaluateZ=600x+400yatthesecornerpoints.Cornerpoint(x,y)Z=600x+400yA:(5,0) 3000B:(6,0) 3600C:(4,4)4000(maximum)D:(0,6) 2400E:(0,4) 1600Therefore,producing4unitsofeachitem(x=4andy=4)canproducethemaximumprofitof£4000.Q14.DenotexandyastheunitsofcommoditytransportedfromfactoryatPtothe
depotsatAandB,respectively.
Asallproducedcommoditiesfrombothfactorieswillbetransportedandfactoryat
Pcanproduce8units,(8−x−y)unitswillbetransportedfromPtoC.
Asthenumberofunitstobetransportedmustbenon-negative,wehavethefol-
lowingconstraints:x≥0,y≥0and8−x−y≥0.
DepotArequires5unitsofcommodityweeklyandxunitshavebeenprovidedby
factoryP.Thus,theremaining(5−x)unitsneedtobetransportedtodeportA
fromthefactoryatQ.
13
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5 � y) units need to be transported to deport B
from the factory at Q.
(5 � x) units and (5 � y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can
produce 6 units. Thus, the remaining 6� (5� x)� (5� y)) = x+ y � 4 units need
to be transported to deport C from the factory at Q.
As the number of units to be transported must be non-negative, we have another
three constraints: 5� x � 0, 5� y � 0 and x+ y � 4 � 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5� x) + 120(5� y) + 150(8� x� y) + 100(x+ y � 4)
= 10(x� 7y + 190).
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y
Z = 10(x� 7y + 190)
subject to
x � 0 units to be transported to deport A from factor P is non-negative
y � 0 units to be transported to depart B froma factor P is non-negative
x+ y 8 (8� x� y) � 0 units to be transported to depart C from factor at P
x 5 (5� x) � 0 units to be transported to deport A from factory at Q
y 5 (5� y) � 0 units to be transported to deport B from factory at Q
x+ y � 4 x+ y � 4 � 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5 � y) units need to be transported to deport B
from the factory at Q.
(5 � x) units and (5 � y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can
produce 6 units. Thus, the remaining 6� (5� x)� (5� y)) = x+ y � 4 units need
to be transported to deport C from the factory at Q.
As the number of units to be transported must be non-negative, we have another
three constraints: 5� x � 0, 5� y � 0 and x+ y � 4 � 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5� x) + 120(5� y) + 150(8� x� y) + 100(x+ y � 4)
= 10(x� 7y + 190).
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y
Z = 10(x� 7y + 190)
subject to
x � 0 units to be transported to deport A from factor P is non-negative
y � 0 units to be transported to depart B froma factor P is non-negative
x+ y 8 (8� x� y) � 0 units to be transported to depart C from factor at P
x 5 (5� x) � 0 units to be transported to deport A from factory at Q
y 5 (5� y) � 0 units to be transported to deport B from factory at Q
x+ y � 4 x+ y � 4 � 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
x – axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4
y – axis
1 4
1 3
1 2
1 1
1 0
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com
x – axis
-4-3-2-11234567891011121314
y – axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5 � y) units need to be transported to deport B
from the factory at Q.
(5 � x) units and (5 � y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can
produce 6 units. Thus, the remaining 6� (5� x)� (5� y)) = x+ y � 4 units need
to be transported to deport C from the factory at Q.
As the number of units to be transported must be non-negative, we have another
three constraints: 5� x � 0, 5� y � 0 and x+ y � 4 � 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5� x) + 120(5� y) + 150(8� x� y) + 100(x+ y � 4)
= 10(x� 7y + 190).
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y
Z = 10(x� 7y + 190)
subject to
x � 0 units to be transported to deport A from factor P is non-negative
y � 0 units to be transported to depart B froma factor P is non-negative
x+ y 8 (8� x� y) � 0 units to be transported to depart C from factor at P
x 5 (5� x) � 0 units to be transported to deport A from factory at Q
y 5 (5� y) � 0 units to be transported to deport B from factory at Q
x+ y � 4 x+ y � 4 � 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
/
The shaded area represents the feasible region and A(0, 4), B(0, 5), C(3, 5), D(5, 3),
E(5, 0) and F(4, 0) are corner-point feasible solutions (CPF solutions). The evalua-
tion of Z at corner points are shown as below. It could be noticed that point (0, 5)
provides the minimum value 1550 of Z.
Corner point (x, y) Z = 10(x� 7y + 190)
A: (0,4) 1620
B: (0,5) 1550 (minimum)
C: (3,5) 1580
D: (5,3) 1740
E: (5,0) 1950
F: (4,0) 1940
Therefore, delivering 0, 5 and 3 units from the factory at P and 5, 0 and 1 units
from the factory at Q to the depots at A, B and C, respectively can minimize the
cost of transportation. The minimum cost of transportation is 1550.
Q15. Denote the decision variables x and y as the weight of Ingredient I and Ingredient
II, respectively, in the new type of food.
As the weights are non-negative, we have the constraints x � 0 and y � 0.
15
/TheshadedarearepresentsthefeasibleregionandA(0,4),B(0,5),C(3,5),D(5,3),
E(5,0)andF(4,0)arecorner-pointfeasiblesolutions(CPFsolutions).Theevalua-
tionofZatcornerpointsareshownasbelow.Itcouldbenoticedthatpoint(0,5)
providestheminimumvalue1550ofZ.
Cornerpoint(x,y)Z=10(x−7y+190)
A:(0,4) 1620
B:(0,5)1550(minimum)
C:(3,5) 1580
D:(5,3) 1740
E:(5,0) 1950
F:(4,0) 1940
Therefore,delivering0,5and3unitsfromthefactoryatPand5,0and1units
fromthefactoryatQtothedepotsatA,BandC,respectivelycanminimizethe
costoftransportation.Theminimumcostoftransportationis1550.
Q15.DenotethedecisionvariablesxandyastheweightofIngredientIandIngredient
II,respectively,inthenewtypeoffood.
Astheweightsarenon-negative,wehavetheconstraintsx≥0andy≥0.
15
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5 � y) units need to be transported to deport B
from the factory at Q.
(5 � x) units and (5 � y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can
produce 6 units. Thus, the remaining 6� (5� x)� (5� y)) = x+ y � 4 units need
to be transported to deport C from the factory at Q.
As the number of units to be transported must be non-negative, we have another
three constraints: 5� x � 0, 5� y � 0 and x+ y � 4 � 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5� x) + 120(5� y) + 150(8� x� y) + 100(x+ y � 4)
= 10(x� 7y + 190).
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y
Z = 10(x� 7y + 190)
subject to
x � 0 units to be transported to deport A from factor P is non-negative
y � 0 units to be transported to depart B froma factor P is non-negative
x+ y 8 (8� x� y) � 0 units to be transported to depart C from factor at P
x 5 (5� x) � 0 units to be transported to deport A from factory at Q
y 5 (5� y) � 0 units to be transported to deport B from factory at Q
x+ y � 4 x+ y � 4 � 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from di↵erent factories to di↵erent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum?
Q16. Explain how “Recursive least-squares” works.
4
/
The shaded area represents the feasible region and A(0, 4), B(0, 5), C(3, 5), D(5, 3),
E(5, 0) and F(4, 0) are corner-point feasible solutions (CPF solutions). The evalua-
tion of Z at corner points are shown as below. It could be noticed that point (0, 5)
provides the minimum value 1550 of Z.
Corner point (x, y) Z = 10(x� 7y + 190)
A: (0,4) 1620
B: (0,5) 1550 (minimum)
C: (3,5) 1580
D: (5,3) 1740
E: (5,0) 1950
F: (4,0) 1940
Therefore, delivering 0, 5 and 3 units from the factory at P and 5, 0 and 1 units
from the factory at Q to the depots at A, B and C, respectively can minimize the
cost of transportation. The minimum cost of transportation is 1550.
Q15. Denote the decision variables x and y as the weight of Ingredient I and Ingredient
II, respectively, in the new type of food.
As the weights are non-negative, we have the constraints x � 0 and y � 0.
15
/TheshadedarearepresentsthefeasibleregionandA(0,4),B(0,5),C(3,5),D(5,3),E(5,0)andF(4,0)arecorner-pointfeasiblesolutions(CPFsolutions).Theevalua-tionofZatcornerpointsareshownasbelow.Itcouldbenoticedthatpoint(0,5)providestheminimumvalue1550ofZ.Cornerpoint(x,y)Z=10(x−7y+190)A:(0,4) 1620B:(0,5)1550(minimum)C:(3,5) 1580D:(5,3) 1740E:(5,0) 1950F:(4,0) 1940Therefore,delivering0,5and3unitsfromthefactoryatPand5,0and1unitsfromthefactoryatQtothedepotsatA,BandC,respectivelycanminimizethecostoftransportation.Theminimumcostoftransportationis1550.Q15.DenotethedecisionvariablesxandyastheweightofIngredientIandIngredient
II,respectively,inthenewtypeoffood.
Astheweightsarenon-negative,wehavetheconstraintsx≥0andy≥0.
15
Considering the requirements for vitamins contained in the new type of food, we
have:
2x+ y � 8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y � 10 (The total weight of vitimin C from both integrident is 10 or above)
The cost of this new type of food is the total cost of the two ingredients where the
price £50/kg is for Ingredient I and £70/kg is for Ingredient II. The cost of the new
type of food is thus defined as Z = 50x+ 70y.
min
x,y
Z = 50x+ 70y
subject to
2x+ y � 8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y � 10 (The total weight of vitimin C from both integrident is 10 or above)
x � 0 (The weight of Integredient I is non-negative)
y � 0 (The weight of Integredient II is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
16
Consideringtherequirementsforvitaminscontainedinthenewtypeoffood,we
have:
2x+y≥8(ThetotalweightofvitiminAfrombothintegridentis8orabove)
x+2y≥10(ThetotalweightofvitiminCfrombothintegridentis10orabove)
Thecostofthisnewtypeoffoodisthetotalcostofthetwoingredientswherethe
price£50/kgisforIngredientIand£70/kgisforIngredientII.Thecostofthenew
typeoffoodisthusdefinedasZ=50x+70y.
min
x,y
Z=50x+70y
subjectto
2x+y≥8(ThetotalweightofvitiminAfrombothintegridentis8orabove)
x+2y≥10(ThetotalweightofvitiminCfrombothintegridentis10orabove)
x≥0(TheweightofIntegredientIisnon-negative)
y≥0(TheweightofIntegredientIIisnon-negative)
Asthecostandallconstraintsarelinear,thisproblemisalinearprogramming
problemwhichcanbesolvedusinglinearprogrammingtechnique.Drawthegraph
oftheconstraintsasfollows:
16
Considering the requirements for vitamins contained in the new type of food, we
have:
2x+ y � 8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y � 10 (The total weight of vitimin C from both integrident is 8 or above)
The cost of this new type of food is the total cost of the two ingredients where the
price £50/kg is for Ingredient I and £70/kg is for Ingredient II. The cost of the new
type of food is thus defined as Z = 50x+ 70y.
min
x,y
Z = 50x+ 70y
subject to
2x+ y � 8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y � 10 (The total weight of vitimin C from both integrident is 8 or above)
x � 0 (The weight of Integredient I is non-negative)
y � 0 (The weight of Integredient II is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
16
Consideringtherequirementsforvitaminscontainedinthenewtypeoffood,wehave:2x+y≥8(ThetotalweightofvitiminAfrombothintegridentis8orabove)x+2y≥10(ThetotalweightofvitiminCfrombothintegridentis8orabove)Thecostofthisnewtypeoffoodisthetotalcostofthetwoingredientswheretheprice£50/kgisforIngredientIand£70/kgisforIngredientII.ThecostofthenewtypeoffoodisthusdefinedasZ=50x+70y.minx,yZ=50x+70ysubjectto2x+y≥8(ThetotalweightofvitiminAfrombothintegridentis8orabove)x+2y≥10(ThetotalweightofvitiminCfrombothintegridentis8orabove)x≥0(TheweightofIntegredientIisnon-negative)y≥0(TheweightofIntegredientIIisnon-negative)Asthecostandallconstraintsarelinear,thisproblemisalinearprogramming
problemwhichcanbesolvedusinglinearprogrammingtechnique.Drawthegraph
oftheconstraintsasfollows:
16
As shown in the figure, the feasible region (faded area in deep color) is unbounded.
The corner-point feasible solutions (CPF solutions) are A(0, 8), B(2, 4) and C(10, 0)
and the evaluation of Z is shown in the table below.
Corner point (x, y) Z = 50x+ 70y
A: (0,8) 560
B: (2,4) 380 (minimum)
C: (10,0) 500
The minimum value 380 is obtained at B(2, 4). However, the feasible region is
unbounded, further steps need to be taken to make sure that there is no other point
in the open half plane resulting that the Z value is smaller than 380. Hence, we
draw the region of inequality which satisfied:
50x+ 70y < 380,
which is equivalent to
5x+ 7y < 38.
The common area which satisfied both the inequality requirement (5x + 7y < 38)
and the constraints (x � 0 and y � 0) are marked in light color. We could observe
that there is no overlap between the feasible region and the graph of inequality.
Therefore we could say, the point (2,4) gives the minimum value Z of 380. So the
best formula of this new type of food is to have 2kg of ingredient I and 4kg of
ingredient II. The minimum cost would be £380.
Q16. A least-squares problem is described as min
x
f(x) =|| Ax�B ||22.
Its solution is given as x =
�
ATA
��1
ATB.
Denote the i-th row of A as aTi and the i-th row of Bi as bi.
The solution x can be represented in row form as follows:
x =
⇣ mX
i=1
aia
T
i
⌘�1 mX
i=1
biai.
For example,
A =
2
666
4
a11 a12 · · · a1n
a21 a22 · · · a2n
.
.
.
.
.
.
. . .
.
.
.
am1 am2 · · · amn
3
777
5
=
2
666
4
aT1
aT2
.
.
.
aTm
3
777
5
,
B =
2
666
4
b1
b2
.
.
.
bm
3
777
5
.
17
Considering the requirements for vitamins contained in the new type of food, we
have:
2x+ y � 8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y � 10 (The total weight of vitimin C from both integrident is 10 or above)
The cost of this new type of food is the total cost of the two ingredients where the
price £50/kg is for Ingredient I and £70/kg is for Ingredient II. The cost of the new
type of food is thus defined as Z = 50x+ 70y.
min
x,y
Z = 50x+ 70y
subject to
2x+ y � 8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y � 10 (The total weight of vitimin C from both integrident is 10 or above)
x � 0 (The weight of Integredient I is non-negative)
y � 0 (The weight of Integredient II is non-negative)
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
16
Consideringtherequirementsforvitaminscontainedinthenewtypeoffood,we
have:
2x+y≥8(ThetotalweightofvitiminAfrombothintegridentis8orabove)
x+2y≥10(ThetotalweightofvitiminCfrombothintegridentis10orabove)
Thecostofthisnewtypeoffoodisthetotalcostofthetwoingredientswherethe
price£50/kgisforIngredientIand£70/kgisforIngredientII.Thecostofthenew
typeoffoodisthusdefinedasZ=50x+70y.
min
x,y
Z=50x+70y
subjectto
2x+y≥8(ThetotalweightofvitiminAfrombothintegridentis8orabove)
x+2y≥10(ThetotalweightofvitiminCfrombothintegridentis10orabove)
x≥0(TheweightofIntegredientIisnon-negative)
y≥0(TheweightofIntegredientIIisnon-negative)
Asthecostandallconstraintsarelinear,thisproblemisalinearprogramming
problemwhichcanbesolvedusinglinearprogrammingtechnique.Drawthegraph
oftheconstraintsasfollows:
16
A least-squares problem is described as
min
x
f(x) =|| Ax�B ||22 .
Its solution is given as
x =
�
ATA
��1
ATB.
Denote the i-th row of A as aTi and the i-th row of Bi as bi.
The solution x can be represented in row form as follows:
x =
⇣ mX
i=1
aia
T
i
⌘�1 mX
i=1
biai.
For example,
A =
2
666
4
a11 a12 · · · a1n
a21 a22 · · · a2n
...
...
. . .
...
am1 am2 · · · amn
3
777
5
=
2
666
4
aT1
aT2
...
aTm
3
777
5
, B =
2
666
4
b1
b2
...
bm
3
777
5
.
ATA
mX
i=1
aia
T
i
2
666
4
a11 a12 · · · a1n
a21 a22 · · · a2n
…
…
. . .
…
am1 am2 · · · amn
3
777
5
,
a1
aT1
a1a
T
1 + a2a
T
2 + · · ·+ ama
T
m
A =
2
666
4
a11 a12 · · · a1n
a21 a22 · · · a2n
…
…
. . .
…
am1 am2 · · · amn
3
777
5
=
2
666
4
aT1
aT2
…
aTm
3
777
5
, B =
2
666
4
b1
b2
…
bm
3
777
5
.
h
am+1,1 am+1,2
… am+1,n
i
= aTm+1
bm+1
The new solution can be written as
xnew =
⇣ mX
i=1
aia
T
i + am+1a
T
m+1
⌘�1� mX
i=1
biai + bm+1am+1
�
.
At the time that you have m samples, recall that the solution is:
x = P(m)�1q(m)
where
P(m) = ATA =
mX
i=1
aia
T
i
and
q(m) = ATB =
mX
i=1
biai.
With the new sample aTm+1 and and bm+1, the solution is:
xnew = P(m+ 1)
�1q(m+ 1)
where
P(m+ 1) =
mX
i=1
aia
T
i + am+1a
T
m+1 = P(m) + am+1a
T
m+1
and
q(m+ 1) =
mX
i=1
biai + bm+1am+1.
Rank one update formula:
�
P+ aaT
��1
= P�1 �
1
1 + aTP�1a
�
P�1a
��
P�1a
�T
Remark: Rank one update formula is valid when P = PT , and P and P+ aaT
are both invertible.
Rank one update formula:
�
P+ aaT
��1
= P�1 �
1
1 + aTP�1a
�
P�1a
��
P�1a
�T
We have:
P(m+ 1) =
mX
i=1
aia
T
i + am+1a
T
m+1 = P(m) + am+1a
T
m+1
P(m+ 1)�1 ⌘
⇣
P(m) + am+1a
T
m+1
⌘�1
.
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from di↵erent factories to di↵erent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum?
Q16. Explain how “Recursive least-squares” works.
Q17. Consider a least-squares problem as Ax = B, where
A =
2
4
5 6
5 8
4 5
3
5 ,B =
2
4
5
2
1
3
5
Given that there are five new samples coming in the the following order,
aT1 =
⇥
3 4
⇤
, b1 = 7;
aT2 =
⇥
10 10
⇤
, b2 = 5;
aT3 =
⇥
3 8
⇤
, b3 = 8;
aT4 =
⇥
8 8
⇤
, b4 = 2;
aT5 =
⇥
7 5
⇤
, b5 = 3.
calculate the initial solution, denoted as x0, for Ax = B and update the solution
using the recursive least-squares technique with the given new samples. Show the
working and the solution after adding the samples one by one according to the order
shown in the above.
4
/docProps/thumbnail.jpeg