Biologically Inspired Methods
The dataset is not linearly separable, as the two classes cannot be separated by a straight line.
$h_1(\mathbf{x})$: $x_1 = -0.5$
$h_5(\mathbf{x})$: $x_2 = -0.5$
Discussion: Which classifiers cannot be used? Can we keep them in the pool? How should these classifiers be designed? Should they be designed beforehand?
\begin{tabular}{|c||c|c|c|c||c|}
Classifier & (1,0), +1 & ($-1$,0), +1 & (0,1), $-1$ & (0,$-1$), $-1$ & Training Error \\
\hline \hline
$h_1(\mathbf{x})$ & +1 & \textcolor{green}{$-1$} & \textcolor{green}{+1} & \textcolor{green}{+1} & 0.75 \\
$h_2(\mathbf{x})$ & \textcolor{green}{$-1$} & +1 & $-1$ & $-1$ & 0.25 \\
$h_3(\mathbf{x})$ & +1 & \textcolor{green}{$-1$} & $-1$ & $-1$ & 0.25 \\
$h_4(\mathbf{x})$ & \textcolor{green}{$-1$} & +1 & \textcolor{green}{+1} & \textcolor{green}{+1} & 0.75 \\
$h_5(\mathbf{x})$ & +1 & +1 & \textcolor{green}{+1} & $-1$ & 0.25 \\
$h_6(\mathbf{x})$ & \textcolor{green}{$-1$} & \textcolor{green}{$-1$} & $-1$ & \textcolor{green}{+1} & 0.75 \\
$h_7(\mathbf{x})$ & \textcolor{green}{$-1$} & \textcolor{green}{$-1$} & \textcolor{green}{+1} & $-1$ & 0.75 \\
$h_8(\mathbf{x})$ & +1 & +1 & $-1$ & \textcolor{green}{+1} & 0.25 \\
\end{tabular}
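As a quick check of the table, the following minimal Python sketch (not part of the original notes) recomputes the unweighted training error of each stump from its outputs on the four training samples:

\begin{verbatim}
# Recompute the (unweighted) training errors in the table above.
# The classifier outputs are copied from the table.
y = [+1, +1, -1, -1]          # labels of (1,0), (-1,0), (0,1), (0,-1)
H = {
    "h1": [+1, -1, +1, +1], "h2": [-1, +1, -1, -1],
    "h3": [+1, -1, -1, -1], "h4": [-1, +1, +1, +1],
    "h5": [+1, +1, +1, -1], "h6": [-1, -1, -1, +1],
    "h7": [-1, -1, +1, -1], "h8": [+1, +1, -1, +1],
}

for name, out in H.items():
    err = sum(o != yi for o, yi in zip(out, y)) / len(y)
    print(name, err)          # 0.75 for h1, h4, h6, h7; 0.25 for h2, h3, h5, h8
\end{verbatim}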
How many samples, n = ?
W1(i) = ? for i = 1, 2, 3, 4
kmax = ?
\begin{itemize}
\item \textbf{Initialise parameters:} Choose $k_{\max} = ~~~~~$; $n = ~~~~~$; $W_1(i) = ~~~~~$, $i$ = 1, 2, 3, 4. \\
\item \textbf{Compute the training error:} Classifier $h_i(\mathbf{x})$: $E_1 = ~~~~~$; $E_2 = ~~~~~$; $E_3 = ~~~~~$; $E_4 = ~~~~~$; $E_5 = ~~~~~$; $E_6 = ~~~~~$; $E_7 = ~~~~~$; $E_8 = ~~~~~$ \\
\item \textbf{Pick the best classifier:} $\hat{h}_1(\mathbf{x}) = ~~~~~~~~~~~~~~~~~~~~$ \\
\item \textbf{Determine $\varepsilon_1$:} $\varepsilon_1 = $
\item \textbf{Determine $\alpha_1$:} $\alpha_1 = \frac{1}{2} \ln \left( \frac{1 - \varepsilon_1}{\varepsilon_1} \right) = 0.5493$.
\end{itemize}
\begin{itemize}
\item \textbf{Initialise parameters:} Choose $k_{\max} = 8$; $n = 4$; $W_1(i) = 1/n = 0.25$, $i$ = 1, 2, 3, 4. \\
\item \textbf{Compute the training error:} Classifier $h_i(\mathbf{x})$: $E_i = 0.25$, $i$ = 2, 3, 5, 8. Classifier $h_i(\mathbf{x})$: $E_i = 0.75$, $i$ = 1, 4, 6, 7. \\
\item \textbf{Pick the best classifier:} As classifiers $h_2(\mathbf{x})$, $h_3(\mathbf{x})$, $h_5(\mathbf{x})$ and $h_8(\mathbf{x})$ offer the same lowest training error, we randomly pick one in this round, say, $\hat{h}_1(\mathbf{x}) = h_2(\mathbf{x})$. \\
\item \textbf{Determine $\varepsilon_1$:} Choose $\varepsilon_1 = E_2 = 0.25$.
\item \textbf{Determine $\alpha_1$:} $\alpha_1 = \frac{1}{2} \ln \left( \frac{1 - \varepsilon_1}{\varepsilon_1} \right) = \frac{1}{2} \ln \left( \frac{1 - 0.25}{0.25} \right) = 0.5493$.
\item \textbf{Compute $W_2(i)$ for next round:} \\
\begin{tabular}{|c|c|c|c|c|}
$i$ & Class & $W_k(i)$ & $W_k(i) e^{-\alpha_k y_i \hat{h}_k(\mathbf{x}_i)}$ & $W_{k+1}(i) = W_k(i) e^{-\alpha_k y_i \hat{h}_k(\mathbf{x}_i)}/Z_k$ \\
1 & (1,0), +1 & 0.25 & $0.25 e^{0.5493} = 0.4330$ & 0.4330/0.8659 = 0.5001 \\
2 & ($-1$,0), +1 & 0.25 & $0.25 e^{-0.5493} = 0.1443$ & 0.1443/0.8659 = 0.1666 \\
3 & (0,1), $-1$ & 0.25 & $0.25 e^{-0.5493} = 0.1443$ & 0.1443/0.8659 = 0.1666 \\
4 & (0,$-1$), $-1$ & 0.25 & $0.25 e^{-0.5493} = 0.1443$ & 0.1443/0.8659 = 0.1666 \\
\end{tabular}
\item $Z_1 = \displaystyle \sum_{i=1}^{4} W_1(i) e^{-\alpha_1 y_i \hat{h}_1(\mathbf{x}_i)} = 0.4330 + 0.1443 \times 3 = 0.8659$
\item \textbf{AdaBoost Classifier in this round:} $h(\mathbf{x}) = \alpha_1 h_2(\mathbf{x}) = 0.5493 h_2(\mathbf{x})$ gives $\frac{1}{4} = 0.25$ (unweighted) training error, i.e., 1 out of 4 is wrongly classified.
\item Hard classifier is used for classification: $\text{sgn}(0.5493 h_2(\mathbf{x}))$. \\
\end{itemize}
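The round-1 quantities above can be reproduced with a short Python sketch (an illustration only; working to full precision gives $Z_1 = 0.8660$ and $W_2(1) = 0.5$, which the slides round to 0.8659 and 0.5001):

\begin{verbatim}
import math

y  = [+1, +1, -1, -1]            # labels of (1,0), (-1,0), (0,1), (0,-1)
h2 = [-1, +1, -1, -1]            # outputs of the picked weak classifier h_2
W1 = [0.25, 0.25, 0.25, 0.25]    # initial weights W_1(i)

# Weighted error of h_2 and the corresponding alpha_1.
eps1   = sum(w for w, yi, hi in zip(W1, y, h2) if yi != hi)   # 0.25
alpha1 = 0.5 * math.log((1 - eps1) / eps1)                    # 0.5493

# Unnormalised weight update, normaliser Z_1, and the new weights W_2(i).
unnorm = [w * math.exp(-alpha1 * yi * hi) for w, yi, hi in zip(W1, y, h2)]
Z1     = sum(unnorm)                                          # 0.8660
W2     = [u / Z1 for u in unnorm]                             # [0.5, 0.167, 0.167, 0.167]

print(round(alpha1, 4), round(Z1, 4), [round(w, 4) for w in W2])
\end{verbatim}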
We are at round 1: k = 1
Exponent $-\alpha_1 y_i \hat{h}_1(\mathbf{x}_i)$ for the misclassified sample $(1,0)$: $-0.5493 \times (+1) \times (-1) = 0.5493$
For a correctly classified sample, e.g.\ $(0,1)$: $-0.5493 \times (-1) \times (-1) = -0.5493$
\begin{itemize}
\item $k = 1$
\item All $W_k(i) = 0.25$
\item $\alpha_k = 0.5493$
\item $\hat{h}_k(\mathbf{x}) = h_2(\mathbf{x})$
\end{itemize}
(1, 0) is wrongly classified
Does the AdaBoost classifier improve the classification accuracy at $k = 1$? Shall we stop the iterations?
Decision boundary of the weighted classifier: $0.5493\, h_2(\mathbf{x}) = 0.5493 \times (x_1 + 0.5) = 0$
W2(1) = 0.5001
W2(2) = 0.1666
W2(3) = 0.1666
W2(4) = 0.1666
Round 2: k = 2
\begin{itemize}
\item \textbf{Pick the best classifier:} With the updated weights, $h_3(\mathbf{x})$, $h_5(\mathbf{x})$ and $h_8(\mathbf{x})$ offer the same lowest weighted error, so we randomly pick one, say, $\hat{h}_2(\mathbf{x}) = h_3(\mathbf{x})$. \\
\item \textbf{Determine $\varepsilon_2$:} Choose $\varepsilon_2 = E_3 = 0.1666$.
\item \textbf{Determine $\alpha_2$:} $\alpha_2 = \frac{1}{2} \ln \left( \frac{1 - \varepsilon_2}{\varepsilon_2} \right) = \frac{1}{2} \ln \left( \frac{1 - 0.1666}{0.1666} \right) = 0.8050$.
\item \textbf{Compute $W_3(i)$ for next round} (see the table below). \\
\item $Z_2 = \displaystyle \sum_{i=1}^{4} W_2(i) e^{-\alpha_2 y_i \hat{h}_2(\mathbf{x}_i)} = 0.2236 + 0.3726 + 0.0745 \times 2 = 0.7452$
\item \textbf{AdaBoost Classifier in this round:} $h(\mathbf{x}) = \alpha_1 h_2(\mathbf{x}) + \alpha_2 h_3(\mathbf{x}) = 0.5493 h_2(\mathbf{x}) + 0.8050 h_3(\mathbf{x})$ gives $\frac{1}{4} = 0.25$ (unweighted) training error, i.e., 1 out of 4 is wrongly classified.
\item Hard classifier is used for classification: $\text{sgn}(0.5493 h_2(\mathbf{x}) + 0.8050 h_3(\mathbf{x}))$. \\
\end{itemize}
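To illustrate how the best classifier is picked in round 2, a small sketch (illustrative, reusing the stump outputs from the classifier table) recomputes each stump's weighted error under $W_2(i)$:

\begin{verbatim}
# Weighted errors of the eight stumps under the round-2 weights W_2(i).
y  = [+1, +1, -1, -1]
H  = {                           # outputs copied from the classifier table
    "h1": [+1, -1, +1, +1], "h2": [-1, +1, -1, -1],
    "h3": [+1, -1, -1, -1], "h4": [-1, +1, +1, +1],
    "h5": [+1, +1, +1, -1], "h6": [-1, -1, -1, +1],
    "h7": [-1, -1, +1, -1], "h8": [+1, +1, -1, +1],
}
W2 = [0.5001, 0.1666, 0.1666, 0.1666]

errors = {k: sum(w for w, yi, hi in zip(W2, y, out) if yi != hi)
          for k, out in H.items()}
print(errors)                    # h2 now has error 0.5001; h3, h5, h8 tie at 0.1666
print(min(errors, key=errors.get))
\end{verbatim}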
(–1, 0) is wrongly classified
Is the AdaBoost classifier good enough? Shall we stop the iterations?
New weighted weak classifier: $0.8050\, h_3(\mathbf{x})$
\begin{itemize}
\item \textbf{Compute $W_3(i)$ for next round:} \\
\begin{tabular}{|c|c|c|c|c|}
$i$ & Class & $W_k(i)$ & $W_k(i) e^{-\alpha_k y_i \hat{h}_k(\mathbf{x}_i)}$ & $W_{k+1}(i) = W_k(i) e^{-\alpha_k y_i \hat{h}_k(\mathbf{x}_i)}/Z_k$ \\
1 & (1,0), +1 & 0.5001 & $0.5001 e^{-0.8050} = 0.2236$ & 0.2236/0.7452 = 0.3000 \\
2 & ($-1$,0), +1 & 0.1666 & $0.1666 e^{0.8050} = 0.3726$ & 0.3726/0.7452 = 0.5000 \\
3 & (0,1), $-1$ & 0.1666 & $0.1666 e^{-0.8050} = 0.0745$ & 0.0745/0.7452 = 0.1000 \\
4 & (0,$-1$), $-1$ & 0.1666 & $0.1666 e^{-0.8050} = 0.0745$ & 0.0745/0.7452 = 0.1000 \\
\end{tabular}
\item $Z_2 = \displaystyle \sum_{i=1}^{4} W_2(i) e^{-\alpha_2 y_i \hat{h}_2(\mathbf{x}_i)} = 0.2236 + 0.3726 + 0.0745 \times 2 = 0.7452$
\item \textbf{AdaBoost Classifier in this round:} $h(\mathbf{x}) = \alpha_1 h_2(\mathbf{x}) + \alpha_2 h_3(\mathbf{x}) = 0.5493 h_2(\mathbf{x}) + 0.8050 h_3(\mathbf{x})$ gives $\frac{1}{4} = 0.25$ (unweighted) training error, i.e., 1 out of 4 is wrongly classified.
\item Hard classifier is used for classification: $\text{sgn}(0.5493 h_2(\mathbf{x}) + 0.8050 h_3(\mathbf{x}))$.
\end{itemize}
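A quick illustrative check that the round-2 ensemble $\text{sgn}(0.5493\,h_2(\mathbf{x}) + 0.8050\,h_3(\mathbf{x}))$ misclassifies only the sample $(-1, 0)$:

\begin{verbatim}
y  = [+1, +1, -1, -1]
h2 = [-1, +1, -1, -1]
h3 = [+1, -1, -1, -1]

score = [0.5493 * a + 0.8050 * b for a, b in zip(h2, h3)]
pred  = [1 if s > 0 else -1 for s in score]
print(pred)                                               # [1, -1, -1, -1]: only (-1,0) is wrong
print(sum(p != yi for p, yi in zip(pred, y)) / len(y))    # 0.25
\end{verbatim}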
Round 3: k = 3
Updated error rate
W3(1) = 0.3000
W3(4) = 0.1000
W3(3) = 0.1000
W3(2) = 0.5000
\begin{itemize}
\item \textbf{Pick the best classifier and determine $\varepsilon_3$:} With the updated weights, $h_5(\mathbf{x})$ and $h_8(\mathbf{x})$ offer the lowest weighted error; we pick $\hat{h}_3(\mathbf{x}) = h_5(\mathbf{x})$ and choose $\varepsilon_3 = E_5 = 0.1000$.
\item \textbf{Determine $\alpha_3$:} $\alpha_3 = \frac{1}{2} \ln \left( \frac{1 - \varepsilon_3}{\varepsilon_3} \right) = \frac{1}{2} \ln \left( \frac{1 - 0.1000}{0.1000} \right) = 1.0986$.
\item \textbf{Compute $W_4(i)$ for next round:} \\
\begin{tabular}{|c|c|c|c|c|}
$i$ & Class & $W_k(i)$ & $W_k(i) e^{-\alpha_k y_i \hat{h}_k(\mathbf{x}_i)}$ & $W_{k+1}(i) = W_k(i) e^{-\alpha_k y_i \hat{h}_k(\mathbf{x}_i)}/Z_k$ \\
1 & (1,0), +1 & 0.3000 & $0.3000 e^{-1.0986} = 0.1000$ & 0.1000/0.6333 = 0.1579 \\
2 & ($-1$,0), +1 & 0.5000 & $0.5000 e^{-1.0986} = 0.1667$ & 0.1667/0.6333 = 0.2632 \\
3 & (0,1), $-1$ & 0.1000 & $0.1000 e^{1.0986} = 0.3000$ & 0.3000/0.6333 = 0.4737 \\
4 & (0,$-1$), $-1$ & 0.1000 & $0.1000 e^{-1.0986} = 0.0333$ & 0.0333/0.6333 = 0.0526 \\
\end{tabular}
\item $Z_3 = \displaystyle \sum_{i=1}^{4} W_3(i) e^{-\alpha_3 y_i \hat{h}_3(\mathbf{x}_i)} = 0.1000 + 0.1667 + 0.3000 + 0.0333 = 0.6333$
\item \textbf{AdaBoost Classifier in this round:} $h(\mathbf{x}) = \alpha_1 h_2(\mathbf{x}) + \alpha_2 h_3(\mathbf{x}) + \alpha_3 h_5(\mathbf{x}) = 0.5493 h_2(\mathbf{x}) + 0.8050 h_3(\mathbf{x}) + 1.0986 h_5(\mathbf{x})$ gives 0 training error.
\end{itemize}
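The final classifier can be checked in the same way; the following illustrative sketch combines the three picked stumps with their $\alpha_k$ values and confirms zero training error:

\begin{verbatim}
y      = [+1, +1, -1, -1]
picked = [[-1, +1, -1, -1],      # h_2 outputs, round 1
          [+1, -1, -1, -1],      # h_3 outputs, round 2
          [+1, +1, +1, -1]]      # h_5 outputs, round 3
alphas = [0.5493, 0.8050, 1.0986]

pred = []
for i in range(4):
    s = sum(a * h[i] for a, h in zip(alphas, picked))
    pred.append(1 if s > 0 else -1)

print(pred)                                    # [1, 1, -1, -1]
print(sum(p != yi for p, yi in zip(pred, y)))  # 0 training errors
\end{verbatim}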
How good is this AdaBoost classifier?
The AdaBoost iteration can be summarised as follows:
\begin{itemize}
\item \textbf{Initialisation}
\item Design/determine a classifier $\hat{h}_k(\mathbf{x})$ with the smallest weighted error, where the errors $E_i$ are computed using the weights $W_k(i)$
\item Determine $\varepsilon_k$
\item Determine $\alpha_k$
\item Update $W_k(i)$
\item Is the current AdaBoost classifier satisfactory? If so, stop and output the AdaBoost classifier; otherwise, continue with the next round.
\end{itemize}
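The loop can be sketched in Python as follows (an illustration, not the original implementation): boosting over the fixed pool of eight stump outputs with $k_{\max} = 3$, and breaking ties by dictionary order, reproduces the picks $h_2$, $h_3$, $h_5$ and (up to rounding) the $\alpha_k$ values above.

\begin{verbatim}
import math

y  = [+1, +1, -1, -1]
H  = {                            # stump outputs on the four training samples
    "h1": [+1, -1, +1, +1], "h2": [-1, +1, -1, -1],
    "h3": [+1, -1, -1, -1], "h4": [-1, +1, +1, +1],
    "h5": [+1, +1, +1, -1], "h6": [-1, -1, -1, +1],
    "h7": [-1, -1, +1, -1], "h8": [+1, +1, -1, +1],
}

k_max = 3
n = len(y)
W = [1.0 / n] * n                 # W_1(i)
ensemble = []                     # list of (alpha_k, picked stump name)

for k in range(k_max):
    # Pick the stump with the smallest weighted error.
    err = {name: sum(w for w, yi, hi in zip(W, y, out) if yi != hi)
           for name, out in H.items()}
    best = min(err, key=err.get)
    eps = err[best]
    alpha = 0.5 * math.log((1 - eps) / eps)
    ensemble.append((alpha, best))

    # Update and renormalise the sample weights.
    W = [w * math.exp(-alpha * yi * hi)
         for w, yi, hi in zip(W, y, H[best])]
    Z = sum(W)
    W = [w / Z for w in W]

print(ensemble)   # roughly [(0.5493, 'h2'), (0.8047, 'h3'), (1.0986, 'h5')]
\end{verbatim}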
Two of the four training samples are correctly classified and two are wrongly classified.
2 out of 4 are wrongly classified: training error $= 2/4 = 0.5$
Is the bagging classifier working well? Why?
How can the performance of the bagging classifier be improved?
Sample 1: x1 = 1, x2 = 0, y = +1
Tree 1: x1 = 1 > 0.5
Tree 2: x2 = 0 > -0.5; x2 = 0 ≤ 0.5
Sample 2: x1 = -1, x2 = 0, y = +1
Tree 1: x1 = -1 < 0.5; x2 = 0 < 0.5
Tree 2: x2 = 0 > -0.5; x2 = 0 ≤ 0.5
Sample 3: x1 = 0, x2 = 1, y = -1
Tree 1: x1 = 0 ≤ 0.5; x2 = 1 > 0.5
Tree 2: x2 = 1 > -0.5; x2 = 1 > 0.5
Sample 4: x1 = 0, x2 = -1, y = -1
Tree 1: x1 = 0 ≤ 0.5; x2 = -1 ≤ 0.5
Tree 2: x2 = -1 ≤ -0.5
\noindent Pass each training sample down each tree, and count how many from each class arrive at each leaf node.
\noindent Samples are:\\ $\{(\mathbf{x} = (1,0), y = +1),\\ (\mathbf{x} = (-1,0), y = +1), \\(\mathbf{x} = (0,1), y = -1), \\(\mathbf{x} = (0,-1), y = -1)\}$\\
Sample $(1, 0)$ arrives at the third leaf node in tree 1.
Sample $(-1, 0)$ arrives at the first leaf node in tree 1.
Sample $(0, 1)$ arrives at the second leaf node in tree 1.
Sample $(0, -1)$ arrives at the first leaf node in tree 1.\\
Sample $(1, 0)$ arrives at the second leaf node in tree 2.
Sample $(-1, 0)$ arrives at the second leaf node in tree 2.
Sample $(0, 1)$ arrives at the third leaf node in tree 2.
Sample $(0, -1)$ arrives at the first leaf node in tree 2.
\noindent The counts of the number of training samples from each class $(-1/+1)$ ~arriving at each node are summarised in the figure.
\noindent In the figure, each node is annotated with a pair of counts: the number on the left is the number of samples from class $-1$ and the number on the right is the number of samples from class $+1$ arriving at that node. For example, a ``2'' on the left means two samples belong to class $-1$, and a ``2'' on the right means two samples belong to class $+1$. Each tree has three leaf nodes, referred to as the 1st, 2nd and 3rd leaf nodes.
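The routing and counting can be sketched in Python (illustrative; the two trees are hard-coded from the split thresholds 0.5 and $-0.5$ used in the traces above):

\begin{verbatim}
from collections import Counter

samples = [((1, 0), +1), ((-1, 0), +1), ((0, 1), -1), ((0, -1), -1)]

def tree1_leaf(x):
    # Tree 1: split on x1 > 0.5, then on x2 > 0.5.
    if x[0] > 0.5:
        return 3                  # 3rd leaf node
    return 2 if x[1] > 0.5 else 1

def tree2_leaf(x):
    # Tree 2: split on x2 > -0.5, then on x2 > 0.5.
    if x[1] <= -0.5:
        return 1                  # 1st leaf node
    return 3 if x[1] > 0.5 else 2

# Count, for each tree and leaf node, how many samples of each class arrive.
counts = {"tree1": Counter(), "tree2": Counter()}
for x, label in samples:
    counts["tree1"][(tree1_leaf(x), label)] += 1
    counts["tree2"][(tree2_leaf(x), label)] += 1

print(counts["tree1"])   # leaf 1: one -1 and one +1; leaf 2: one -1; leaf 3: one +1
print(counts["tree2"])   # leaf 1: one -1; leaf 2: two +1; leaf 3: one -1
\end{verbatim}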
\noindent For the first decision tree $P(class=-1 | \mathbf{x})=0.5$ at the first leaf node,\\ $P(class=-1 | \mathbf{x})=1$ at the second leaf node, $P(class=-1 | \mathbf{x})=0$ at the third leaf node.
\noindent For the second decision tree $P(class=-1 | \mathbf{x})=1$ at the first leaf node, $P(class=-1 | \mathbf{x})=0$ at the second leaf node, $P(class=-1 | \mathbf{x})=1$ at the third leaf node.
\noindent Tree 1, $P(class=-1 | \mathbf{x})$ at the 1st, 2nd and 3rd leaf nodes: $1/(1+1) = 0.5$, $1/(1+0) = 1$, $0/(0+1) = 0$.\\
Tree 2, $P(class=-1 | \mathbf{x})$ at the 1st, 2nd and 3rd leaf nodes: $1/(1+0) = 1$, $0/(0+2) = 0$, $1/(1+0) = 1$.
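These fractions follow directly from the leaf counts; a small illustrative sketch of the computation:

\begin{verbatim}
# Leaf counts (class -1, class +1) read from the figure, 1st to 3rd leaf node.
tree1_counts = [(1, 1), (1, 0), (0, 1)]
tree2_counts = [(1, 0), (0, 2), (1, 0)]

def p_minus1(counts):
    # P(class = -1 | x) at each leaf = (# class -1) / (total arriving at that leaf).
    return [neg / (neg + pos) for neg, pos in counts]

print(p_minus1(tree1_counts))   # [0.5, 1.0, 0.0]
print(p_minus1(tree2_counts))   # [1.0, 0.0, 1.0]
\end{verbatim}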
Which class does $\mathbf{x} = (0, 0)$ belong to?\\
Tree 1: x1 = 0 < 0.5, x2 = 0 < 0.5, so $(0, 0)$ reaches the first leaf node.\\
Tree 2: x2 = 0 > -0.5, x2 = 0 < 0.5, so $(0, 0)$ reaches the second leaf node.
\noindent To classify the new sample, it is passed down the tree until it reaches a leaf node. It is given the class label, $\omega_j$, for which $P(\omega_j | \mathbf{x})$ is a maximum at that leaf node.
\noindent Sample $(0, 0)$ arrives at the first leaf node in tree 1. At that leaf node \\ $P(class=-1 | \mathbf{x})=0.5$ and $P(class=+1 | \mathbf{x})=0.5$ so class of new sample is indeterminate.
\noindent Sample $(0, 0)$ arrives at the second leaf node in tree 2. At that leaf node $P(class=-1 | \mathbf{x})=0$ and $P(class=+1 | \mathbf{x})=1$ so class of new sample is +1.
Classification in a random forest is the same procedure as for bagged decision trees:
\begin{itemize}
\item A sample is passed down all trees to reach a leaf node in each tree
\item Each leaf node is associated with a class probability distribution $P(\omega | \mathbf{x})$
\item Overall class probability distribution is the mean of those associated with each leaf node reached by the sample: $$P(\omega | \mathbf{x}) = \frac{1}{M} \sum_{t=1}^{M} P_t(\omega | \mathbf{x})$$
\end{itemize}
So for sample $(0, 0)$: $$P(class=-1 | \mathbf{x})=\frac{1}{2} \left(0.5 + 0\right)=0.25$$
$$P(class=+1 | \mathbf{x})=\frac{1}{2} \left(0.5 + 1\right)=0.75$$
So the new sample is in class +1.
For class $-1$: Tree 1: $1/(1+1) = 0.5$; Tree 2: $0/(0+2) = 0$.
For class $+1$: Tree 1: $1/(1+1) = 0.5$; Tree 2: $2/(0+2) = 1$.
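Putting the two steps together, a minimal illustrative sketch of the random-forest prediction for the new sample $(0, 0)$, with the trees and leaf distributions hard-coded from above:

\begin{verbatim}
# P(class = -1 | x) at the 1st, 2nd, 3rd leaf node of each tree (from above).
tree1_p = [0.5, 1.0, 0.0]
tree2_p = [1.0, 0.0, 1.0]

def tree1_leaf(x):
    if x[0] > 0.5:
        return 3
    return 2 if x[1] > 0.5 else 1

def tree2_leaf(x):
    if x[1] <= -0.5:
        return 1
    return 3 if x[1] > 0.5 else 2

x = (0, 0)
p1 = tree1_p[tree1_leaf(x) - 1]        # 0.5  (1st leaf of tree 1)
p2 = tree2_p[tree2_leaf(x) - 1]        # 0.0  (2nd leaf of tree 2)

p_minus1 = (p1 + p2) / 2               # 0.25
p_plus1  = 1 - p_minus1                # 0.75
print("class:", +1 if p_plus1 > p_minus1 else -1)   # +1
\end{verbatim}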