
Pattern Recognition, Neural Networks and Deep Learning (7CCSMPNN)

[Figure: GAN block diagram showing the Discriminator and the real samples.]

Original Cost:

Diminished gradient: the slope of the fake-sample term $\log( 1 - D(G(\mathbf{z})))$, viewed as a function of $D(G(\mathbf{z}))$, is

Steep: when the Generator works very well (i.e., $D(G(\mathbf{z})) \approx 1$)
Flat: when the Generator does NOT work well (i.e., $D(G(\mathbf{z})) \approx 0$)

$$V(D,G) = \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\underbrace{\log D(\mathbf{x})}_{\text{For real samples}}] + \mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\underbrace{\log( 1 - D(G(\mathbf{z})))}_{\text{For fake samples}}]$$

Alternative Cost:

The slope of the fake-sample term $-\log( D(G(\mathbf{z})))$ is

Steep: when the Generator does NOT work well (i.e., $D(G(\mathbf{z})) \approx 0$)
Flat: when the Generator works very well (i.e., $D(G(\mathbf{z})) \approx 1$)

$$V(D,G) = \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\underbrace{\log D(\mathbf{x})}_{\text{For real samples}}] + \mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\underbrace{-\log( D(G(\mathbf{z})))}_{\text{For fake samples}}]$$

This cost $\mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [ -\log( D(G(\mathbf{z})) )]$ helps alleviate the gradient vanishing problem. The figure below shows the original and alternative costs.
\begin{figure}[htb]
\begin{center}
\begin{tikzpicture}[scale=1]
\tikzset{every pin/.style={scale=1, text=darkgray}}
\pgfplotsset{major grid style={dotted,gray}}
\pgfplotsset{
  x tick label style={font=\scriptsize, /pgf/number format/fixed},
  scaled x ticks=false,
  y tick label style={font=\scriptsize},
  x label style={font=\small},
  y label style={font=\small},
}
\begin{axis}[width=8cm, height=8cm, scale only axis,
  every axis plot post/.append style={very thick, font=\footnotesize, mark size=1},
  xmin=0, xmax=1, ymin=-10, ymax=10,
  xtick={0, 0.1, ..., 1.05}, ytick={-10, -8, ..., 10},
  yticklabels={-10, -8, ..., 10},
  xlabel = $D(G(\mathbf{z}))$, ylabel = {},
  xlabel near ticks, ylabel near ticks,
  grid=both, title={}]
\addplot [mark=none,domain=0:1,samples=500,blue,smooth] {ln(1 - x)};
\addlegendentry{$\ln( 1 - D(G(\mathbf{z})))$}
\addplot [mark=none,domain=0:1,samples=500,red,smooth,dashed] {-ln(x)};
\addlegendentry{$-\ln(D(G(\mathbf{z})))$}
\end{axis}
\end{tikzpicture}
\end{center}
\caption{Original ($\mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\log( 1 - D(G(\mathbf{z})))]$) and alternative ($\mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [ -\log( D(G(\mathbf{z})) )]$) costs.}
\label{fig:two costs}
\end{figure}


Below are some observations:
\begin{itemize}
  \item The Generator $G(\mathbf{z})$ is trained so that it is able to fool the Discriminator $D(\mathbf{x})$, i.e., $D(G(\mathbf{z}))$ should produce 1 even though the sample is fake.
  \item In the original cost, the slope of $\log( 1 - D(G(\mathbf{z})))$ is nearly flat when $D(G(\mathbf{z})) \approx 0$. That region needs the most attention, because $D(G(\mathbf{z})) \approx 0$ means the Generator does NOT yet fool the Discriminator, which is especially the case in the initial training stage. However, the nearly flat slope implies that learning there is very slow.
  \item With the alternative cost, $\mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [ -\log( D(G(\mathbf{z})) )]$, it can be seen from the figure that the slope is much steeper in the region $D(G(\mathbf{z})) \approx 0$ (a numerical comparison is sketched below). Consequently, the gradient vanishing issue is alleviated.
  \item Both the original and the alternative cost terms have the same trend, i.e., both are monotonically decreasing in $D(G(\mathbf{z}))$.
\end{itemize}
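\noindent As a quick numerical illustration of the two slopes, the following is a minimal Python sketch (illustrative only) that evaluates the gradient of each cost term with respect to $D(G(\mathbf{z}))$:
\begin{verbatim}
# Compare the gradients of the two Generator cost terms with respect to
# d = D(G(z)).  Original term: ln(1 - d);  alternative term: -ln(d).
for d in [0.01, 0.1, 0.5, 0.9, 0.99]:
    grad_original = -1.0 / (1.0 - d)   # d/dd ln(1 - d): magnitude ~1 near d = 0
    grad_alternative = -1.0 / d        # d/dd -ln(d): magnitude ~1/d near d = 0
    print(f"d={d:.2f}  original={grad_original:8.2f}  alternative={grad_alternative:8.2f}")
# Near d = 0 (Generator not yet fooling the Discriminator) the original term
# gives a gradient magnitude of about 1, while the alternative term gives
# about 1/d, i.e. a much stronger learning signal early in training.
\end{verbatim}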

Optimal Discriminator:

Given the minimax game as below:
$$\displaystyle \min_{G} \max_{D} V(D,G) = \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\log( 1 - D(G(\mathbf{z})))]$$

Recall that the distribution for:
\begin{itemize}
  \item Real samples: $p_{data}(\mathbf{x})$
  \item Fake (generated) samples: $p_g(\mathbf{x})$. 
\end{itemize}

  When $\mathbf{x}$ is a continuous variable, the cost can be written as below:
\begin{align*}
V(D,G) &= \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\log( 1 - D(G(\mathbf{z})))]\\
&= \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{x} \thicksim p_g(\mathbf{x})} [\log( 1 - D(\mathbf{x}))]\\
&= \int_{\mathbf{x}} p_{data}(\mathbf{x}) \log D(\mathbf{x}) d\mathbf{x} + \int_{\mathbf{x}} p_g(\mathbf{x}) \log( 1 - D(\mathbf{x})) d\mathbf{x} \\
&= \int_{\mathbf{x}} \Big( p_{data}(\mathbf{x}) \log D(\mathbf{x}) + p_g(\mathbf{x}) \log( 1 - D(\mathbf{x})) \Big) d\mathbf{x}
\end{align*}
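\noindent The change of variables from the expectation over $\mathbf{z}$ to the expectation over $\mathbf{x} \thicksim p_g(\mathbf{x})$ (second line above) can be illustrated by Monte Carlo sampling. The sketch below uses a hypothetical Generator $G(z) = 2z + 1$ and Discriminator $D(x) = 1/(1+e^{-x})$, chosen only for this illustration:
\begin{verbatim}
import numpy as np

# Monte Carlo illustration of
#   E_{z ~ p_z}[ln(1 - D(G(z)))] = E_{x ~ p_g}[ln(1 - D(x))]
# with a hypothetical Generator G(z) = 2z + 1 and Discriminator D(x) = sigmoid(x).
rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def G(z):
    return 2.0 * z + 1.0

z = rng.standard_normal(1_000_000)          # z ~ p_z = N(0, 1)
x_g = G(rng.standard_normal(1_000_000))     # independent samples x ~ p_g

lhs = np.mean(np.log(1.0 - sigmoid(G(z))))  # expectation over z
rhs = np.mean(np.log(1.0 - sigmoid(x_g)))   # expectation over x ~ p_g
print(lhs, rhs)                             # the two estimates agree up to MC error
\end{verbatim}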

\noindent The optimal $V(D,G)$ is obtained when: $$\frac{d}{d D(\mathbf{x})} \Big( p_{data}(\mathbf{x}) \log D(\mathbf{x}) + p_g(\mathbf{x}) \log( 1 - D(\mathbf{x})) \Big) = 0.$$

\noindent Remark: The $\log(\cdot)$ operator refers to $\ln(\cdot)$ (the natural logarithm).

\noindent In the following, we replace $\log(\cdot)$ by $\ln(\cdot)$.
\begin{align*}
\frac{d}{d D(\mathbf{x})} \Big( p_{data}(\mathbf{x}) \ln D(\mathbf{x}) + p_g(\mathbf{x}) \ln( 1 - D(\mathbf{x})) \Big) &= p_{data}(\mathbf{x}) \frac{1}{D(\mathbf{x})} - p_g(\mathbf{x}) \frac{1}{1 - D(\mathbf{x})}\\
&= \frac{ p_{data}(\mathbf{x})(1- D(\mathbf{x}) ) - p_g(\mathbf{x}) D(\mathbf{x}) }{D(\mathbf{x}) (1 - D(\mathbf{x}))}
\end{align*}
\noindent Setting $\frac{d}{d D(\mathbf{x})} \Big( p_{data}(\mathbf{x}) \log D(\mathbf{x}) + p_g(\mathbf{x}) \log( 1 - D(\mathbf{x})) \Big) = 0$, we obtain the best Discriminator as
$$D(\mathbf{x}) = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})}.$$
\noindent The ideal generated distribution satisfies $p_g(\mathbf{x}) = p_{data}(\mathbf{x})$, which happens when the Generator reproduces the real data distribution. Consequently, we have the optimal Discriminator as
$$D^*(\mathbf{x}) = \frac{p_{data}(\mathbf{x})}{p_{data}(\mathbf{x}) + p_g(\mathbf{x})} = \frac{1}{2}.$$
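\noindent As a sanity check, the stationary point above can be reproduced symbolically. The following is a minimal SymPy sketch (the symbol names are chosen here for illustration) that solves $\frac{d}{d D}\big( p_{data} \log D + p_g \log(1 - D) \big) = 0$ and confirms it is a maximum:
\begin{verbatim}
import sympy as sp

# Symbolic check of the optimal Discriminator D* = p_data / (p_data + p_g).
D = sp.symbols('D', positive=True)
p_data, p_g = sp.symbols('p_data p_g', positive=True)

f = p_data * sp.log(D) + p_g * sp.log(1 - D)   # integrand of V(D, G) at a fixed x
print(sp.solve(sp.diff(f, D), D))   # [p_data/(p_data + p_g)]
print(sp.diff(f, D, 2))             # -p_data/D**2 - p_g/(1 - D)**2 < 0, so a maximum
\end{verbatim}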

\noindent When both Generator and Discriminator are optimal (denoted as $D^*$ and $G^*$ below), we have
\begin{align*}
V(D^*,G^*) &= \int_{\mathbf{x}} \Big( p_{data}(\mathbf{x}) \log D^*(\mathbf{x}) + p_g(\mathbf{x}) \log( 1 - D^*(\mathbf{x})) \Big) d\mathbf{x} \\
&= \int_{\mathbf{x}} \Big( p_{data}(\mathbf{x}) \log \Big( \frac{1}{2} \Big) + p_g(\mathbf{x}) \log \Big( 1 - \frac{1}{2} \Big) \Big) d\mathbf{x} \\
&= \log \Big( \frac{1}{2} \Big) \int_{\mathbf{x}} p_{data}(\mathbf{x}) d\mathbf{x} + \log \Big( \frac{1}{2} \Big) \int_{\mathbf{x}} p_g(\mathbf{x}) d\mathbf{x} \\
&= 2 \log \Big( \frac{1}{2} \Big)\\
&= -2 \log (2).
\end{align*}
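\noindent The value $-2 \log(2) \approx -1.3863$ can also be checked numerically. The sketch below assumes, purely as an example, that $p_{data}(\mathbf{x})$ (and hence the ideal $p_g(\mathbf{x})$) is a standard normal density:
\begin{verbatim}
import numpy as np

# Numerical check of V(D*, G*) = -2 ln 2 when p_g = p_data (example: standard normal).
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
p_data = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
p_g = p_data                                 # ideal Generator: p_g = p_data

D_star = p_data / (p_data + p_g)             # equals 1/2 everywhere
V = np.sum(p_data * np.log(D_star) + p_g * np.log(1.0 - D_star)) * dx
print(V, -2.0 * np.log(2.0))                 # both approximately -1.3863
\end{verbatim}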

\noindent \textbf{Worked example}: Consider a Discriminator of the form $D(\mathbf{x}) = \dfrac{1}{1 + e^{-(\theta_{d_1} x_1 - \theta_{d_2} x_2 - 2)}}$ with parameters $\theta_{d_1} = 0.1$ and $\theta_{d_2} = 0.2$, where $x_1$ and $x_2$ denote the two components of $\mathbf{x}$. Denote the real samples as $\mathbf{X}_{real} = \{\mathbf{x}_1, \mathbf{x}_2\}$ and the generated samples as $\mathbf{X}_{fake} = \{\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2\}$, where $\mathbf{x}_1 = \left[ \begin{array}{c} 1 \\ 2 \end{array} \right]$, $\mathbf{x}_2 = \left[ \begin{array}{c} 3 \\ 4 \end{array} \right]$, $\tilde{\mathbf{x}}_1 = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$, $\tilde{\mathbf{x}}_2 = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right]$. Each sample is taken to be equally likely, so $p_{data}(\mathbf{x}_i) = p_g(\tilde{\mathbf{x}}_i) = \frac{1}{2}$.
\noindent \textbf{Cost}: $V(D,G) = \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\ln D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\ln( 1 - D(G(\mathbf{z})))]$\\~\\
\noindent Considering the first term $\mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\ln D(\mathbf{x})]$, its expectation is:
\begin{align*}
\mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\ln D(\mathbf{x})] &= \displaystyle \sum_{i=1}^2 \ln D(\mathbf{x}_i) p_{data}(\mathbf{x}_i) \\
&= \ln D(\mathbf{x}_1) p_{data}(\mathbf{x}_1) + \ln D(\mathbf{x}_2) p_{data}(\mathbf{x}_2) \\
&= \ln D(\mathbf{x}_1) \frac{1}{2} + \ln D(\mathbf{x}_2) \frac{1}{2} \\
&= \underbrace{0.5 \ln \bigg( \frac{1}{1 + e^{-(\theta_{d_1} x_1 - \theta_{d_2} x_2 - 2)}} \bigg) }_{\text{For sample } \mathbf{x}_1 \text{ in } \mathbf{X}_{real}} + \underbrace{0.5 \ln \bigg( \frac{1}{1 + e^{-(\theta_{d_1} x_1 - \theta_{d_2} x_2 - 2)}} \bigg) }_{\text{For sample } \mathbf{x}_2 \text{ in } \mathbf{X}_{real}} \\
&= 0.5 \ln \bigg( \frac{1}{1 + e^{-(0.1 \times 1 - 0.2 \times 2 - 2)}} \bigg) + 0.5 \ln \bigg( \frac{1}{1 + e^{-(0.1 \times 3 - 0.2 \times 4 - 2)}} \bigg) \\
&= -2.4872
\end{align*}

% 0.5*log(1/(1 + exp(-(0.1*1 - 0.2*2 - 2)))) + 0.5*log(1/(1 + exp(-(0.1*3 - 0.2*4 - 2))))
% 0.5*log(sigmf(0.1*1 - 0.2*2, [1 2])) + 0.5*log(sigmf(0.1*3 - 0.2*4, [1 2]))


\noindent Considering the second term $\mathbb{E}_{\mathbf{z} \thicksim p_{z}(\mathbf{z})} [\ln (1-D(G(\mathbf{z})))]$, its expectation is:
    \begin{align*}
        \mathbb{E}_{\mathbf{z} \thicksim p_{z}(\mathbf{z})} [\ln (1-D(G(\mathbf{z})))] &=
        \displaystyle \sum_{i=1}^2 \ln (1-D(\tilde{\mathbf{x}}_i)) p_g(\tilde{\mathbf{x}}_i) \\
        &= \ln (1- D(\tilde{\mathbf{x}}_1)) p_g(\tilde{\mathbf{x}}_1) + \ln (1-D(\tilde{\mathbf{x}}_2)) p_g(\tilde{\mathbf{x}}_2) \\
        &= \ln (1- D(\tilde{\mathbf{x}}_1)) \frac{1}{2} + \ln (1-D(\tilde{\mathbf{x}}_2)) \frac{1}{2} \\
        &= \underbrace{0.5 \ln \bigg( 1 - \frac{1}{1 + e^{-(\theta_{d_1} x_1 - \theta_{d_2} x_2 - 2)}} \bigg) }_{\text{For sample } \tilde{\mathbf{x}}_1 \text{ in } \mathbf{X}_{fake}}  \\
    & + \underbrace{0.5 \ln \bigg( 1 - \frac{1}{1 + e^{-(\theta_{d_1} x_1 - \theta_{d_2} x_2 - 2)}} \bigg) }_{\text{For sample } \tilde{\mathbf{x}}_2 \text{ in } \mathbf{X}_{fake}} \\
    & = 0.5 \ln \bigg(1 - \frac{1}{1 + e^{-(0.1 \times 5 - 0.2 \times 6 - 2)}} \bigg) \\
    & + 0.5 \ln \bigg(1 - \frac{1}{1 + e^{-(0.1 \times 7 - 0.2 \times 8 - 2)}} \bigg) \\
    &= -0.0593
    \end{align*}

% 0.5*log(1 - 1/(1 + exp(-(0.1*5 - 0.2*6 - 2)))) + 0.5*log(1 - 1/(1 + exp(-(0.1*7 - 0.2*8 - 2))))
% 0.5*log(1 - sigmf(0.1*5 - 0.2*6, [1 2])) + 0.5*log(1 - sigmf(0.1*7 - 0.2*8, [1 2]))

    
\noindent Combining the two terms gives the value of the cost:
    \begin{align*}
        V(D,G) &= \mathbb{E}_{\mathbf{x} \thicksim p_{data}(\mathbf{x})} [\ln D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \thicksim p_{\mathbf{z}}(\mathbf{z})} [\ln( 1 - D(G(\mathbf{z})))] \\
        &= -2.4872 -0.0593 \\
        &= -2.5465
    \end{align*}

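\noindent The two expectations and the resulting cost can be reproduced with a few lines of code. The sketch below is an illustrative NumPy version using the same sigmoid Discriminator with $\theta_{d_1} = 0.1$ and $\theta_{d_2} = 0.2$:
\begin{verbatim}
import numpy as np

# Worked example: D(x) = sigmoid(theta1*x1 - theta2*x2 - 2), theta1 = 0.1, theta2 = 0.2.
def D(x, theta1=0.1, theta2=0.2):
    return 1.0 / (1.0 + np.exp(-(theta1 * x[0] - theta2 * x[1] - 2.0)))

X_real = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]   # x_1, x_2
X_fake = [np.array([5.0, 6.0]), np.array([7.0, 8.0])]   # x~_1, x~_2

term_real = np.mean([np.log(D(x)) for x in X_real])         # approx -2.4872
term_fake = np.mean([np.log(1.0 - D(x)) for x in X_fake])   # approx -0.0593
print(term_real, term_fake, term_real + term_fake)          # approx -2.5465
\end{verbatim}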

\noindent Denote the generated samples as $\mathbf{x}_{fake}^{(i)} = \mathbf{z}^{(i)}$. The gradient of the Discriminator's objective with respect to its parameters $\theta_d = (\theta_{d_1}, \theta_{d_2})$, averaged over the minibatch of $m$ real and $m$ fake samples (here $m = 2$), is:

\begin{align*}
&\triangledown_{\theta_d} \frac{1}{m} \displaystyle \sum_{i=1}^m \bigg[ \ln D \big( \mathbf{x}^{(i)} \big) + \ln \big( 1 - D ( G (\mathbf{z}^{(i)}) ) \big) \bigg] \\
%&= \triangledown_{\theta_d} \frac{1}{m} \displaystyle \sum_{i=1}^m \bigg[ \ln D \big( \mathbf{x}^{(i)} \big) + \ln \big( 1 - D ( \mathbf{x}_{fake}^{(i)} ) \big) \bigg] \\
&= \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ \begin{array}{c} \frac{\partial}{\partial \theta_{d_1}} \Big( \ln D \big( \mathbf{x}^{(i)} \big) + \ln \big( 1 - D ( G (\mathbf{z}^{(i)}) ) \big) \Big) \\ \frac{\partial}{\partial \theta_{d_2}} \Big( \ln D \big( \mathbf{x}^{(i)} \big) + \ln \big( 1 - D ( G (\mathbf{z}^{(i)}) ) \big) \Big) \end{array} \right] \\
&= \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ \begin{array}{c} \frac{\partial}{\partial \theta_{d_1}} \ln D \big( \mathbf{x}^{(i)} \big) + \frac{\partial}{\partial \theta_{d_1}} \ln \big( 1 - D ( \mathbf{x}_{fake}^{(i)} ) \big) \\ \frac{\partial}{\partial \theta_{d_2}} \ln D \big( \mathbf{x}^{(i)} \big) + \frac{\partial}{\partial \theta_{d_2}} \ln \big( 1 - D ( \mathbf{x}_{fake}^{(i)} ) \big) \end{array} \right]\\
&= \frac{1}{m} \displaystyle \sum_{i=1}^m \left[ \begin{array}{c} \alpha_{1}^{(i)} + \beta_{1}^{(i)} \\ \alpha_{2}^{(i)} + \beta_{2}^{(i)} \end{array} \right]\\
&= \frac{1}{2} \displaystyle \left[ \begin{array}{c} \alpha_{1}^{(1)} + \beta_{1}^{(1)} \\ \alpha_{2}^{(1)} + \beta_{2}^{(1)} \end{array} \right] + \frac{1}{2} \displaystyle \left[ \begin{array}{c} \alpha_{1}^{(2)} + \beta_{1}^{(2)} \\ \alpha_{2}^{(2)} + \beta_{2}^{(2)} \end{array} \right]\\
&= \frac{1}{2} \displaystyle \left[ \begin{array}{c} 0.9089 - 0.3149 \\ -1.8178 + 0.3778 \end{array} \right] + \frac{1}{2} \displaystyle \left[ \begin{array}{c} 2.7724 - 0.3651 \\ -3.6966 + 0.4172 \end{array} \right]\\
&= \frac{1}{2} \displaystyle \left[ \begin{array}{c} 0.5940 \\ -1.4399 \end{array} \right] + \frac{1}{2} \displaystyle \left[ \begin{array}{c} 2.4074 \\ -3.2793 \end{array} \right] = \displaystyle \left[ \begin{array}{c} 1.5007 \\ -2.3596 \end{array} \right]
\end{align*}
\noindent where $\alpha_{j}^{(i)} = \frac{\partial}{\partial \theta_{d_j}} \ln D \big( \mathbf{x}^{(i)} \big)$ and $\beta_{j}^{(i)} = \frac{\partial}{\partial \theta_{d_j}} \ln \big( 1 - D ( \mathbf{x}_{fake}^{(i)} ) \big)$ for $j = 1, 2$; the numerical values are obtained from the symbolic derivatives below, evaluated at $\theta_{d_1} = 0.1$ and $\theta_{d_2} = 0.2$.

%syms t1 t2 x1 x2
%f1 = log(1/(1 + exp(-(t1*x1 - t2*x2 - 2))))
%diff(f1,t1)
%diff(f1,t2)
%f2 = log(1 - 1/(1 + exp(-(t1*x1 - t2*x2 - 2))))
%diff(f2,t1)
%diff(f2,t2)

%t1 = 0.1;t2 = 0.2;
%x1 = 1; x2 = 2;
%a = (x1*exp(t2*x2 - t1*x1 + 2))/(exp(t2*x2 - t1*x1 + 2) + 1)
%c = -(x2*exp(t2*x2 - t1*x1 + 2))/(exp(t2*x2 - t1*x1 + 2) + 1)
%x1 = 5; x2 = 6;
%b = (x1*exp(t2*x2 - t1*x1 + 2))/((1/(exp(t2*x2 - t1*x1 + 2) + 1) - 1)*(exp(t2*x2 - t1*x1 + 2) + 1)^2)
%d = -(x2*exp(t2*x2 - t1*x1 + 2))/((1/(exp(t2*x2 - t1*x1 + 2) + 1) - 1)*(exp(t2*x2 - t1*x1 + 2) + 1)^2)
%x1 = 3; x2 = 4;
%a = (x1*exp(t2*x2 - t1*x1 + 2))/(exp(t2*x2 - t1*x1 + 2) + 1)
%c = -(x2*exp(t2*x2 - t1*x1 + 2))/(exp(t2*x2 - t1*x1 + 2) + 1)
%x1 = 7; x2 = 8;
%b = (x1*exp(t2*x2 - t1*x1 + 2))/((1/(exp(t2*x2 - t1*x1 + 2) + 1) - 1)*(exp(t2*x2 - t1*x1 + 2) + 1)^2)
%d = -(x2*exp(t2*x2 - t1*x1 + 2))/((1/(exp(t2*x2 - t1*x1 + 2) + 1) - 1)*(exp(t2*x2 - t1*x1 + 2) + 1)^2)
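\noindent As a sanity check, the gradient $[1.5007, \; -2.3596]^T$ can also be estimated numerically by central finite differences (illustrative Python; the step size and helper names are chosen here):
\begin{verbatim}
import numpy as np

# Finite-difference check of the Discriminator gradient computed above.
def D(x, theta):
    return 1.0 / (1.0 + np.exp(-(theta[0] * x[0] - theta[1] * x[1] - 2.0)))

def V(theta):
    X_real = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
    X_fake = [np.array([5.0, 6.0]), np.array([7.0, 8.0])]
    return (np.mean([np.log(D(x, theta)) for x in X_real])
            + np.mean([np.log(1.0 - D(x, theta)) for x in X_fake]))

theta = np.array([0.1, 0.2])
eps = 1e-6
grad = np.array([(V(theta + eps * e) - V(theta - eps * e)) / (2.0 * eps)
                 for e in np.eye(2)])
print(grad)   # approximately [ 1.5007, -2.3596]
\end{verbatim}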

