15_generative-model
Qiuhong Ke
Generative models
COMP90051 Statistical Machine Learning
“What I Cannot Create, I Do Not Understand” – Richard Feynman
Copyright: University of Melbourne
So far..
Classifier:
• SVM
• Perceptron
• Multi-layer perceptron
• CNN
Feature extraction:
• Pretrained CNN model
Using pretrained model in Keras
Feature extraction before the last classifier
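In Keras this pattern looks roughly like the sketch below (a minimal sketch assuming TensorFlow 2.x; the ResNet50 choice and the random placeholder batch are illustrative):

    import numpy as np
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.applications.resnet50 import preprocess_input

    # Pretrained ResNet50 with the final 1000-way classifier removed;
    # pooling='avg' turns the last feature map into one vector per image.
    base = ResNet50(weights='imagenet', include_top=False, pooling='avg')

    images = np.random.rand(4, 224, 224, 3) * 255.0    # placeholder image batch
    features = base.predict(preprocess_input(images))  # shape (4, 2048)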
Using pretrained model in Keras
Feature extraction from arbitrary layer
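A minimal sketch of stopping at an arbitrary intermediate layer (VGG16 and the layer name 'block4_pool' are illustrative choices):

    import numpy as np
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Model

    base = VGG16(weights='imagenet', include_top=False)
    # A new Model whose output is an intermediate layer of the base network.
    feat_model = Model(inputs=base.input,
                       outputs=base.get_layer('block4_pool').output)

    images = np.random.rand(4, 224, 224, 3).astype('float32')
    features = feat_model.predict(images)   # shape (4, 14, 14, 512)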
Using pretrained model in Keras
Perform classification
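One common way to do this is to freeze the pretrained base and train a small classifier head on the new task; a sketch under assumptions (a hypothetical 10-class problem, with x_new/y_new standing in for your own data):

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import ResNet50

    base = ResNet50(weights='imagenet', include_top=False, pooling='avg')
    base.trainable = False                       # freeze the pretrained weights

    clf = models.Sequential([
        base,
        layers.Dense(10, activation='softmax'),  # new 10-way classifier head
    ])
    clf.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
    # clf.fit(x_new, y_new, epochs=5)   # x_new: images, y_new: integer labels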
Predict class using a pretrained ResNet 50
Pretrained on ImageNet (1000 classes)
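A minimal sketch of direct prediction with the full pretrained model (the input batch is a placeholder):

    import numpy as np
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

    model = ResNet50(weights='imagenet')               # full model, 1000 ImageNet classes
    images = np.random.rand(1, 224, 224, 3) * 255.0    # placeholder image
    preds = model.predict(preprocess_input(images))    # shape (1, 1000)
    print(decode_predictions(preds, top=3))            # top-3 (class id, name, score)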
Beyond classification
Image generation
Beyond classification
Image generation
Figure 8.10 in Deep Learning with Python by François Chollet
Beyond classification
Image editing
Shen, Wei, and Rujie Liu. “Learning residual images for face attribute manipulation.” CVPR 2017.
Generative Models
Cope with all of the above tasks:
• Variational Autoencoder (VAE)
• Generative Adversarial Network (GAN)
Outline
• Autoencoder (AE)
• Variational Autoencoder (VAE)
• Generative Adversarial Network (GAN)
Meaningful representations can reconstruct the input data
(Figure: an orange, with meaningful features such as “orange” and “round” from which the input can be reconstructed.)
Encoder – decoder
Extract a meaningful representation to reconstruct the input data:
Data → Encoder → Latent representation → Decoder → Data
Network architecture
Encoder and decoder built from fully connected (FC) layers:
x₁, x₂, …, xₙ → FC → FC (encoder) → z → FC → FC (decoder) → x̂₁, …, x̂ₙ
Network architecture
Convolutional encoder and decoder:
Data → Conv → Max-pooling → … → Flatten → FC (encoder) → z → FC → Reshape → Up-sampling → Conv → … (decoder) → reconstruction
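A minimal Keras sketch of such a convolutional autoencoder, assuming 28×28 grayscale inputs and a 32-dimensional latent z (layer sizes are illustrative):

    from tensorflow.keras import layers, models

    # Encoder: conv + max-pooling down to a small latent vector z.
    encoder = models.Sequential([
        layers.Input((28, 28, 1)),
        layers.Conv2D(16, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation='relu', padding='same'),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(32),                      # latent representation z
    ])

    # Decoder: FC back up, reshape, then up-sampling + conv.
    decoder = models.Sequential([
        layers.Input((32,)),
        layers.Dense(7 * 7 * 32, activation='relu'),
        layers.Reshape((7, 7, 32)),
        layers.UpSampling2D(2),
        layers.Conv2D(16, 3, activation='relu', padding='same'),
        layers.UpSampling2D(2),
        layers.Conv2D(1, 3, activation='sigmoid', padding='same'),
    ])

    autoencoder = models.Sequential([encoder, decoder])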
Upsampling
Keras `UpSampling2D` layer.
size: the upsampling factors for rows and columns.
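For instance (illustrative values):

    import numpy as np
    import tensorflow as tf

    x = np.arange(4, dtype='float32').reshape(1, 2, 2, 1)  # one 2x2 feature map
    y = tf.keras.layers.UpSampling2D(size=(2, 2))(x)       # shape (1, 4, 4, 1)
    print(y[0, :, :, 0])   # each value of x is repeated over a 2x2 block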
Training and testing
Training:
• Predictions vs. targets, where targets = input; minimise the reconstruction loss.
• Loss ‘mse’: L = E[(x − x̂)²]
• Loss ‘binary_crossentropy’: when the input values are between 0 and 1
Testing: Data → Encoder → latent representation
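A training sketch, reusing the `autoencoder` and `encoder` from the architecture sketch above and MNIST as an assumed dataset:

    import numpy as np
    from tensorflow.keras.datasets import mnist

    (x_train, _), (x_test, _) = mnist.load_data()
    x_train = x_train[..., None].astype('float32') / 255.0   # scale to [0, 1]
    x_test = x_test[..., None].astype('float32') / 255.0

    autoencoder.compile(optimizer='adam', loss='mse')  # or 'binary_crossentropy'
    autoencoder.fit(x_train, x_train,                  # targets = the input itself
                    epochs=10, batch_size=128,
                    validation_data=(x_test, x_test))

    z = encoder.predict(x_test)   # testing: extract the latent representation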
Dimension reduction
Data → Encoder (encoding) → low-dimensional latent representation → Decoder (decoding) → Data
Image denoising
(Figure: noisy inputs with their clean ground-truth images, and the autoencoder’s predicted, denoised outputs.)
Denoising autoencoder
Training (we have access to clean data): input = noisy image, target = clean image; minimise the reconstruction loss.
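Continuing the MNIST sketch above, with Gaussian noise as an assumed corruption:

    # Corrupt the clean images, then train the autoencoder to map
    # noisy inputs back to the clean targets.
    noise = 0.3 * np.random.normal(size=x_train.shape)
    x_train_noisy = np.clip(x_train + noise, 0.0, 1.0)

    autoencoder.fit(x_train_noisy, x_train,   # input = noisy, target = clean
                    epochs=10, batch_size=128)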
Denoising autoencoder
Testing: feed a noisy image through the trained autoencoder to obtain the denoised prediction.
Generate new data?
Data (fixed, existing) → Encoder → fixed latent code → Decoder → reconstructed data
Can the autoencoder generate new data? NO: it only reproduces what it encodes from existing inputs.
Data generation from latent variables
(Figure: encoding maps an input image to a latent point, e.g. z = (−3.71, −2.36); decoding nearby latent points such as (−2.89, −1.74) or (−3.21, −2.83) produces new images.)
Data generation from latent variable
p(x): probability of the data x
p(z): probability of the latent variable z
p(x|z): probability of x given z
p(x) = ∫ p(x|z) p(z) dz
Turn latent variable into data using a decoder (network)
Assume p(z) = N(z | 0, I), where I is the identity matrix.
Randomly sample z ∼ p(z) and pass it through the decoder, which models p(x|z), to produce x̂.
Question: does x̂ follow the same distribution as the training set?
Encoder: Reduce sampling space
Using the posterior distribution p(z|x) to sample z is more likely to produce x, but this true distribution is intractable.
Instead, approximate it with an encoder:
x → Encoder → q(z|x) = N(z | μ_z, σ_z² I), with outputs μ_z and σ_z
z ∼ q(z|x) → Decoder p(x|z) → x̂
Math is coming…
Network optimisation: maximise the log likelihood with gradient ascent

\log p(x^{(i)}) = \mathbb{E}_{z \sim q(z|x)}\left[\log p(x^{(i)})\right]
= \mathbb{E}_{z \sim q(z|x)}\left[\log \frac{p(x^{(i)} \mid z)\, p(z)}{p(z \mid x^{(i)})}\right] \quad \text{(Bayes' rule)}
= \mathbb{E}_{z \sim q(z|x)}\left[\log \frac{p(x^{(i)} \mid z)\, p(z)}{p(z \mid x^{(i)})} \cdot \frac{q(z \mid x^{(i)})}{q(z \mid x^{(i)})}\right]
= \mathbb{E}_{z \sim q(z|x)}\left[\log p(x^{(i)} \mid z)\right] - \mathbb{E}_{z \sim q(z|x)}\left[\log \frac{q(z \mid x^{(i)})}{p(z)}\right] + \mathbb{E}_{z \sim q(z|x)}\left[\log \frac{q(z \mid x^{(i)})}{p(z \mid x^{(i)})}\right]
= \mathbb{E}_{z \sim q(z|x)}\left[\log p(x^{(i)} \mid z)\right] - D_{\mathrm{KL}}\left[q(z \mid x^{(i)}) \,\|\, p(z)\right] + D_{\mathrm{KL}}\left[q(z \mid x^{(i)}) \,\|\, p(z \mid x^{(i)})\right]

(D_{\mathrm{KL}}: KL divergence)
Maximize lower bound

\log p(x^{(i)}) = \mathbb{E}_{z \sim q(z|x)}\left[\log p(x^{(i)} \mid z)\right] - D_{\mathrm{KL}}\left[q(z \mid x^{(i)}) \,\|\, p(z)\right] + D_{\mathrm{KL}}\left[q(z \mid x^{(i)}) \,\|\, p(z \mid x^{(i)})\right]
\geq \mathbb{E}_{z \sim q(z|x)}\left[\log p(x^{(i)} \mid z)\right] - D_{\mathrm{KL}}\left[q(z \mid x^{(i)}) \,\|\, p(z)\right] \quad \text{(lower bound)}

The dropped term involves the unknown true posterior p(z|x^{(i)}), but a KL divergence is always ≥ 0, which gives the bound.

• max \mathbb{E}_{z \sim q(z|x)}[\log p(x^{(i)} \mid z)]: minimise the reconstruction loss between output and input.
• q(z|x): output of the encoder, q(z|x) = N(z | μ, σ²I).
• p(z): prior distribution, assumed to be a standard Gaussian, p(z) = N(z | 0, I).
• D_{\mathrm{KL}}[q(z \mid x^{(i)}) \,\|\, p(z)]: KL divergence between two Gaussian distributions (closed form derived next).
Math is coming AGAIN…
Warning of math: KL divergence
KL(p_1 \,\|\, p_2) = \int p_1(x) \log \frac{p_1(x)}{p_2(x)} \, dx

p_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2}, \qquad p_2(x) = \frac{1}{\sigma_2 \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2}

KL(p_1 \,\|\, p_2) = \int \left[ -\log\sigma_1 - \tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\left(\tfrac{x-\mu_1}{\sigma_1}\right)^2 - \left( -\log\sigma_2 - \tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\left(\tfrac{x-\mu_2}{\sigma_2}\right)^2 \right) \right] p_1(x)\, dx

= -\log\sigma_1 + \log\sigma_2 + \frac{1}{2\sigma_2^2}\mathbb{E}_1\left[(x-\mu_2)^2\right] - \frac{1}{2\sigma_1^2}\mathbb{E}_1\left[(x-\mu_1)^2\right]

= -\log\sigma_1 + \log\sigma_2 + \frac{1}{2\sigma_2^2}\mathbb{E}_1\left[(x-\mu_2)^2\right] - \frac{1}{2}

(\mathbb{E}_1 denotes expectation under p_1; the last step uses \mathbb{E}_1[(x-\mu_1)^2] = \sigma_1^2.)
Warning of math: KL divergence
Set p_1(x) = N(\mu, \sigma^2) and p_2(x) = N(0, 1). From the previous slide,

KL(p_1 \,\|\, p_2) = -\log\sigma_1 + \log\sigma_2 + \frac{1}{2\sigma_2^2}\mathbb{E}_1\left[(x-\mu_2)^2\right] - \frac{1}{2}
= -\log\sigma + \frac{1}{2}(\sigma^2 + \mu^2) - \frac{1}{2}
= \frac{1}{2}\left(-\log\sigma^2 + \sigma^2 + \mu^2 - 1\right)

(using \mu_1 = \mu, \sigma_1 = \sigma, \mu_2 = 0, \sigma_2 = 1, and \mathbb{E}_1[x^2] = \sigma^2 + \mu^2.)
(Summary) Different from AE: estimate the distribution of z
Training:
x → Encoder → μ, log σ² → randomly sample z (reparameterization trick) → Decoder → x̂
• Unlike the AE’s fixed latent code, z is sampled from the estimated distribution N(μ, σ²I).
• For easy and stable training, the encoder outputs log σ² rather than σ².
Sample z: Reparameterization trick to make network differentiable
Sample z₀ ∼ N(0, I), then set z = μ + σ ⊙ z₀, so that z ∼ N(μ, σ²I).
The randomness is moved into the input z₀, so the network remains differentiable with respect to μ and σ.
Source: https://www.jeremyjordan.me/variational-autoencoders/
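A minimal TensorFlow sketch, assuming the encoder outputs `mu` and `log_var`:

    import tensorflow as tf

    def sample_z(mu, log_var):
        # z = mu + sigma * eps with eps ~ N(0, I); the randomness lives in
        # eps, so gradients can flow through mu and log_var.
        eps = tf.random.normal(tf.shape(mu))
        return mu + tf.exp(0.5 * log_var) * eps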
(Summary) Different from AE: Two losses
Training:
Data → Encoder → μ, log σ² → randomly sample z ∼ N(μ, σ²) → Decoder → prediction
Two losses:
• Reconstruction loss (targets = input)
• KL divergence to the prior p(z) = N(z | 0, I):
KL = ½(−log σ² + σ² + μ² − 1)
Source: https://www.jeremyjordan.me/variational-autoencoders/
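A sketch of the combined objective (`x`, `x_hat`, `mu`, `log_var` are assumed encoder/decoder tensors):

    import tensorflow as tf

    def vae_loss(x, x_hat, mu, log_var):
        # Reconstruction: sum of squared errors over pixels (MSE-style).
        recon = tf.reduce_sum(tf.square(x - x_hat), axis=[1, 2, 3])
        # KL(N(mu, sigma^2) || N(0, I)) = 1/2 (-log sigma^2 + sigma^2 + mu^2 - 1),
        # summed over the latent dimensions.
        kl = 0.5 * tf.reduce_sum(-log_var + tf.exp(log_var) + tf.square(mu) - 1.0,
                                 axis=-1)
        return tf.reduce_mean(recon + kl)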
Testing:
Randomly sample z ∼ N(0, I) → Decoder → new data
(Only the decoder is needed at test time.)
Testing
(Figure: random samples z ∼ N(0, I) are decoded into random digits.)
Conditional VAE
Concatenate a condition vector, e.g. a one-hot class label c = [1, 0, 0, ⋯, 0]ᵀ, to the encoder input and to z before decoding:
[Data, condition] → Encoder → μ, log σ² → randomly sample z → [z, condition] → Decoder → Data
Source: https://www.jeremyjordan.me/variational-autoencoders/
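A minimal sketch of the two concatenations (the flattened 784-dimensional input, 10-class one-hot condition, and 2-dimensional latent are illustrative assumptions):

    from tensorflow.keras import layers, models

    latent_dim, num_classes = 2, 10

    # Encoder: the condition is concatenated with the (flattened) input.
    x_in = layers.Input((784,))
    c_in = layers.Input((num_classes,))
    h = layers.Dense(256, activation='relu')(layers.Concatenate()([x_in, c_in]))
    mu = layers.Dense(latent_dim)(h)
    log_var = layers.Dense(latent_dim)(h)
    encoder = models.Model([x_in, c_in], [mu, log_var])

    # Decoder: the condition is concatenated with the sampled z.
    z_in = layers.Input((latent_dim,))
    d = layers.Dense(256, activation='relu')(layers.Concatenate()([z_in, c_in]))
    x_out = layers.Dense(784, activation='sigmoid')(d)
    decoder = models.Model([z_in, c_in], x_out)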
Two-player game
• Generator: takes noise z ∼ N(0, I) and aims to generate more realistic images to cheat the discriminator.
• Discriminator: aims to tell whether an image is real or generated (fake).
Image source: https://pixabay.com/photos/tiger-cub-tiger-cub-sumatran-cute-164992/
Discriminator: binary cross-entropy
Training the discriminator:
• Real image from the training dataset → label y = 1
• Generated image (from noise, via the generator) → label y = 0
• Minimise the binary cross-entropy between the discriminator’s predictions and these labels.
Image source: https://pixabay.com/photos/tiger-cub-tiger-cub-sumatran-cute-164992/
Keras code for discriminator
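A minimal sketch of such a discriminator (layer sizes are illustrative assumptions):

    from tensorflow.keras import layers, models

    discriminator = models.Sequential([
        layers.Input((28, 28, 1)),
        layers.Conv2D(64, 3, strides=2, padding='same'),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 3, strides=2, padding='same'),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid'),   # predicted P(image is real)
    ])
    # Trained on real images with y=1 and generated images with y=0.
    discriminator.compile(optimizer='adam', loss='binary_crossentropy')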
Generator: fool the discriminator via the adversarial loss
Training the generator:
• The generator produces an image from noise; the discriminator (weights fixed, used only for prediction) scores it.
• Adversarial loss: binary cross-entropy (BCE) between the discriminator’s score and the label y = 1.
Is the generated image similar to a real one? Use the discriminator’s predicted score to check: if the image looks real, the discriminator outputs a high score, so the BCE against the label 1 (the adversarial loss) is small. The generator therefore updates its weights to minimise the adversarial loss.
Keras code for generator
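A minimal sketch of the generator and of the combined model used to train it, reusing the `discriminator` sketched above:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    latent_dim = 100
    generator = models.Sequential([
        layers.Input((latent_dim,)),
        layers.Dense(7 * 7 * 128, activation='relu'),
        layers.Reshape((7, 7, 128)),
        layers.UpSampling2D(2),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.UpSampling2D(2),
        layers.Conv2D(1, 3, padding='same', activation='sigmoid'),
    ])

    # Freeze the discriminator inside the combined model: the adversarial
    # loss then updates only the generator's weights.
    discriminator.trainable = False
    gan = models.Sequential([generator, discriminator])
    gan.compile(optimizer='adam', loss='binary_crossentropy')

    # One generator update: label generated images as 1 ("real") so the
    # BCE is small only when the discriminator is fooled.
    z = tf.random.normal((128, latent_dim))
    gan.train_on_batch(z, tf.ones((128, 1)))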
Conditional generative adversarial nets
Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).
Context encoder
Pathak, Deepak, et al. “Context encoders: Feature learning by inpainting.” CVPR 2016.
Context encoder
Pathak, Deepak, et al. “Context encoders: Feature learning by inpainting.” CVPR 2016.
The generator inpaints the missing image region and is trained with two losses:
• Reconstruction loss (L2) against the ground-truth region
• Adversarial loss: BCE between the frozen discriminator’s score and y = 1
The discriminator is the same as in the original GAN.
Other applications
Jin, Yanghua, et al. “Towards the automatic anime characters creation with generative adversarial networks.” arXiv preprint arXiv:1708.05509 (2017).
Other applications
Turning a horse video into a zebra video (by CycleGAN)
Summary
• How to train an AE to learn meaningful representations and to denoise images?
• How to train a VAE?
• How to train a GAN?