Deep Learning – COSC2779 – Deep Unsupervised Learning
Dr. Iman Abbasnejad
September 20, 2021
Reference: Chapters 14 and 20, Ian Goodfellow et al., "Deep Learning", MIT Press, 2016.
Outline
1 AutoEncoders
2 Generative Adversarial Networks (GAN)
3 Text Generation
4 Speech Generation
Supervised Learning
Data: D = {(x^(i), y^(i))}_{i=1}^{N}
Goal: Learn a function h : x → y
Example: Classification, Regression,
Object detection, Image segmentation,
Sentiment analysis, Machine Translation,
Image captioning, . . .
Probabilistic interpretation: p(y^(i) | x_1^(i), ..., x_d^(i))
[Figure: classification example – an image labelled "Dog"]
Unsupervised Learning
Supervised Learning
Data: D = {(x^(i), y^(i))}_{i=1}^{N}
Goal: Learn a function h : x → y
Example: Classification, Regression,
Object detection, Image segmentation,
Sentiment analysis, Machine Translation,
Image captioning, . . .
Unsupervised Learning
Data: D = {(x^(i), ·)}_{i=1}^{N}
Goal: Learn some underlying hidden
structure of the data.
Example: Clustering, dimensionality
reduction, feature learning, density
estimation, . . .
Unsupervised Learning
Example: K-means clustering
Data: D = {(x^(i), ·)}_{i=1}^{N}
Goal: Assign each sample x^(i) to one of the k clusters.
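As a concrete (non-deep) illustration, a minimal k-means sketch in NumPy; the two-cluster toy data and k = 2 are arbitrary choices, and in practice sklearn.cluster.KMeans would normally be used:

```python
# Minimal k-means sketch with NumPy (illustrative only).
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k random samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each sample x^(i) to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Re-estimate each centroid as the mean of its assigned samples
        # (assumes no cluster becomes empty).
        centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

X = np.concatenate([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
labels, centroids = kmeans(X, k=2)
```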
Generative Models
A form of unsupervised learning.
Data: D = {x^(i)}_{i=1}^{N}
Goal: Given training data, generate new
samples from same distribution.
Example: Autoencoders, GAN,
One-to-Many RNN, . . .
Probabilistic interpretation: p(x_1^(i), ..., x_d^(i))
Learning: fit p_model(x) so that it approximates the data distribution.
Inference (testing): sample new data x ~ p_model(x).
Generative Models: Simple Example
Given the heights of students in a class, guess the height of the next student
entering the class.
[Figure: fitted density p_model over height]
Assumption: The heights of students are normally distributed.
p_model = p(x^(i)) = N(x^(i); µ, σ) = 1/(σ√(2π)) · exp( −(x^(i) − µ)² / (2σ²) )
We can fit the model to data using MLE – Learning.
After Learning, we can use the model to generate new data – Sampling. E.g.
Inverse CDF sampling
Generative Models: Simple Example (continued)
Assume the observations are i.i.d. with parameters Θ. The likelihood of the data is
p_model = p(x^(1), ..., x^(N) | Θ) = ∏_{i=1}^{N} p(x^(i) | Θ)
MLE: ∂/∂Θ log ∏_{i=1}^{N} p(x^(i) | Θ) = 0 ⇒ Σ_{i=1}^{N} ∂/∂Θ log p(x^(i) | Θ) = 0
µ_MLE = (1/N) Σ_{i=1}^{N} x^(i),   σ_MLE = √( (1/N) Σ_{i=1}^{N} (x^(i) − µ_MLE)² )
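A minimal NumPy sketch of this example; the height values are made up for illustration:

```python
# Fit a Gaussian to the heights by MLE, then sample new "students" from it.
import numpy as np

heights = np.array([165.0, 172.0, 158.0, 181.0, 176.0, 169.0, 174.0])  # toy data

# Closed-form MLE for a Gaussian.
mu_mle = heights.mean()
sigma_mle = np.sqrt(((heights - mu_mle) ** 2).mean())  # note: 1/N, not 1/(N-1)

# Sampling from p_model (equivalent to inverse-CDF sampling for a Gaussian).
rng = np.random.default_rng(0)
new_heights = rng.normal(loc=mu_mle, scale=sigma_mle, size=5)
print(mu_mle, sigma_mle, new_heights)
```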
Generative Models: Why?
Learn useful features for downstream tasks such as classification.
Getting insights from high-dimensional data (physics, medical imaging,
etc.)
Realistic samples for artwork, super-resolution, colorization, etc.
Modeling the physical world for simulation and planning
. . .
18 Impressive Applications of Generative Adversarial Networks (GANs)
https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/
Objective of the Lecture
Gain a basic understanding of deep generative models applicable to image, text and speech generation.
This will be a more conceptual lecture.
It will provide an example of training a generative model.
AutoEncoders
Example Scenario
Airbus provides several services for the operation of
the Columbus module and its payloads on the
International Space Station (ISS).
To ensure the health of the crew as well as
hundreds of systems onboard the Columbus
module, engineers have to keep track of many
telemetry data-streams, which are constantly
beamed to earth.
Airbus is interested in automated detection of
anomalies in the telemetry data-streams.
Data: Telemetry data-streams from the last 10 years result in over 5 trillion data points. However, there are very few anomalies (none for most systems).
How Airbus Detects Anomalies in ISS Telemetry Data Using TFX
https://blog.tensorflow.org/2020/04/how-airbus-detects-anomalies-iss-telemetry-data-tfx.html
Detecting Anomalies in ISS Telemetry Data
Can this be approached as a supervised learning problem?
Heavily skewed data: 99.9% of the data may be from the normal class, with very few anomalous examples (none for most systems).
Anomaly is not a pattern that we can learn; the pattern is only in the normal data.
We are therefore interested in modelling how the normal data is distributed: p_normal(x)
AutoEncoders: Intuition
[Diagram: image → Encoder → Code → Decoder → image]
Think about JPEG encoding and decoding. In that case both encoder and
decoder are predetermined functions.
Can we learn an encoding function and a decoding function from data?
Parallel idea: PCA (dimensionality reduction).
Autoencoder Basics
[Diagram: x^(i) (input) → Encoder → z^(i) (code) → Decoder → x̂^(i) (reconstruction)]
Unsupervised approach for learning a
lower-dimensional feature representation
from unlabeled training data
x is the input. Can be an image,
sequence or other feature vector.
x̂ is the prediction. Same dimension
as the input.
z is the Latent representation (code).
Usually has smaller dimensions than
the input.
Both encoder and decoder are neural
networks (MLP, CNN, RNN).
Autoencoder Training
[Diagram: x^(i) → Encoder → z^(i) → Decoder → x̂^(i)]   Loss: L = ||x − x̂||²
Unsupervised approach for learning a
lower-dimensional feature representation
from unlabeled training data
Only need x to train the network.
z has to capture information in the
input image in order to be able to
reconstruct that image.
Therefore, this training process will
generate a model that can encode
image information into a compact
vector (z)
Will only work well for test inputs that
are “similar” to the training data.
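A minimal fully-connected autoencoder sketch in Keras along these lines; the 784-d input (e.g. a flattened 28×28 image) and the 32-d code size are assumptions for illustration:

```python
# Encoder + decoder trained end-to-end with reconstruction loss L = ||x - x_hat||^2.
import tensorflow as tf
from tensorflow.keras import layers

encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),       # z: the latent code
])
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),   # x_hat: same dimension as the input
])

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# Only x is needed for training: the input is also the target.
# x_train = ...  # shape (N, 784), values scaled to [0, 1]
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```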
Convolutional Autoencoder for Images
[Diagram: input image → convolutional Encoder → feature map → convolutional Decoder → reconstruction]
The encoder consists of convolution layers, some with striding (or pooling) to reduce dimension.
The decoder consists of convolution layers, some with transposed convolution (or un-pooling) to increase dimension.
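A sketch of such a convolutional autoencoder for (assumed) 28×28×1 images, downsampling with strided convolutions and upsampling with transposed convolutions:

```python
import tensorflow as tf
from tensorflow.keras import layers

encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),   # 14x14
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),   # 7x7 feature map
])
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(7, 7, 32)),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),    # 14x14
    layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),  # 28x28
])

conv_ae = tf.keras.Sequential([encoder, decoder])
conv_ae.compile(optimizer="adam", loss="mse")
```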
Detecting Anomalies in ISS Telemetry Data
For sequence data, both the encoder and the decoder can be RNNs.
How Airbus Detects Anomalies in ISS Telemetry Data Using TFX
Train the AutoEncoder model only using
the normal data.
The AutoEncoder will learn to
reconstruct the normal data well – Low
reconstruction error on normal data.
The hypothesis is that it will NOT do well
on anomalous data.
Threshold the reconstruction error to
detect anomalies in test data.
https://blog.tensorflow.org/2020/04/how-airbus-detects-anomalies-iss-telemetry-data-tfx.html
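A sketch of the thresholding idea, assuming an autoencoder `ae` already trained on normal data only; the 99th-percentile threshold is an arbitrary choice for illustration:

```python
import numpy as np

def reconstruction_error(ae, x):
    """Per-sample mean squared reconstruction error."""
    x_hat = ae.predict(x)
    return np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))

# Choose a threshold from errors on held-out *normal* data ...
# errors_normal = reconstruction_error(ae, x_val_normal)
# threshold = np.percentile(errors_normal, 99)

# ... then flag test samples whose error exceeds it as anomalies.
# is_anomaly = reconstruction_error(ae, x_test) > threshold
```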
AutoEncoder for Learning Feature Extractors
[Diagram: x^(i) → trained Encoder → z^(i) → new Head]
Can use the trained encoder as a feature
extractor and do transfer learning for other
similar tasks.
A form of self-supervision.
Can also use z to cluster the dataset.
Used to be a common pre-training
technique for image classification. Not so
popular anymore as transfer learning has
taken over.
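A sketch of this kind of transfer learning, assuming the `encoder` from an earlier sketch and a hypothetical 10-class task:

```python
import tensorflow as tf
from tensorflow.keras import layers

encoder.trainable = False  # keep the self-supervised features fixed

classifier = tf.keras.Sequential([
    encoder,                                  # x -> z
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # new task-specific head
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(x_labelled, y_labelled, epochs=5)
```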
Example: De-Noising AutoEncoder
You are given a set of images from a clothing database. The task is to create a feature
extractor that can be used for classifying the images to common clothing categories. E.g.
Trousers, t-shirts, . . .
An AutoEncoder with sufficient complexity may learn the so-called "identity function", meaning that the output simply equals the input, making the AutoEncoder useless.
Solution: train the AutoEncoder with noise-added images as input and the corresponding clean training image as the target. This is called a de-noising AutoEncoder.
Example: De-Noising AutoEncoder
[Diagram: noisy image x_n^(i) → Encoder → z^(i) → Decoder → x̂_n^(i); loss L = ||x − x̂_n||² is computed against the original, noise-free image x]
At test time, input a corrupted test image and the network will generate a noise-free version of it.
The network is now forced to learn the underlying structure; it cannot take shortcuts by memorizing inputs.
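A training sketch for the de-noising setup, assuming the `autoencoder` from an earlier sketch and clean training images `x_train` scaled to [0, 1]; the noise level 0.2 is arbitrary:

```python
import tensorflow as tf

noise_std = 0.2  # arbitrary noise level for illustration

def add_noise(x):
    # Corrupt the input; keep the clean image as the target.
    x_noisy = x + noise_std * tf.random.normal(tf.shape(x))
    return tf.clip_by_value(x_noisy, 0.0, 1.0), x   # (noisy input, clean target)

# Re-corrupt the images with fresh noise on the fly via tf.data:
# ds = (tf.data.Dataset.from_tensor_slices(x_train)
#         .shuffle(10_000).batch(128).map(add_noise))
# autoencoder.fit(ds, epochs=10)

# At test time, feed a corrupted image; the output is a de-noised version:
# x_denoised = autoencoder.predict(x_test_noisy)
```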
Example: Segmentation AutoEncoder
U-Net is an architecture used for
segmentation.
With small training samples and
augmentation they obtained
state-of-the-art results on
biomedical image data.
It consists of a contracting path
(left side) and an expansive path
(right side).
Skip connections concatenate the correspondingly cropped feature maps from the contracting path with the feature maps in the expansive path.
Issues in Basic Autoencoder
[Diagram: x^(i) (input) → Encoder → z^(i) (code) → Decoder → x̂^(i) (reconstruction)]
The basic auto-encoder can memorize (over-fit) the training data. Remedies: adding noise to the input or the features, or training on in-painting tasks.
It cannot generate novel images, because the distribution of the code is unknown. Remedy: the Variational Auto-encoder.
Variational Autoencoder
[Diagram: image → Encoder → Code → Decoder → image]
Think about JPEG encoding and decoding. In that case both encoder and
decoder are predetermined functions.
Can we alter the code to get new images?
Difficult: even if we manage to generate an image, it would mostly be unrealistic.
Variational Autoencoder
[Diagram: z^(i) (code) → Decoder → x̂^(i) (reconstruction)]
1. Sample z from the prior: z^(i) ~ p(z)
2. Sample x from: x^(i) ~ p_model(x | z^(i))
We want to learn p(x_1, x_2, ..., x_d) given training data.
In the autoencoder we learn p(x_1, x_2, ..., x_d | z).
What if, for realistic images, z came from a known distribution (the prior), e.g. z^(i) ~ N(0, 1)?
Then we could sample z^(i) and use a decoder network to generate images.
Variational Autoencoder
[Diagram: x^(i) (input) → Encoder → (µ^(i), Σ^(i)) → sample z^(i) (code) → Decoder → x̂^(i) (reconstruction)]
VAE maps the input data into the
parameters of a probability distribution,
such as the mean and variance of a
Gaussian. This approach produces a
continuous, structured latent space, which
is useful for image generation.
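A minimal VAE sketch in Keras along these lines (reparameterisation trick plus reconstruction + KL loss); the 784-d input, 2-d latent space and equal loss weighting are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2

class VAE(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.enc = tf.keras.Sequential([
            layers.Dense(128, activation="relu"),
            layers.Dense(2 * latent_dim),        # outputs [mu, log_var]
        ])
        self.dec = tf.keras.Sequential([
            layers.Dense(128, activation="relu"),
            layers.Dense(784, activation="sigmoid"),
        ])

    def call(self, x):
        mu, log_var = tf.split(self.enc(x), 2, axis=-1)
        eps = tf.random.normal(tf.shape(mu))
        z = mu + tf.exp(0.5 * log_var) * eps      # reparameterisation trick
        x_hat = self.dec(z)
        # Reconstruction term + KL( q(z|x) || N(0, I) ).
        recon = tf.reduce_sum(tf.square(x - x_hat), axis=-1)
        kl = -0.5 * tf.reduce_sum(1 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
        self.add_loss(tf.reduce_mean(recon + kl))
        return x_hat

vae = VAE()
vae.compile(optimizer="adam")
# vae.fit(x_train, epochs=10, batch_size=128)    # only x is needed

# Generation: sample z from the prior and decode.
# new_images = vae.dec(tf.random.normal((16, latent_dim)))
```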
Variational Autoencoders
Probabilistic spin to traditional autoencoders
which allows generating data
Pros:
Principled approach to generative models.
Interpretable latent space.
Can be useful feature representation for
other tasks.
Cons:
Samples blurrier and lower quality
compared to state-of-the-art (GANs)
Image: Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014.
Generative Adversarial Networks (GAN)
Example Scenario
Assume you are hired by a startup to design
a system that takes a face image as input
and generate aged versions of that face.
It is not practical to collect a dataset containing images of the same person at different points in their life, which is what supervised learning would require.
However, it is relatively easy to collect a dataset of faces of people of different ages, e.g. a set of 40-year-olds, a set of 50-year-olds, ...
Image: Antipov, Grigory, et al., "Face aging with conditional generative adversarial networks." ICIP, 2017.
Generative Adversarial Networks: Intuition
Counterfeiter: Becomes good at generating realistic looking money by
learning about how he got caught.
Policeman: Becomes good at discriminating fake from real as the
Counterfeiter becomes more sophisticated.
A two-player game in which both players learn from each other. For our use case, the players are neural networks.
Generative Adversarial Networks: Intuition
Discriminator: tries to distinguish between real and fake images. The discriminator has a binary classification task with objective:
arg max_{θ_d} Σ_i [ log D_{θ_d}(x^(i)) + log( 1 − D_{θ_d}( G_{θ_g}(z^(i)) ) ) ]
The optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)); when the generator matches the data distribution (p_g = p_data), the discriminator cannot distinguish between real and fake, so D*(x) = 1/2.
Notation: real images x^(i); fake images G_{θ_g}(z^(i)); latent feature (code) z^(i).
Generative Adversarial Networks: Intuition
Generator: tries to fool the discriminator by generating real-looking images. Generator objective:
arg max_{θ_g} Σ_i [ log D_{θ_d}( G_{θ_g}(z^(i)) ) ]
The generator wants to maximize this objective so that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real). This is the commonly used non-saturating form of the original minimax objective.
Notation: real images x^(i); fake images G_{θ_g}(z^(i)); latent feature (code) z^(i).
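The two objectives can be written as binary cross-entropy losses; a sketch (minimising these losses is equivalent to maximising the objectives above):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # Real images should be classified as 1, generated images as 0.
    return (bce(tf.ones_like(real_logits), real_logits) +
            bce(tf.zeros_like(fake_logits), fake_logits))

def generator_loss(fake_logits):
    # Non-saturating loss: the generator wants D(G(z)) pushed towards 1.
    return bce(tf.ones_like(fake_logits), fake_logits)
```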
GAN: Model
The generator network takes a random vector and maps it to an image. For images it is an upsampling network (transposed convolutions).
The discriminator is a classification CNN that can distinguish between real and fake images.
GAN: Algorithm
Algorithm 1: Training GANs (Goodfellow, NIPS 2014)
for number of training iterations do
    for k steps do
        Sample m code samples {z^(1), ..., z^(m)};
        Sample m real images {x^(1), ..., x^(m)} from the training data;
        Update the discriminator weights θ_d by maximizing the discriminator objective.
    end
    Sample m code samples {z^(1), ..., z^(m)};
    Update the generator weights θ_g by maximizing the generator objective.
end
After training, use the generator network to generate new images.
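A sketch of one alternating training step following Algorithm 1 (with k = 1), assuming `generator`, `discriminator` and the loss functions from the previous sketch; the latent size and learning rates are arbitrary choices:

```python
import tensorflow as tf

latent_dim = 100
g_opt = tf.keras.optimizers.Adam(2e-4)
d_opt = tf.keras.optimizers.Adam(2e-4)

@tf.function
def train_step(real_images):
    m = tf.shape(real_images)[0]
    z = tf.random.normal([m, latent_dim])          # sample m codes

    # 1) Update the discriminator.
    with tf.GradientTape() as tape:
        fake_images = generator(z, training=True)
        d_loss = discriminator_loss(discriminator(real_images, training=True),
                                    discriminator(fake_images, training=True))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # 2) Update the generator (discriminator weights held fixed).
    z = tf.random.normal([m, latent_dim])
    with tf.GradientTape() as tape:
        g_loss = generator_loss(discriminator(generator(z, training=True),
                                              training=True))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```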
GAN: Convolutional Architectures
The original GAN networks were notoriously difficult to train.
Architecture guidelines for stable deep convolutional GANs:
Replace any pooling layers in the discriminator with strided convolutions, and any pooling layers in the generator with transposed convolutions.
Use batch-norm in both the generator and the discriminator.
Remove fully-connected hidden layers for deeper architectures.
Use ReLU activation in the generator for all layers except the output, which uses tanh.
Use LeakyReLU activation in the discriminator.
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
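A generator sketch roughly following these guidelines (transposed convolutions, batch-norm, ReLU everywhere except a tanh output); the 100-d code and 64×64×3 output size are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

generator = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    layers.Dense(4 * 4 * 256, use_bias=False),
    layers.Reshape((4, 4, 256)),
    layers.BatchNormalization(), layers.ReLU(),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", use_bias=False),   # 8x8
    layers.BatchNormalization(), layers.ReLU(),
    layers.Conv2DTranspose(64, 4, strides=2, padding="same", use_bias=False),    # 16x16
    layers.BatchNormalization(), layers.ReLU(),
    layers.Conv2DTranspose(32, 4, strides=2, padding="same", use_bias=False),    # 32x32
    layers.BatchNormalization(), layers.ReLU(),
    layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),  # 64x64x3
])
```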
GAN: Convolutional Architectures
Image: Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
GAN: Better Training and Generation
Developing better training algorithms for GANs is still a very active research
area:
LSGAN
Wasserstein GAN
Improved Wasserstein GAN
Progressive GAN
. . .
List of some GAN papers
https://github.com/nightrome/really-awesome-gan
Example Scenario (revisited)
Recall the face-aging task: we cannot collect images of the same person at different ages, but we can easily collect faces of people of different ages. Can we solve this problem with the GANs we have discussed so far?
Image: Antipov, Grigory, et al., "Face aging with conditional generative adversarial networks." ICIP, 2017.
Conditional GANs
This conveys the fundamental idea only; the actual network is slightly more complicated.
The hidden representation (code), Z, is now generated by an encoder CNN.
The generator uses the code together with age-related information (a) to generate the images.
The discriminator takes in an image and the age-related information and decides whether the pair is real or fake.
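A sketch of the conditioning idea only, not the actual face-aging network: the age group is embedded and concatenated with the code for the generator, and with the image features for the discriminator. All sizes (6 age groups, 100-d code, 16×16 toy images) are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

num_age_groups, latent_dim = 6, 100   # illustrative assumptions

# Generator: (z, age) -> image
z_in = tf.keras.Input(shape=(latent_dim,))
age_in = tf.keras.Input(shape=(1,), dtype="int32")
age_emb = layers.Flatten()(layers.Embedding(num_age_groups, 16)(age_in))
h = layers.Concatenate()([z_in, age_emb])
h = layers.Dense(4 * 4 * 256, activation="relu")(h)
h = layers.Reshape((4, 4, 256))(h)
h = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(h)   # 8x8
img_out = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh")(h)  # 16x16x3
cond_generator = tf.keras.Model([z_in, age_in], img_out)

# Discriminator: (image, age) -> real/fake score (logit)
img_in = tf.keras.Input(shape=(16, 16, 3))
age_in2 = tf.keras.Input(shape=(1,), dtype="int32")
x = layers.Conv2D(64, 4, strides=2, padding="same")(img_in)
x = layers.LeakyReLU(0.2)(x)
feat = layers.Flatten()(x)
age_emb2 = layers.Flatten()(layers.Embedding(num_age_groups, 16)(age_in2))
score = layers.Dense(1)(layers.Concatenate()([feat, age_emb2]))
cond_discriminator = tf.keras.Model([img_in, age_in2], score)
```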
Examples: Conditional GANs
Live Interactive Demos by NVIDIA Research
https://www.nvidia.com/en-us/research/ai-playground/
GAN: Image to Image translation
Image: Isola et al, “Image-to-image translation with conditional adversarial nets”, CVPR 2017
Image: Zhu et al, “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, ICCV 2017
GAN: Style Transfer
Text Generation
One-to-Many
Generating text/music, e.g. generating text similar to Shakespeare's writing.
Image: Sonnet 18 in the 1609 Quarto of Shakespeare's sonnets.
Given some text from Shakespeare's writing, generate novel sentences that look similar.
One-to-Many
Generating Shakespeare’s writing.
x⟨t⟩ is a one-hot vector with size equal to the number of characters.
ŷ⟨t⟩ is a softmax output with size equal to the number of characters.
[Diagram: unrolled RNN — hidden states a⟨0⟩ → a⟨1⟩ → a⟨2⟩ → ..., with input x⟨t⟩ and output ŷ⟨t⟩ at each time step]
All weights W_aa, W_ax, W_ya are shared across the RNN cells.
Text generation with an RNN
https://www.tensorflow.org/text/tutorials/text_generation
Inference: convert ŷ⟨t⟩ to a one-hot vector by sampling from it, and feed it in as the next input x⟨t+1⟩.
Speech Generation
WaveNet by Google is one of the most popular deep networks for speech generation.
According to Google: WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.
Recommended Reading: WaveNet: A generative model for raw audio
https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
Summary
Unsupervised learning: Deep generative models
AutoEncoders: interpretable latent space; allow inference of q(z|x); can provide useful feature representations for other tasks. Samples are blurrier and of lower quality compared to the state of the art.
GANs: take a game-theoretic approach; learn to generate from the training distribution through a two-player game. Produce beautiful, state-of-the-art samples, but are trickier and more unstable to train, and cannot solve inference queries such as p(x) or p(z|x).