
Exemplar Deep Learning Applications
Image Classification

Describe an example network for image classification


Explain the parameters defining the network
Identify common tricks for improving classification performance

Deep Learning for Image-based Recognition
|Visual recognition is an important part of human intelligence.
|ILSVRC (ImageNet Large-scale Visual Recognition Challenge) illustrates such a task.
|Many ImageNet images are difficult for conventional algorithms to classify.

Success Stories

ImageNet.org Samples
SOURCE: ImageNet.org

Success Stories: 2014 – Top Three
SOURCE: ImageNet.org

Success Stories: 2015 – Top Three
[Table: top 2015 entries; columns: Entry, Description of Outside Data Used, Localization Error, Classification Error. Recoverable row details: Trimps-Soushen (extra annotations collected by the team); an entry that validates the classification model used in DET Entry1 and shares the proposal procedure with DET; an entry averaging multiple models (validation accuracy 79.78%); an entry pre-training a CNN on 3000-class classification images from ImageNet.]
SOURCE: ImageNet.org

Example Application 1: DR Detection
|DR: Diabetic Retinopathy
|A recent work: Gulshan et al. “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.” JAMA 316.22 (2016): 2402-2410
-Employed large datasets
-A specific CNN architecture (Inception-v3) taking the entire image as input (as opposed to lesion/structure-specific CNNs)
-High performance: Comparable to a panel of 7 board-certified ophthalmologists
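
As a rough illustration of this setup, the sketch below (not the study's actual code) adapts a pretrained Inception-v3 in PyTorch so that the whole fundus image is the input and the output is a binary decision:

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative sketch only; the JAMA study's exact configuration differs.
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # referable DR vs. not

model.eval()  # in eval mode the forward pass returns plain logits
with torch.no_grad():
    image = torch.randn(1, 3, 299, 299)  # whole fundus image, resized to 299x299
    logits = model(image)
```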

Example Application 2: Visual Aesthetics
|While subjective, computational models are possible since there are patterns in visually-appealing pictures.
-E.g., photographic rules.
|Huge online datasets are available. If ratings are also available, the problem becomes supervised learning.
-Conventional approaches still face the bottleneck of feature extraction.

Related Approaches
|Solving the task as binary classification
[Figure: example image pair labeled "Not beautiful" and "Beautiful"]

Related Approaches: Examples
|RAPID: Rating Pictorial Aesthetics using Deep Learning (Lu et al.)
|Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation (Lu et al.)
|Image Aesthetic Evaluation Using Paralleled Deep Convolution Neural Network (Guo & Li)

A New Task: Relative Aesthetics
|Image retrieval
|Image enhancement
[Figure: an image pair asking "Which is more beautiful?", a comparison most existing approaches do not address; examples labeled "Not beautiful" and "Beautiful"]

A Deep Learning Approach
|Dual-channeled CNN trained using relative learning
|Siamese Network characteristics (weight sharing) and hinge-loss function
|A custom data-set with relative labels – pairs formed based on aesthetic rating
SOURCE: Gattupalli et al. “A Computational Approach to Relative Aesthetics”, International Conference on Pattern Recognition (ICPR) 2016.

Constructing a Useful Data Set 1/2
|Total of 250,000 images extracted from dpchallenge.com
|Challenges under which users post their submission
|Peers rate and a final winner is selected based on the average rating
|Belong to a wide variety of semantic categories
SOURCE: AVA Dataset

Constructing a Useful Data Set 2/2
|The minimum gap between the average ratings of the two images is one
-e.g., 3.4 and 4.5; 6.3 and 7.8
|The maximum variance allowed between the ratings of different voters is 2.6
|Pick pairs from the same category only
-e.g., cannot compare an image of a car and a building
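
A minimal Python sketch of these selection rules (the record fields `mean_rating`, `rating_var`, and `category` are hypothetical names for the AVA metadata; the thresholds follow the slides):

```python
# Hypothetical record fields; thresholds follow the rules above.
def make_pairs(images, min_gap=1.0, max_var=2.6):
    pairs = []
    for a in images:
        for b in images:
            if (a["category"] == b["category"]            # same category only
                    and a["rating_var"] <= max_var
                    and b["rating_var"] <= max_var
                    and a["mean_rating"] - b["mean_rating"] >= min_gap):
                pairs.append((a["id"], b["id"], +1))      # a rated above b
    return pairs
```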

The Network Architecture & Other Characteristics
Siamese Characteristics

Further Implementation Details
|Each channel contains two streams of processing: column 1 for global, and column 2 for local
|Global Patch
-e.g., rule of thirds, golden ratio
|Local Patch
-e.g., smoothness/graininess

The Loss Function
L = max(0, δ − y · d(I1, I2)),  where d(I1, I2) = f(C1 − C2)    (hinge loss)
|y = true label of the image pair, i.e., 1 if I1 is more aesthetic than I2 and −1 otherwise
|C1, C2 = outputs of channel 1 and channel 2, respectively
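
A minimal PyTorch sketch of this pairwise hinge loss, assuming f is simply the difference of the two channel outputs (the paper's exact f and margin δ may differ):

```python
import torch

def relative_hinge_loss(c1, c2, y, delta=0.5):
    # c1, c2: channel outputs for I1 and I2, shape [batch, 1]
    # y: +1 if I1 is more aesthetic than I2, -1 otherwise
    # delta: margin (illustrative value)
    d = (c1 - c2).squeeze(-1)          # d(I1, I2) = f(C1 - C2), with f = identity
    return torch.clamp(delta - y * d, min=0).mean()
```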

Sample Results
|Two ways of training
-Using binary labels
-Using relative labels
|Tested on two tasks
-Binary classification task
-Ranking task

Eight Experiments Total
|Four evaluation settings, each crossed with the two training schemes (binary vs. relative labels):
-Ranking (custom test-set)
-Ranking (standard test-set)
-Classification (custom test-set)
-Classification (standard test-set)

Exemplar Deep Learning Applications
Video-Based Inference

Describe unique challenges in using deep networks for sequential data
Describe the difference between image- based and video- based classification tasks
Explain how video action recognition illustrates the difference between image-based and video-based classification tasks
Evaluate a video- based classification example using deep learning

Going from Image to Video
|Processing each frame of a video as an independent image and then aggregating the frame-level results
|Extracting spatio-temporal features and basing the inference task on such features
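
The first strategy can be sketched as follows, assuming some per-image classifier `image_model` that returns class scores (a hedged illustration, not a specific system):

```python
import torch

def classify_video(frames, image_model):
    # frames: tensor [T, 3, H, W]; image_model: any per-image classifier
    with torch.no_grad():
        scores = torch.stack([image_model(f.unsqueeze(0)).squeeze(0)
                              for f in frames])
    return scores.mean(dim=0).argmax().item()  # average the frame-level scores
```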

Video2Vec: Sample Applications
|We examine a deep learning approach for finding video representations that naturally encode spatial-temporal semantics.
|Mostly based on the following papers:
– Yikang Li, Sheng-hung Hu, Baoxin Li, “Recognizing Unseen Actions in a Domain-Adapted Embedding Space”, ICIP, Sep 2016.
– Yikang Li, Sheng-hung Hu, Baoxin Li, “Video2Vec: Learning Semantic Spatio-Temporal Embeddings for Video Representations”, ICPR, Dec 2016.

Video2Vec Deep Learning Model: Key Idea

Video2Vec Deep Learning Model: Implementation
|A two-stream CNN for extracting appearance and optical flow features
|RNNs for further global spatial-temporal encoding
|An MLP for the final semantic embedding space

Applications of the Model
|Visual tasks:
– Video Action Recognition – Zero-Shot Learning
– Semantic Video Retrieval
|Dataset: UCF101 (13,320 video clips from 101 categories; training/testing ratio is 7:3; the split list is provided on the dataset's website)

Additional Implementation Details 1/4
|Pretraining for the component models:
– Pre-trained Spatial CNN Model: VGG-f trained on
– Pre-trained OF CNN Model: Flow-net trained on UCF Sports
– Pre-trained Word2Vec Model: Wikipedia corpus containing 1 billion words

Additional Implementation Details 2/4
|Deep model parameter settings:
– CNNs: Pretrained model + the last layer (fc7) features
(dimension: 4096×1)
– RNNs: Hidden layer size is 1024×1
– MLP: Input layer size (2048×1), hidden layer size (1200×1), output layer size (500×1)
|Loss function:
– Hinge loss function for semantic embedding
– Softmax loss function for fine-tuning and classification
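
Putting the listed dimensions together, a minimal sketch of the pipeline might look like the following (GRUs and fusion by concatenation are assumptions on our part; the slides only say "RNNs"):

```python
import torch
import torch.nn as nn

class Video2VecSketch(nn.Module):
    def __init__(self, feat_dim=4096, rnn_hidden=1024, embed_dim=500):
        super().__init__()
        # One RNN per stream: appearance (RGB) and optical-flow fc7 features
        self.rgb_rnn = nn.GRU(feat_dim, rnn_hidden, batch_first=True)
        self.flow_rnn = nn.GRU(feat_dim, rnn_hidden, batch_first=True)
        # MLP: 2048 -> 1200 -> 500, matching the listed layer sizes
        self.mlp = nn.Sequential(nn.Linear(2 * rnn_hidden, 1200),
                                 nn.ReLU(),
                                 nn.Linear(1200, embed_dim))

    def forward(self, rgb_feats, flow_feats):
        # rgb_feats, flow_feats: [batch, time, 4096]
        _, h_rgb = self.rgb_rnn(rgb_feats)
        _, h_flow = self.flow_rnn(flow_feats)
        fused = torch.cat([h_rgb[-1], h_flow[-1]], dim=1)  # [batch, 2048]
        return self.mlp(fused)                             # [batch, 500]
```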

Additional Implementation Details 3/4
|Video processing settings:
– Dense Optical Flow and RGB frames are extracted at
– A video sequence mask is built for each training batch to make all sequences the same length
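
One common way to implement such a mask in PyTorch (an illustrative sketch, not the authors' code) is to zero-pad to the longest clip and keep a boolean mask of the valid steps:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical per-clip feature tensors of varying lengths [T_i, 4096]
clips = [torch.randn(t, 4096) for t in (30, 45, 60)]

padded = pad_sequence(clips, batch_first=True)        # [3, 60, 4096], zero-padded
lengths = torch.tensor([c.shape[0] for c in clips])   # [30, 45, 60]
# Boolean mask marking real (non-padded) time steps
mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]  # [3, 60]
```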

Additional Implementation Details 4/4
|Training parameter settings:
– Learning rate: initialized to 0.0001 and halved every 15 epochs
– Total epochs: 60
– Batch size: 30 video clips
– Margin value for Hinge Loss function:
• a. For zero-shot learning, 0.4
• b. For video retrieval and action recognition, 0.55
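
The learning-rate schedule above is a standard step decay; in PyTorch it could be expressed as follows (the optimizer choice is an assumption, since the slides do not specify one):

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

optimizer = SGD(model.parameters(), lr=1e-4)            # model as sketched earlier
scheduler = StepLR(optimizer, step_size=15, gamma=0.5)  # halve every 15 epochs

for epoch in range(60):                                 # 60 epochs total
    ...  # train one epoch with batches of 30 video clips
    scheduler.step()
```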

Summaries of Key Results
|Dataset: UCF101
Zero-shot learning results
• The model achieved state-of-the-art performance on ZSL even without any domain-adapted strategy.
Video action recognition
• The performance was on par with approaches using sophisticated fusion strategies or deeper networks.

Additional Results
|The task is to retrieve videos from the training dataset using query words that never appear during training but share some information with the training labels.
|The results show the top-10 retrieved video clips from the video dataset.

Exemplar Deep Learning Applications
Generative Adversarial Networks (GANs)

Describe basic concepts and architecture for GANs
Illustrate variants of GANs and their applications

Generative Adversarial Networks (GANs)
|Proposed in 2014 by Goodfellow et al.
|An architecture with two neural networks gaming against each other:
– One attempting to learn a generative model
– The other attempting to distinguish real data from generated samples
|Many variants have been proposed since the initial model.

Discriminative vs Generative Models 1/4
|Discriminative models: E.g., the familiar MLP
– Given {(xi, yi)}, to learn P(y|x)
|More generally, we try to learn a posterior distribution of y given x, p(y|x)
– Usually reduced to posterior probabilities for classification problems
➔See also earlier discussion on Naïve Bayes vs Logistic Regression.

Discriminative vs Generative Models 2/4
|Generative models think the other direction: how to generate x given y
– E.g., xi=? if yi = 2?
|More generally, we try to learn a conditional distribution of x given y, p(x|y)
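
The two views are connected by Bayes' rule: a generative model p(x|y) together with a prior p(y) yields the posterior that a discriminative model learns directly:

```latex
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}
```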

Discriminative vs Generative Models 3/4
|Illustrating the ideas

Discriminative vs Generative Models 4/4
|Estimating p(x|y) (or, in general any p(x), if we drop y by assuming it is given)
– Explicit density estimation: assuming some parametric or non-parametric models.
– Implicit density estimation: learn (essentially equivalent) models that may create good samples (as if from the “true” model), without explicitly defining the true model.
GAN is such an approach

Basic GAN Architecture
[Diagram: a seed/random vector z feeds the Generator Network G(•), producing a generated/fake sample G(z); the Discriminator Network D(•) receives both G(z) and a real data sample x and outputs "real or fake"]
| Objective of the Discriminator Network: making D(x)→1, D(G(z))→0
| Objective of the Generator Network: making D(G(z))→1
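
For concreteness, a minimal generator/discriminator pair might look like this (layer sizes and the data dimension are illustrative, e.g., flattened 28×28 images):

```python
import torch.nn as nn

noise_dim, data_dim = 100, 784  # illustrative sizes
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())
```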

Basic GAN Training Algorithm
for number of training iterations do
  for k steps do
    Sample a minibatch of m noise samples {z(1), …, z(m)} from the noise prior pg(z)
    Sample a minibatch of m examples {x(1), …, x(m)} from the data distribution pdata(x)
    Update the discriminator by ascending its stochastic gradient:
      ∇θd (1/m) Σi [ log D(x(i)) + log(1 − D(G(z(i)))) ]
  end for
  Sample a minibatch of m noise samples {z(1), …, z(m)} from the noise prior pg(z)
  Update the generator by descending its stochastic gradient:
      ∇θg (1/m) Σi log(1 − D(G(z(i))))
end for
θd and θg are the parameters of the discriminator and generator, respectively.
SOURCE: Goodfellow et al. https://arxiv.org/pdf/1406.2661.pdf
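
A compact PyTorch sketch of one training iteration, using the G and D from the previous slide's sketch. One deliberate deviation: instead of descending log(1 − D(G(z))) as in the original algorithm, the generator update maximizes log D(G(z)) (the "non-saturating" variant also suggested by Goodfellow et al.), which BCE against "real" labels implements:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, real_x, opt_g, opt_d, noise_dim=100, k=1):
    m = real_x.shape[0]
    ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)

    # Discriminator: k steps pushing D(x) -> 1 and D(G(z)) -> 0
    for _ in range(k):
        z = torch.randn(m, noise_dim)
        d_loss = bce(D(real_x), ones) + bce(D(G(z).detach()), zeros)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: push D(G(z)) -> 1 (non-saturating objective)
    z = torch.randn(m, noise_dim)
    g_loss = bce(D(G(z)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```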

Applications of GAN
|GANs have enabled many novel/interesting/fun applications.
|Many GAN-based models have been proposed, following the initial paper.
|Consider one example: Facial attribute manipulation
– Y. Wang et al. “Weakly Supervised Facial Attribute Manipulation via Deep Adversarial Network”, WACV 2018.

Facial Attribute Manipulation 1/2
SOURCE: Y. Wang et al. “Weakly Supervised Facial Attribute Manipulation via Deep Adversarial Network”, WACV 2018.

Facial Attribute Manipulation 2/2
SOURCE: Y. Wang et al. “Weakly Supervised Facial Attribute Manipulation via Deep Adversarial Network”, WACV 2018.
