
HW2P2 Bootcamp

Face Classification and Verification

Logistics

● HW2P2 has two parts, each with its own Kaggle competition. They are both due
on 21st October 2021, 11:59 EDT.

● HW2P2 is not as easy as HW1P2, so please start early. Model training and convergence
will themselves take a lot of time.

● The public leaderboard is based on only 30% of the test data, unlike HW1P2, where it
was based on 70% of the test data.

● Ensure that your models do not overfit: a high score on the public leaderboard does
not necessarily translate to a high score on the private leaderboard.

● The baseline architecture is already on Piazza, and it will not necessarily help you cross
the B cutoff.

● You will not be provided with a base notebook to edit; we will provide only some code
snippets to help you implement the model.

Problem Statement

Convolutional networks are very good feature extractors. We use them to extract facial
features, which can then be fed to any other classification network.

1. Face Classification:
○ Extract features from an image of a person's face
○ Develop a network that uses these features to classify the image into classes (people, in our case)

2. Face Verification:
○ You can use the network developed earlier to do Face Verification. But how?
○ Identify the most important features which capture the unique identity of a person
○ These features form a fixed-length vector called the Embedding
○ To do verification, we only need to check whether 2 embeddings are similar, using a metric like cosine distance (see the sketch below)
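For instance, here is a minimal sketch of that similarity check in PyTorch; the threshold value is a hypothetical placeholder you would tune on labeled validation pairs, not a recommended setting.

import torch
import torch.nn.functional as F

def same_person(emb1: torch.Tensor, emb2: torch.Tensor, threshold: float = 0.5) -> bool:
    """Decide whether two face embeddings belong to the same identity.

    emb1, emb2: 1-D embedding vectors from the trained feature extractor.
    threshold: hypothetical cutoff, tuned on labeled validation pairs.
    """
    # Cosine similarity lies in [-1, 1]; cosine distance is 1 - similarity.
    sim = F.cosine_similarity(emb1.unsqueeze(0), emb2.unsqueeze(0), dim=1).item()
    return sim >= threshold

# Example with two random 512-dim embeddings:
print(same_person(torch.randn(512), torch.randn(512)))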

Overall Workflow

Classification: Face Detection → Feature Extraction (CNN) → Classifier (MLP) → Softmax → Class

Verification: Face Detection → Feature Extraction (CNN) → Embedding → Cosine distance → Result

Difference between the 2 problem statements

The two problems fundamentally differ in one key
aspect. Any guesses? It’s already on the slides.

Classification is closed set, whereas
verification is open set.

Closed set means that the test instances
come from the same classes as the train and
validation data.

This may not be the case in verification: the
model should be able to ascertain whether 2 faces
belong to the same person, even for identities it
never saw during training.

So, what changes?

ResNet

● Introduced in 2015; efficiently utilizes bottleneck architectures and learns layers as
residual functions with reference to their inputs

● Easier to optimize, and can gain accuracy from increased depth thanks to skip connections

https://arxiv.org/pdf/1512.03385.pdf

ResNet Architectures

34-Layer ResNet with Skip/Shortcut Connection (Top), 34-Layer Plain Network (Middle), 19-Layer VGG-19 (Bottom)

Block 1: Convolution

We replicate the simplified operation for every layer from the paper.

We can see how the [3 x 3, 64] block is repeated 3 times within the layer; a sketch of one such block follows.
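A minimal sketch of one such residual block in PyTorch (the class name and the standalone 64-channel setting are illustrative assumptions, not the paper's released code):

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """One [3 x 3, 64] -> [3 x 3, 64] residual block, as in ResNet-34's first stage."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                        # skip/shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                # residual addition
        return self.relu(out)

# The "[3 x 3, 64] x 3" stage is just three of these blocks stacked:
stage = nn.Sequential(*[BasicBlock(64) for _ in range(3)])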

Plain Network vs. ResNet

Validation Error: 18-Layer and 34-Layer Plain Network (Left), 18-Layer and 34-Layer ResNet (Right)

Discriminative Features

● Classification optimizes for learning separable features
● Ideally, we wish to learn discriminative features

○ Maximum inter-class distance
○ Minimum intra-class distance

Center Loss

● Tries to minimize the intra-class distance by adding a Euclidean distance loss
term (a minimal sketch follows)

● If you use this, YOU MUST USE CENTER LOSS FROM THE BEGINNING
OF CLASSIFICATION TRAINING!
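A minimal sketch of center loss in PyTorch, assuming one learnable center per class; the lam weighting in the final comment is a hypothetical knob you would tune:

import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Penalizes the squared distance between each feature vector and its class center."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class, updated by the optimizer alongside the model.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        batch_centers = self.centers[labels]    # (B, feat_dim): center of each sample's class
        return ((features - batch_centers) ** 2).sum(dim=1).mean()

# Joint objective, applied from the very first epoch (see the warning above):
# loss = cross_entropy(logits, labels) + lam * center_loss(features, labels)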

Triplet Loss

● Minimizing the first term → the distance between the Anchor and the Positive image
● Maximizing the second term → the distance between the Anchor and the Negative image
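The standard formulation (from the FaceNet paper linked in the references) is L = max(d(A, P) − d(A, N) + margin, 0), which PyTorch ships as nn.TripletMarginLoss. A minimal sketch, with dummy embeddings standing in for your network's outputs:

import torch
import torch.nn as nn

# L = max( d(anchor, positive) - d(anchor, negative) + margin, 0 )
criterion = nn.TripletMarginLoss(margin=1.0, p=2)

anchor   = torch.randn(32, 512)  # embeddings of the anchor faces
positive = torch.randn(32, 512)  # same identity as the anchor
negative = torch.randn(32, 512)  # a different identity

loss = criterion(anchor, positive, negative)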

Training with Triplet Loss

Siamese Network

This network does not classify the images into categories or labels; rather, it only
computes the distance between any two given images (see the sketch below).
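A minimal sketch of the shared-weight idea in PyTorch; backbone is a placeholder for whatever CNN feature extractor you trained:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNetwork(nn.Module):
    """Applies the SAME encoder to both images and returns the distance between embeddings."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # shared weights: one encoder, used twice

    def forward(self, img1: torch.Tensor, img2: torch.Tensor) -> torch.Tensor:
        emb1 = self.backbone(img1)
        emb2 = self.backbone(img2)
        # Euclidean distance per pair; a small distance suggests the same person.
        return F.pairwise_distance(emb1, emb2)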

Contrastive Loss
● Contrastive loss is a metric learning loss: it operates on the data points
produced by the network and their positions relative to each other.

● Without such a loss, the model can learn any features, regardless of whether
similar data points end up located close to each other after the transformation.

● The Y term here specifies whether the two given data points (X₁ and X₂) are
similar (Y=0) or dissimilar (Y=1).

● So Ls (the loss for similar data points) is just Dw, the distance between them:
if two data points are labeled as similar, we minimize the Euclidean distance
between them.

● Ld (the loss for dissimilar data points), on the other hand, needs some
explanation. One might think that for two dissimilar data points we just need
to maximize the distance between them, but we only do so up to a margin.
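Concretely, a minimal sketch of this loss in PyTorch, following the squared form from the Hadsell-Chopra-LeCun paper in the references and the Y convention above:

import torch

def contrastive_loss(dw: torch.Tensor, y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """dw: Euclidean distances Dw between embedding pairs, shape (B,).
    y:  0 for similar pairs, 1 for dissimilar pairs, shape (B,)."""
    ls = (1 - y) * dw.pow(2)                          # pull similar pairs together
    ld = y * torch.clamp(margin - dw, min=0).pow(2)   # push dissimilar pairs out to the margin
    return 0.5 * (ls + ld).mean()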

Understanding the Contrastive Loss Function

● In the first figure, we would naturally like to pull the black
dots closer to the blue dots and push the white dots farther
away from them.

○ Specifically, we would like to minimize the intra-class
distances (blue arrows) and maximize the inter-class
distances (red arrows).

● In the second figure, what we would like to achieve is to
make sure that, for each class/group of similar points (in the
case of the Face Recognition task, all the photos of the same
person), the maximum intra-class distance is smaller than the
minimum inter-class distance.

○ This means that if we define some radius/margin
m, all the black dots should fall inside this margin,
and all the white dots outside it.

○ This way we would be able to use a nearest-neighbour
algorithm for new data: if a new data point lies within
distance m of another, they are similar/belong to the
same group/class.

○ If Dw ≥ m, the (m − Dw) expression is negative, and
the whole right part of the loss function is thus 0 due
to the max() operation. The gradient is also 0, i.e.,
we don't push the dissimilar points farther away
than necessary.

Other types of Losses

● Pair-wise Loss (separates the distributions of similarity scores)
● Angular Softmax Loss

References

● https://arxiv.org/pdf/1512.03385.pdf
● https://arxiv.org/pdf/1608.06993v3.pdf
● https://arxiv.org/pdf/1409.1556.pdf
● https://arxiv.org/pdf/1704.08063.pdf
● https://arxiv.org/pdf/1503.03832v3.pdf
● http://ydwen.github.io/papers/WenECCV16.pdf
● https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf
● https://towardsdatascience.com/densenet-2810936aeebb
● http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
● https://papers.nips.cc/paper/4824-imagenet-classification-with-deepconvolutional-neural-networks.pdf
