Results and Evaluation
Resources and Tools
The model training and evaluation are implemented in Python with the TensorFlow 1.0
framework on an Ubuntu Linux system. I use an Amazon Elastic Compute Cloud
(EC2) G2 instance, which provides NVIDIA GRID K520 GPUs, for model training.
The image classification app on the mobile side is implemented in Android Java
with the TensorFlow Mobile library. Currently the TensorFlow Mobile library supports
three platforms: Android, iOS and Raspberry Pi. The library provides APIs that
let a mobile app easily load a pre-trained model and run inference with it.
The Android image classification app is developed with Android Studio, which is
the official IDE for Android.
Checkpoint File
During training, we can use the TensorFlow API to periodically save the learned
model parameters to binary checkpoint files. In this way, the model parameters are
backed up, and they can later be restored by loading them from a checkpoint file.
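A minimal TF 1.x sketch of this save/restore pattern (the paths, step count and save interval are illustrative, not taken from the actual training script):

```python
import tensorflow as tf

# Save the learned parameters every 1000 steps during training.
saver = tf.train.Saver(max_to_keep=5)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(80000):
        # sess.run(train_op, feed_dict=...)  # one training step
        if step % 1000 == 0:
            saver.save(sess, "/tmp/cifar100/model.ckpt", global_step=step)

# Later: restore the learned parameters from the latest checkpoint.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("/tmp/cifar100"))
```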
Model File
The model file is in Protocol Buffers format, which can be saved and loaded from
many different languages. So we can save the model file with Python and load
it with Java in the Android app.
The Graph object contains all the information about the model graph. The
graph consists of nodes; each node stores various information, including the node
name, the operation (such as "Add" or "Conv2D"), its input nodes and other
attributes (such as the filter size for "Conv2D").
To make the model suitable for deployment, we can use the freeze_graph.py tool
from TensorFlow to combine the graph definition file and the checkpoint file
containing the learned parameters into a single model file. The tool achieves this by
replacing each Variable node with a Const node that contains the parameters; it
also removes nodes unnecessary for inference, which simplifies the graph and
decreases the file size.
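The tool can also be invoked from its Python API, roughly as follows (the paths and the output node name are assumptions that depend on the actual graph; they are not taken from the project's scripts):

```python
from tensorflow.python.tools import freeze_graph

# Combine graph definition + checkpoint into one deployable .pb file.
freeze_graph.freeze_graph(
    input_graph="/tmp/cifar100/graph.pbtxt",            # graph definition
    input_saver="",
    input_binary=False,
    input_checkpoint="/tmp/cifar100/model.ckpt-55000",  # learned parameters
    output_node_names="MobilenetV1/Predictions/Reshape_1",  # assumed name
    restore_op_name="save/restore_all",
    filename_tensor_name="save/Const:0",
    output_graph="/tmp/cifar100/frozen_model.pb",       # single model file
    clear_devices=True,
    initializer_nodes="")
```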
The resulting model file can then be shipped with the Android app. Upon starting,
the app first loads the model file using the TensorFlow Mobile Java API; it can
then run inference with the loaded model.
Dataset
CIFAR 100
Figure 1: Sample images from the CIFAR-100 dataset.

The CIFAR-100 dataset contains 60000 small images of size 32×32. They belong
to 100 different classes, with each class containing 600 images. A sample of 100
images from this dataset is shown in the figure above.
Experimental Setup
Training set and test set
The CIFAR-100 dataset is divided into a training set containing 50000 images
and a test set containing 10000 images.
Preprocessing
During training, each image is randomly transformed before being fed to the
neural network. In this way, the network trains on multiple versions of the same
image, and the effective training set is much larger than the original one. This
helps the model generalize better and reduces overfitting.
Randomly Shift the Image
First pad the image, then randomly crop it back to the original size. In this way,
the image is randomly shifted in any of the four directions.
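A minimal numpy sketch of the pad-and-crop shift (the padding amount is an assumption; the text does not specify it):

```python
import numpy as np

def random_shift(image, pad=4):
    """Randomly shift an HxWxC image by padding `pad` pixels on each side
    and cropping back to the original size."""
    h, w, c = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = np.random.randint(0, 2 * pad + 1)   # vertical offset of the crop
    left = np.random.randint(0, 2 * pad + 1)  # horizontal offset of the crop
    return padded[top:top + h, left:left + w, :]
```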
Randomly Flip the Image
The image is flipped left to right with probability 0.5.
Randomly adjust the image brightness
This randomly adds a value between -63 and 63 to all RGB components of every
pixel.
Randomly change the image contrast
Randomly choose a contrast factor 0.2 ≤ f ≤ 1.8. For each RGB channel,
compute the mean m and update the corresponding component x of each pixel
with:
(x − m) × f + m
After the random transformations above, we finally normalize the image data so
that it has zero mean and unit norm.
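A numpy sketch of the last two steps, per-channel contrast adjustment and per-image normalization (this mirrors what `tf.image.random_contrast` and `tf.image.per_image_standardization` do, but is a simplified re-implementation, not the project's actual code):

```python
import numpy as np

def random_contrast(image, lower=0.2, upper=1.8):
    # Per-channel contrast: (x - m) * f + m, with f drawn from [lower, upper].
    f = np.random.uniform(lower, upper)
    mean = image.mean(axis=(0, 1), keepdims=True)  # per-channel mean m
    return (image - mean) * f + mean

def per_image_standardization(image):
    # Shift to zero mean and scale to unit standard deviation; the lower
    # bound on std guards against division by zero for constant images.
    image = image.astype(np.float64)
    mean = image.mean()
    std = max(image.std(), 1.0 / np.sqrt(image.size))
    return (image - mean) / std
```

Note that contrast adjustment leaves each channel's mean unchanged, since (x − m) × f + m averages back to m.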
Mobilenet
Hyperparameters
Batch size: 128.
Initial learning rate 0.01, decayed by a factor of 0.94 every 2 epochs.
Weight decay parameter set to 0.00004.
RMSProp optimization algorithm with decay rate 0.9.
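The learning rate schedule above amounts to a staircase exponential decay, which can be written as:

```python
def learning_rate(epoch, initial=0.01, decay=0.94, epochs_per_decay=2):
    # Staircase decay: multiply by `decay` once every `epochs_per_decay` epochs.
    return initial * decay ** (epoch // epochs_per_decay)
```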
The initial weights are loaded from a MobileNet model pre-trained on ImageNet.
In the first stage, only the last fully connected layer is trained, keeping the
parameters of the previous layers unchanged; this phase trains for 25000 steps.
Then all layers are trained to fine-tune the model; this phase trains for 55000
steps. During training, random minor changes are applied to the images to
augment the data set.
After training finishes, the test set is used to evaluate performance. Note that
the prediction on each image is done just once; if the average prediction over
multiple transformed versions of an image were used instead, the performance
would likely improve.
The models are exported to TensorFlow model files. In the Android mobile image
classification app, the model file is loaded and the inference time is computed
by classifying 100 images one by one and dividing the total time by 100. The
mobile inference measurements are done on a Nexus 6 Android phone.
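The timing procedure can be sketched as follows (`classify` stands in for the actual inference call; on the phone this is the TensorFlow Mobile Java API, so this Python version is only illustrative):

```python
import time

def average_inference_ms(classify, images):
    # Classify the images one by one and return the average wall-clock
    # time per image, in milliseconds.
    start = time.time()
    for image in images:
        classify(image)
    return (time.time() - start) * 1000.0 / len(images)
```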
The experiments are done for width multipliers 1.0, 0.75, 0.5 and 0.25, and for
image sizes 32, 24 and 16 (resolution multipliers 1.0, 0.75 and 0.5). So the above
steps are done for a total of 12 models.
The change of the losses with training steps for the model with width multiplier
1.0 and image size 32 is shown below; the other models are similar. The red line
is for the first stage and the green line for the second stage.
Total Loss

Figure 2: Total loss vs. training steps.

Cross Entropy Loss

Figure 3: Cross entropy loss vs. training steps.

Regularization Loss

Figure 4: Regularization loss vs. training steps.
InceptionV3
The images are scaled from 32×32 to 128×128. The first stage trains for 15000
steps with a fixed learning rate of 0.01. The second stage trains for 30000 steps
with a smaller fixed learning rate of 0.0001. Both stages use weight decay 0.00004.
Total Loss

Figure 5: Total loss vs. training steps.

Cross Entropy Loss

Figure 6: Cross entropy loss vs. training steps.

Regularization Loss

Figure 7: Regularization loss vs. training steps.
Resnet
The same process as with InceptionV3 is used.
Total Loss

Figure 8: Total loss vs. training steps.

Cross Entropy Loss

Figure 9: Cross entropy loss vs. training steps.

Regularization Loss

Figure 10: Regularization loss vs. training steps.
Metrics
Top-1 Accuracy
The ratio between the number of images that are predicted correctly and the
total number of images in the test set.
Top-5 Accuracy
Like top-1 accuracy, this is the ratio between the number of correct predictions
and the total number of images. The difference is the definition of a correct
prediction: for top-5 accuracy, the classifier gives 5 candidate guesses instead of
1. If the correct label is among the 5 guesses, the prediction is considered correct.
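Both metrics can be expressed with one helper (a plain-Python sketch; a real evaluation would operate on the classifier's per-class score vectors):

```python
def top_k_accuracy(predictions, labels, k=1):
    # predictions: one list of per-class scores per image;
    # labels: the true class index for each image.
    correct = 0
    for scores, label in zip(predictions, labels):
        # Indices of the k highest-scoring classes.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)
```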
Inference Time
The average time the model takes to classify a single image.
Model File Size
The size of the TensorFlow model file used for deployment. The model file size is
mainly determined by the number of parameters and the number of bits used to
encode each parameter.
Results
Width multiplier  Resolution multiplier  Image size  Top-1 Accuracy  Top-5 Accuracy  Inference Time (ms)  Model File Size (bytes)
1.0               1.0                    32          0.6157          0.8674          9.178                13420373
1.0               0.75                   24          0.5658          0.8334          7.408                13420373
1.0               0.5                    16          0.4418          0.7336          2.842                13420373
0.75              1.0                    32          0.5841          0.8492          4.266                7734332
0.75              0.75                   24          0.5345          0.8161          3.664                7734332
0.75              0.5                    16          0.4147          0.7116          2.045                7734332
0.5               1.0                    32          0.5495          0.8244          2.629                3618162
0.5               0.75                   24          0.5005          0.7856          2.179                3618162
0.5               0.5                    16          0.3715          0.6753          1.055                3618162
0.25              1.0                    32          0.441           0.7403          1.286                1071719
0.25              0.75                   24          0.3843          0.6863          1.072                1071719
0.25              0.5                    16          0.2873          0.5739          0.581                1071719
Model Name                                               Top-1 Accuracy  Top-5 Accuracy  Inference Time (ms)  Model File Size (bytes)
Mobilenet (width multiplier 1, resolution multiplier 1)  0.6157          0.8674          9.178                13420373
Inception V3                                             0.7537          0.9485          168.37               88361937
Resnet                                                   0.6162          0.8733          162.35               95276899
Analysis
Comparison
Comparison              Top-1 Accuracy Loss  Top-5 Accuracy Loss  Inference Speedup  Model Compression Ratio
Mobilenet vs Inception  18.3%                8.5%                 18.3               6.6
Mobilenet vs Resnet     0.08%                0.67%                17.7               7.1
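The derived columns are consistent with relative differences computed from the raw results above, as a quick check confirms (values copied from the results tables; the accuracy losses are relative to the larger model's accuracy):

```python
mobilenet = {"top1": 0.6157, "top5": 0.8674, "ms": 9.178, "bytes": 13420373}
inception = {"top1": 0.7537, "top5": 0.9485, "ms": 168.37, "bytes": 88361937}
resnet = {"top1": 0.6162, "top5": 0.8733, "ms": 162.35, "bytes": 95276899}

def compare(small, big):
    # Relative top-1/top-5 accuracy loss, inference speedup, compression ratio.
    return ((big["top1"] - small["top1"]) / big["top1"],
            (big["top5"] - small["top5"]) / big["top5"],
            big["ms"] / small["ms"],
            big["bytes"] / small["bytes"])

print(compare(mobilenet, inception))
print(compare(mobilenet, resnet))
```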
We can see that Mobilenet achieves a significant inference speedup and model
size compression over Inception and ResNet. Its accuracy is similar to ResNet's,
but it has a relatively big accuracy loss compared with Inception.
We can also see that a smaller width multiplier decreases inference time, model
size and accuracy, while a smaller resolution multiplier does not affect model size
but decreases inference time and accuracy. This is because a smaller width
multiplier decreases the number of channels used in the filters, which decreases
the number of parameters, so the model file shrinks. A smaller resolution
multiplier decreases the input image size, so the amount of computation decreases,
but the number of parameters stays the same; thus it speeds up inference but
does not shrink the model file.
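If the parameter count scales roughly with the square of the width multiplier α, the file sizes should shrink roughly like α². A quick check against the measured sizes (the deviation grows for small α, plausibly because the final fully connected layer scales with α rather than α²):

```python
# Model file sizes from the results table, keyed by width multiplier alpha.
sizes = {1.0: 13420373, 0.75: 7734332, 0.5: 3618162, 0.25: 1071719}

for alpha, size in sizes.items():
    measured = size / sizes[1.0]   # measured size ratio relative to alpha = 1.0
    predicted = alpha ** 2         # rough alpha^2 scaling of parameter count
    print(alpha, round(measured, 3), round(predicted, 3))
```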
The results also show that it is better to decrease the width multiplier than the
resolution multiplier to speed up inference and shrink the model file. For example,
width multiplier 0.75 with resolution multiplier 1.0 gives higher accuracy, quicker
inference and a smaller model than width multiplier 1.0 with resolution
multiplier 0.75.