CIFAR2, not CIFAR10
Your task is a binary classification problem. While the CIFAR10 dataset has 10 possible classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), you will build a CNN that takes in an image and correctly predicts whether it is a cat or a dog, hence CIFAR2. We limit this assignment to a binary classification problem so that you can train the model in a reasonable amount of time.
The assignment has 2 parts.
Our stencil provides a model class with several methods and hyperparameters you need to use for your network. You will also fill out a function that performs the convolution operator, and you will answer questions related to the assignment and class material as part of this assignment.
Part 1: The Model
Roadmap
You will notice that the structure of the Model class is very similar to the Model class defined in your first assignment. We strongly suggest that you first complete the Intro to TensorFlow Lab before starting this assignment. The lab includes many explanations about the way a Model class is structured, what variables are, and how things work in TensorFlow. If you come into hours with questions about TensorFlow related material that is covered in the lab, we will direct you to the lab.
Below is a brief outline of some things you should do. We expect you to fill in some of the missing gaps (review lecture slides to understand the pipeline) as this is your second assignment.
Step 1. Preprocess the data
• We have provided you with a function unpickle(file) in the preprocess file stencil, which unpickles an object and returns a dictionary. Do not edit it. We have already extracted the inputs and the labels from the pickled file into a dictionary for you, as you can see within get_data.
• You will want to limit the inputs and labels returned by get_data to those representing the first and second classes of your choice. For every image and its corresponding label, if the label is not of the first or second class, then remove the image and label from your inputs and labels arrays.
• At this point, your inputs are still two dimensional. You will want to reshape your inputs into (num_examples, 3, 32, 32) using np.reshape(inputs, (-1, 3, 32, 32)) and then transpose them so that the final inputs you return have shape (num_examples, 32, 32, 3).
• Recall that the label of your first class might be something like 5, representing a dog in the CIFAR dataset, but you will want to turn that into a 0 since this is a binary classification problem. Likewise, for all images of the second class, say a cat, you will want to turn those labels into a 1.
• After doing that, you will want to turn your 0s and 1s to one hot vectors, where the index with a 1 represents the class of the correct image. You can do this with the function tf.one_hot(labels, depth=2).
• This can be a bit confusing, so we’ll just make it clear: your labels should be of size (num_images, num_classes). So for the first example, the corresponding label might be [0, 1], where a 1 in the second index means that it’s a cat/dog/hamster/sushi.
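The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the stencil's get_data: the helper name filter_and_relabel and the fake data are made up for the example, and np.eye is used as a NumPy stand-in for tf.one_hot(labels, depth=2) so the snippet runs without TensorFlow. It also applies the division by 255 described under Reading in the Data.

```python
import numpy as np

def filter_and_relabel(inputs, labels, first_class, second_class):
    """Hypothetical helper. inputs: (N, 3072) uint8 array; labels: N ints in 0-9."""
    labels = np.asarray(labels)

    # Keep only the examples that belong to one of the two chosen classes.
    keep = (labels == first_class) | (labels == second_class)
    inputs, labels = inputs[keep], labels[keep]

    # (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), then scale pixels to [0, 1].
    images = np.reshape(inputs, (-1, 3, 32, 32)).transpose(0, 2, 3, 1)
    images = images.astype(np.float32) / 255.0

    # first_class becomes 0 and second_class becomes 1.
    binary = (labels == second_class).astype(np.int64)

    # NumPy stand-in for tf.one_hot(binary, depth=2): result is (N, 2).
    one_hot = np.eye(2, dtype=np.float32)[binary]
    return images, one_hot

# Fake data with the same layout as the unpickled CIFAR dictionary.
rng = np.random.default_rng(0)
fake_inputs = rng.integers(0, 256, size=(6, 3072), dtype=np.uint8)
fake_labels = [3, 5, 0, 5, 9, 3]

images, one_hot = filter_and_relabel(fake_inputs, fake_labels, 3, 5)
print(images.shape)   # (4, 32, 32, 3)
print(one_hot.shape)  # (4, 2)
```

Using a boolean mask keeps the inputs and labels aligned, since the same mask selects rows from both arrays.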
Step 2. Create your model
• You will not receive credit if you use the tf.keras, tf.layers, or tf.slim libraries. You can use tf.keras for your optimizer but do NOT use Keras layers!
• Again, you should initialize all hyperparameters within the constructor even though this is not customary. This is still necessary for the autograder. Consider what’s being learned in a CNN and initialize those as trainable parameters. In the last assignment, it was our weights and biases. This time around, you will still want weights and biases, but there are other things that are being learned!
• We recommend using an Adam Optimizer [tf.keras.optimizers.Adam] with a learning rate of 1e-3, but feel free to experiment with whatever produces the best results.
• Weight variables should be initialized from a normal distribution (tf.random.truncated_normal) with a standard deviation of 0.1.
• You may use any permutation and number of convolution, pooling, and feed forward layers, as long as you use at least one convolution layer with strides of [1, 1, 1, 1], one pooling layer, dropout, and one fully connected layer.
• If you are having trouble getting started with model architecture, we have provided an example below:
• Convolution Layer 1 [tf.nn.conv2d]
• 16 filters of width 5 and height 5
• strides of 2 and 2
• same padding
• Batch Normalization 1 [tf.nn.batch_normalization]
• Get the mean and variance using [tf.nn.moments]
• ReLU Nonlinearity 1 [tf.nn.relu]
• Max Pooling 1 [tf.nn.max_pool]
• kernels of width 3 and height 3
• strides of 2 and 2
• Convolution Layer 2
• 20 filters of width 5 and height 5
• strides of 1 and 1
• same padding
• Batch Normalization 2
• ReLU Nonlinearity 2
• Max Pooling 2
• kernels of width 2 and height 2
• strides of 2 and 2
• Convolution Layer 3
• 20 filters of width 5 and height 5
• strides of 1 and 1
• same padding
• Batch Normalization 3
• ReLU Nonlinearity 3
• Dense Layer 1
• Dropout with rate 0.3
• Dense Layer 2
• Dropout with rate 0.3
• Dense Layer 3
• Fill out the call function using the trainable variables you’ve created. Note that in the lab, we mentioned using a @tf.function decorator to tell TF to run it in graph execution. Do NOT do this for this assignment – we’ll explain why the forward pass has to be run in eager execution later. The parameter is_testing will be used in Part 2, do not worry about it when implementing everything in this part.
• Calculate the average softmax cross-entropy loss on the logits compared to the labels. We suggest using tf.nn.softmax_cross_entropy_with_logits.
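To make the loss concrete, here is a small NumPy sketch of the quantity the last bullet describes: the per-example softmax cross-entropy, averaged over the batch. It is only meant to show the arithmetic. The function name is made up for this example, and in your model you would use tf.nn.softmax_cross_entropy_with_logits (followed by an average) so that TensorFlow can differentiate through it.

```python
import numpy as np

def average_softmax_cross_entropy(logits, labels):
    """logits, labels: arrays of shape (batch_size, num_classes); labels are one-hot."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)

    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Cross-entropy per example, then the mean over the batch.
    per_example = -(labels * log_probs).sum(axis=1)
    return per_example.mean()

logits = np.array([[2.0, 0.0], [0.0, 0.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = average_softmax_cross_entropy(logits, labels)
print(loss)
```

Note that the function takes raw logits, not probabilities: the softmax is applied inside the loss.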
Step 3. Train and test
• In the main function, you will want to get your train and test data, initialize your model, and train it for many epochs. We suggest training for 10 epochs. For the autograder, we will train it for at most 25 epochs (hard limit of 10 minutes). We have provided a train and a test method for you to fill out. The train method will take in the model and do the forward and backward pass for a SINGLE epoch. Yes, this means that, unlike the first assignment, your main function will have a for loop that goes through the number of epochs, calling train each time.
• Even though this is technically part of preprocessing, you should shuffle your inputs and labels when TRAINING. Keep in mind that they have to be shuffled in the same order. We suggest creating a range of indices of length num_examples, then using tf.random.shuffle(indices). Finally you can use tf.gather(train_inputs, indices) to shuffle your inputs. You can do the same with your labels to ensure they are shuffled the same way.
• You should also reshape the inputs into (batch_size, height, width, in_channels) before calling model.call(). When training, you might find it helpful to call tf.image.random_flip_left_right on your batch of image inputs to increase accuracy. Do not call this when testing.
• Call the model’s forward pass and calculate the loss within the scope of tf.GradientTape. Then use the model’s optimizer to apply the gradients to your model’s trainable variables outside of the GradientTape. If you’re unsure about this part, please refer to the lab. This is analogous to the gradient_descent function in the first assignment, except that TensorFlow handles all of that for you!
• If you’d like, you can calculate the train accuracy to check that your model does not overfit the training set. If you get upwards of 80% accuracy on the training set but only 65% accuracy on the testing set, you might be overfitting.
• The test method will take in the same model, now with trained parameters, and return the accuracy given the test inputs and test labels.
• At the very end, we have written a method for you to visualize your results. The visualizer will not be graded but you can use it to check out your doggos and kittens.
• For fun, instead of passing in the indices for dogs and cats for your training and testing data, you can pass in other inputs and see how your model does when trying to classify something like bird vs. cat!
• Your README can just contain your accuracy and any bugs you have.
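The training bullets above can be sketched as a single-epoch loop. This is a rough illustration under stated assumptions, not the stencil: TinyModel is a hypothetical one-layer stand-in (no convolutions) used only so the loop has something to train, and the data is random. The parts that carry over are the shuffle with tf.random.shuffle and tf.gather, the forward pass and loss inside tf.GradientTape, and the optimizer step outside of it.

```python
import tensorflow as tf

class TinyModel:
    """Hypothetical stand-in for the stencil's Model: a single dense layer."""
    def __init__(self):
        self.batch_size = 64
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
        self.W = tf.Variable(tf.random.truncated_normal([32 * 32 * 3, 2], stddev=0.1))
        self.b = tf.Variable(tf.zeros([2]))
        self.trainable_variables = [self.W, self.b]

    def call(self, inputs):
        # Flatten without hardcoding the batch size.
        flat = tf.reshape(inputs, [tf.shape(inputs)[0], -1])
        return tf.matmul(flat, self.W) + self.b

    def loss(self, logits, labels):
        return tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

def train(model, train_inputs, train_labels):
    """Runs the forward and backward pass for a single epoch."""
    num_examples = train_inputs.shape[0]

    # Shuffle inputs and labels with the same index order.
    indices = tf.random.shuffle(tf.range(num_examples))
    inputs = tf.gather(train_inputs, indices)
    labels = tf.gather(train_labels, indices)

    losses = []
    for start in range(0, num_examples, model.batch_size):
        batch_inputs = inputs[start:start + model.batch_size]
        batch_labels = labels[start:start + model.batch_size]
        batch_inputs = tf.image.random_flip_left_right(batch_inputs)  # training only

        with tf.GradientTape() as tape:
            logits = model.call(batch_inputs)
            loss = model.loss(logits, batch_labels)

        # Gradients are computed and applied outside the tape's scope.
        grads = tape.gradient(loss, model.trainable_variables)
        model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
        losses.append(float(loss))
    return losses

# Random stand-in data with the shapes produced by preprocessing.
fake_inputs = tf.random.uniform([128, 32, 32, 3])
fake_labels = tf.one_hot(tf.random.uniform([128], maxval=2, dtype=tf.int32), depth=2)

model = TinyModel()
for epoch in range(2):
    epoch_losses = train(model, fake_inputs, fake_labels)
```

The epoch loop lives outside train, matching the description above that train handles a SINGLE epoch.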
Mandatory Hyperparameters
You can train with any batch size but you are limited to training for at most 25 epochs (I know, the title of this section is a bit misleading). However, your model must train using TensorFlow functions and test using your own convolution function within 10 minutes on a department machine. We will be timing this when autograding. Again, the parameters we suggest are training for 10 epochs using a batch size of 64.
Reading in the Data
The CIFAR files are pickled objects. We have provided you with a function unpickle(filename). You should not edit it. Note: You should normalize the pixel values so that they range from 0 to 1 (this can easily be done by dividing each pixel value by 255) to avoid any numerical overflow issues.
Data format
The testing and training data files to be read in are in the following format:
train: A pickled object of 50,000 train images and labels. This includes images and labels of all 10 classes. After unpickling the file, the dictionary will have the following elements:
• data — a 50000×3072 numpy array of uint8s. Each row of the array stores a 32×32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
• labels — a list of 50000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
• Note that if you download the dataset from online, the training data is actually divided into batches. We have done the job of repickling all of the batches into one single train file for your ease.
test: A pickled object of 10,000 test images and labels. This includes images and labels of all 10 classes. Unpickling the file gives a dictionary with the same key values as above.
We’ve already done the job of unpickling the file and have extracted the unprocessed inputs and labels in the get_data function.
To get only the images and labels of classes 3 and 5 (representing cat and dog), you will want to loop over the data and only add an image and its label to your result arrays of inputs and labels if they belong to those classes.
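As a quick check of the row layout described above, the following sketch builds one synthetic 3072-entry row (not real CIFAR data) with constant red, green, and blue planes, marks a single red pixel, and confirms where it lands after the reshape and transpose. The values 10, 20, 30, and the marked pixel position are arbitrary choices for illustration.

```python
import numpy as np

# One synthetic row in the CIFAR layout: 1024 red, 1024 green, 1024 blue values.
row = np.zeros(3072, dtype=np.uint8)
row[0:1024] = 10      # red plane
row[1024:2048] = 20   # green plane
row[2048:3072] = 30   # blue plane

# Row-major storage: the red value at image row 1, column 2 sits at index 1*32 + 2.
row[1 * 32 + 2] = 255

# (3072,) -> (3, 32, 32) -> (32, 32, 3), then normalize to [0, 1].
image = row.reshape(3, 32, 32).transpose(1, 2, 0)
image = image.astype(np.float32) / 255.0

print(image.shape)     # (32, 32, 3)
print(image[1, 2, 0])  # 1.0, the marked red pixel
```

Reshaping directly to (32, 32, 3) without the transpose would scramble the channels, because the three colour planes are stored one after another rather than interleaved per pixel.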
Visualizing Results
• We’ve provided the visualize_results(image_data, probabilities, image_labels, first_label, second_label) method for you to visualize your predictions against the true labels using matplotlib, a useful Python library for plotting graphs. This method is currently written with the image_labels having a shape of (num_images, num_classes). DO NOT EDIT THIS FUNCTION. You should call this function after training and testing, passing into visualize_results an input of 10 images, 10 probabilities, 10 labels, the first label name, and the second label name.
• Unlike the first assignment, you will need to pass in the strings of the first and second classes. A visualize_results method call might look like: visualize_results(image_inputs, probabilities, image_labels, "cat", "dog").
• This should result in a visual of 10 images with your predictions and the actual label written above so you can compare your results! You should do this after you are sure you have met the benchmark for test accuracy.
Part 2: Conv2d
Before starting this part of the assignment, you should ensure that you have an accuracy of at least 70% on the test set using only TensorFlow functions for the problem of classifying dogs and cats.
As a new addition to this assignment, you will be implementing your very own convolution function! Deep Learning == TensorFlow tutorial no more!
For the sake of simple math calculations (less is more, no?), we’ll require that our conv2d function only works with a stride of 1 (for both width and height). This is because the calculation for padding size changes as a result of the stride, which would be way more complex and unreasonable for a second assignment.
Do NOT change the parameters of the function we have provided. Even though the conv2d function takes in a strides argument, you should ALWAYS pass in [1, 1, 1, 1]. Leaving in strides as an argument was a conscious design choice – if you wanted to eventually make the function work for other kinds of strides in your own time, this would allow you to easily change it.
Roadmap
• Your inputs will have 4 dimensions. If we are to use this conv2d function for the first layer, the inputs would be [batch_size, in_height, in_width, input_channels].
• You should ensure that the input’s number of “in channels” is equivalent to the filters’ number of “in channels”. Make sure to add an assert statement or throw an error if the number of input in channels is not the same as the number of filter in channels. You will lose points if you do not do this.
• If padding is SAME, you will have to determine a padding size. Luckily, for strides of 1, padding is just (filter_size - 1)/2. The derivation for this formula is out of the scope of this course, but if you are interested, you may read about it here.
• You can use the hefty NumPy function np.pad to pad your input! Note that for SAME padding, the way you pad may result in different output shapes for inputs with odd dimensions. This is ok. We will only test that your convolution function works similarly to TensorFlow’s using inputs with even (i.e. divisible by 2) dimensions for SAME padding.
• After padding (if needed), you will want to go through the entire batch of images and perform the convolution operator on each image. There are two ways of going about this: you can continuously append multidimensional NumPy arrays to an output array, or you can create a NumPy array with the correct output dimensions and just update each element in the output as you perform the convolution operator. We suggest doing the latter, since it’s conceptually easier to keep track of things this way.
• Your output dimension height is equal to (in_height + 2*padY - filter_height) / strideY + 1 and your output dimension width is equal to (in_width + 2*padX - filter_width) / strideX + 1. Refer to the slides if you’d like to understand this derivation.
• You will want to iterate over the entire height and width including padding, stopping when you cannot fit a filter over the rest of the padded input. For convolution with many input channels, you will want to perform the convolution per input channel and sum those dot products together.
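The roadmap above can be sketched in NumPy roughly as follows. Treat this as one possible shape of the computation rather than the expected solution: the name conv2d_sketch is made up, the filter layout [filter_height, filter_width, in_channels, out_channels] is assumed to match tf.nn.conv2d, the padding formula assumes odd filter sizes, and the whole batch is processed at once at each output position instead of looping image by image.

```python
import numpy as np

def conv2d_sketch(inputs, filters, strides, padding):
    """Stride-1 convolution sketch.

    inputs:  [batch_size, in_height, in_width, in_channels]
    filters: [filter_height, filter_width, in_channels, out_channels] (assumed layout)
    strides: must be [1, 1, 1, 1]
    padding: "SAME" or "VALID"
    """
    batch, in_h, in_w, in_c = inputs.shape
    f_h, f_w, f_in_c, out_c = filters.shape
    assert in_c == f_in_c, "input and filter in channels must match"
    assert list(strides) == [1, 1, 1, 1], "only a stride of 1 is supported"

    # For stride 1 and odd filter sizes, SAME padding is (filter_size - 1) / 2.
    if padding == "SAME":
        pad_y, pad_x = (f_h - 1) // 2, (f_w - 1) // 2
    else:
        pad_y = pad_x = 0
    padded = np.pad(inputs, ((0, 0), (pad_y, pad_y), (pad_x, pad_x), (0, 0)),
                    mode="constant")

    # Output size from the formula above with strideY = strideX = 1.
    out_h = in_h + 2 * pad_y - f_h + 1
    out_w = in_w + 2 * pad_x - f_w + 1
    output = np.zeros((batch, out_h, out_w, out_c), dtype=np.float32)

    for y in range(out_h):
        for x in range(out_w):
            # Patch under the filter for every image: [batch, f_h, f_w, in_c].
            patch = padded[:, y:y + f_h, x:x + f_w, :]
            # Multiply and sum over height, width, and in channels -> [batch, out_c].
            output[:, y, x, :] = np.tensordot(patch, filters,
                                              axes=([1, 2, 3], [0, 1, 2]))
    return output

# A 3x3 filter with a single 1 in the centre leaves the input unchanged under SAME.
x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
identity = np.zeros((3, 3, 1, 1), dtype=np.float32)
identity[1, 1, 0, 0] = 1.0
print(conv2d_sketch(x, identity, [1, 1, 1, 1], "SAME").shape)  # (1, 4, 4, 1)
```

Preallocating the output array and filling it position by position follows the second of the two approaches suggested above.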
Testing out your own conv2d:
• We have provided for you a few tests that compare the result of your very own conv2d and TensorFlow’s conv2d. If you’ve implemented it correctly, the results should be very similar.
• The last super important part of this project is that you should call your conv2d function IN your model. TensorFlow cannot build a graph/differentiate with NumPy operators, so you should not add a @tf.function decorator.
• In your model, you should set is_testing to True when testing, then make sure that if is_testing is True, you use your own convolution rather than TensorFlow’s conv2d on a SINGLE convolution layer. If you follow the architecture described above, we suggest adding in an if statement before the third convolution layer (ie. switch out the conv2d for your third convolution). This part will take the longest, and is why we say it might actually take up to 15 minutes on a local machine.
Autograder
Your model must complete training within 10 minutes and in at most 25 epochs on a department machine.
Our autograder will import your model and your preprocessing functions. We will call your get_data function on a path to our data and pass the result to your train method in order to return a fully trained model. After this, we will feed your trained model, alongside the TA pre-processed data, to our custom test function. This will just batch the testing data using YOUR batch size and run it through your model’s call function. However, we will test that your model can test with any batch size, meaning that you should not hardcode self.batch_size in your call function. The logits which are returned will then be fed through an accuracy function. When testing your own convolution function, we will only test on inputs with even dimensions for SAME padding. This is because you might get different output dimensions than TensorFlow’s convolution function when using SAME padding on odd inputs. In order to ensure you don’t lose points, you need to make sure that A) you correctly return training inputs and labels from get_data, B) your model’s call function returns logits from the inputs specified and does not break on different batch sizes when testing, C) your own convolution function works, and D) no part of your code relies on any packages outside of TensorFlow, NumPy, Matplotlib, or the Python standard library.
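Two of the points above, computing accuracy from logits and not depending on a fixed batch size, can be illustrated with a short NumPy sketch. Both function names are made up for this example, and the (4, 4, 20) feature-map shape is an arbitrary assumption, not a shape your model is required to produce.

```python
import numpy as np

def accuracy(logits, labels):
    """logits, labels: (batch_size, num_classes); labels are one-hot."""
    predictions = np.argmax(logits, axis=1)
    truth = np.argmax(labels, axis=1)
    return float(np.mean(predictions == truth))

def flatten_for_dense(feature_maps):
    """Flattens [batch, h, w, c] to [batch, h*w*c] using the actual batch size."""
    return feature_maps.reshape(feature_maps.shape[0], -1)

logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.4, 0.9]])
labels = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
print(accuracy(logits, labels))  # 0.75

# The flatten works for any batch size, including a smaller final batch.
for batch_size in (1, 7, 64):
    print(flatten_for_dense(np.zeros((batch_size, 4, 4, 20))).shape)
```

Reading the batch dimension from the input itself, rather than from self.batch_size, is what lets the same call function handle a test batch of a different size.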