CS 499/599 HW2
Homework 2: Adversarial Attacks on Your Models
Homework Overview
The learning objective of this homework is for you to attack the models you built in Homework 1 with white-box adversarial examples. You will also use adversarial training to build robust models. We will then analyze the impact of several factors, which you can control as an attacker or a defender, on the success rate of the attack (or defense). You can start this homework from the codebase you wrote for Homework 1.
Initial Setup
Datasets and DNN Models
We will keep using the same two datasets, MNIST [link] and CIFAR-10 [link], but we focus on only two DNN models: LeNet [link] and ResNet18 [link].
Recommended Code Structure
You will write two new scripts, adv_attack.py and adv_train.py. The rest is the same as in Homework 1.
– [New] adv_attack.py: a Python script to run adversarial attacks on a pre-trained model.
– [New] adv_train.py: a Python script for adversarially training a model.
You may find off-the-shelf libraries, e.g., adversarial-robustness-toolbox [link], where you can plug-n-play attacks on your models. I do NOT recommend using any of those libraries for this homework. However, you may refer to community implementations of attacks and defenses and rewrite them yourself. Remember: the important learning objective is to understand the attack internals and implement them.
Task I: Attack Your Models
Let’s start by attacking the DNN models you trained in Homework 1. We will attack 2 DNNs: LeNet on MNIST and ResNet18 on CIFAR10. You need to use PGD [Madry et al.] as the adversarial example-crafting algorithm. Your job is to craft PGD adversarial examples for all the test-time samples (i.e., the 10k test-set samples of each of MNIST and CIFAR10). To measure the effectiveness of your attacks, we will compute the classification accuracy on these adversarial examples. Make sure you evaluate the adversarial examples on the same DNNs that you used to craft them.
Here, you need to implement the following function in adv_attack.py.
def PGD(x, y, model, loss, niter, epsilon, stepsize, randinit, …)
– x: a clean sample
– y: the label of x
– model: a pre-trained DNN you’re attacking
– loss: a loss you will use
– [PGD params.] niter: # of iterations
– [PGD params.] epsilon: l-inf epsilon bound
– [PGD params.] stepsize: the step-size for PGD
– [PGD params.] randinit: start from a random perturbation if set true
// You can add more arguments if required
This PGD function crafts the adversarial example for a sample (x, y) [or a batch of samples]. It takes (x, y), a pre-trained DNN, and the attack parameters, and returns the adversarial example(s) (x’, y). Note that you can add more arguments to this function if required. Please use the following attack hyper-parameters as defaults:
epsilon: 0.3 (MNIST) and 0.03 (CIFAR10)
stepsize: 2/255.
randinit: true
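The PGD loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation: the "model" is a hypothetical 4-pixel logistic regression with a hand-written input gradient, and the hypothetical pgd_linf takes a grad_fn callback in place of the (model, loss) pair in the signature above. In adv_attack.py the gradient would come from your framework's automatic differentiation instead. The toy step size of 0.1 is chosen only to make the effect visible on this small example.

```python
import math
import random

# Toy stand-in for a DNN (an assumption for illustration): binary logistic
# regression over 4 "pixels" in [0, 1], with an analytic input gradient.
W = [0.8, -0.5, 0.3, -0.9]
B = 0.1


def predict_prob(x):
    """Probability of class 1 under the toy logistic model."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(x, y):
    """Binary cross-entropy of the toy model on one sample."""
    p = predict_prob(x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bce_grad_x(x, y):
    """Gradient of bce_loss with respect to the input x."""
    p = predict_prob(x)
    return [(p - y) * w for w in W]


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def pgd_linf(x, y, grad_fn, niter, epsilon, stepsize, randinit, seed=0):
    """l-inf PGD on one sample x (a list of floats in [0, 1])."""
    rng = random.Random(seed)
    if randinit:
        # Start from a random point inside the epsilon ball around x.
        x_adv = [clamp(xi + rng.uniform(-epsilon, epsilon), 0.0, 1.0) for xi in x]
    else:
        x_adv = list(x)
    for _ in range(niter):
        grad = grad_fn(x_adv, y)
        # Gradient-sign ascent step on the loss.
        stepped = [v + stepsize * sign(g) for v, g in zip(x_adv, grad)]
        # Project back into the l-inf ball of radius epsilon around x.
        projected = [clamp(v, xi - epsilon, xi + epsilon) for v, xi in zip(stepped, x)]
        # Keep every pixel in the valid input range.
        x_adv = [clamp(v, 0.0, 1.0) for v in projected]
    return x_adv


x_clean, y_clean = [0.9, 0.1, 0.8, 0.2], 1
x_adv = pgd_linf(x_clean, y_clean, bce_grad_x,
                 niter=5, epsilon=0.3, stepsize=0.1, randinit=True)
print("clean loss:", bce_loss(x_clean, y_clean))
print("adversarial loss:", bce_loss(x_adv, y_clean))
```

The order inside the loop matters: the step is taken first, then the result is projected onto the epsilon ball around the clean x, and only then clipped to [0, 1], so the returned example always satisfies both constraints.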
To measure the effectiveness of the adversarial examples, we will write an evaluation script under if __name__ == "__main__": in the same file. There, for all 10k adversarial examples you craft, compute the classification accuracy of the DNN model you attacked. Note that you should observe much lower accuracy than on the clean test-time samples.
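One possible shape for that evaluation is sketched below. The names accuracy, adversarial_accuracy, toy_predict, and toy_attack are hypothetical, and the one-pixel threshold classifier only stands in for a real DNN and a real PGD call.

```python
def accuracy(predict_fn, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in zip(samples, labels) if predict_fn(x) == y)
    return correct / len(labels)


def adversarial_accuracy(predict_fn, attack_fn, samples, labels):
    """Accuracy after replacing every sample with its adversarial example."""
    adv_samples = [attack_fn(x, y) for x, y in zip(samples, labels)]
    return accuracy(predict_fn, adv_samples, labels)


# Toy stand-ins (assumptions for illustration only).
def toy_predict(x):
    """Threshold classifier on a single pixel."""
    return 1 if x[0] > 0.5 else 0


def toy_attack(x, y, epsilon=0.2):
    """Push the pixel by epsilon toward the wrong side of the threshold."""
    return [x[0] - epsilon] if y == 1 else [x[0] + epsilon]


if __name__ == "__main__":
    samples = [[0.9], [0.6], [0.1], [0.4]]
    labels = [1, 1, 0, 0]
    print("clean accuracy:", accuracy(toy_predict, samples, labels))
    print("adversarial accuracy:",
          adversarial_accuracy(toy_predict, toy_attack, samples, labels))
```

Passing the same predict_fn to both the attack and the accuracy computation keeps the evaluation white-box: the examples are scored on the model they were crafted against.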
Task II: Analyze the Impact of Several Factors on Your Attack’s Success Rate
Now, let’s turn our attention to several factors that can increase/decrease the effectiveness of your white-box attacks. In particular, we will vary: (1) the attack hyper-parameters (e.g., the number of iterations) and (2) the way we trained our DNN models (see Task II of Homework 1).
Subtask II-1: Analyze the Impact of Attack Hyper-parameters
We will focus on two attack hyper-parameters: niter and epsilon. Use the 2 DNNs in Task I (LeNet on MNIST and ResNet18 on CIFAR10).
(1) Vary the number of iterations over {1, 2, 3, 4, 5, 10, 20, 30, 40, 80, 100}.
(2) Fix the number of iterations to 5, and vary epsilon over {0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0}.
Please use those different hyper-parameters and compute the classification accuracy of 2 DNN models on your adversarial examples. Draw plots: { # iterations } vs. { classification accuracy } and { epsilon } vs. { classification accuracy } and explain your intuitions on why you observe them.
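The two sweeps can be organized as in the sketch below. The sweep helper and placeholder_evaluate are hypothetical. The placeholder returns an arbitrary formula, not a measurement, and you would replace it with a function that runs PGD with the given settings and returns the measured accuracy. Fixing epsilon to 0.3 in the iteration sweep is an assumption (the MNIST default from Task I).

```python
NITER_GRID = [1, 2, 3, 4, 5, 10, 20, 30, 40, 80, 100]
EPSILON_GRID = [0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]


def sweep(evaluate_fn, name, grid, **fixed):
    """Return [(value, accuracy), ...] while varying one hyper-parameter.

    evaluate_fn is called with keyword arguments only: the swept
    hyper-parameter `name` plus whatever is passed in `fixed`.
    """
    return [(value, evaluate_fn(**{name: value}, **fixed)) for value in grid]


def placeholder_evaluate(niter, epsilon):
    """Placeholder only: an arbitrary formula, NOT a real accuracy."""
    return round(1.0 / (1.0 + niter * epsilon), 3)


# Sweep (1): vary niter; epsilon fixed to an assumed default.
niter_curve = sweep(placeholder_evaluate, "niter", NITER_GRID, epsilon=0.3)
# Sweep (2): vary epsilon with the number of iterations fixed to 5.
epsilon_curve = sweep(placeholder_evaluate, "epsilon", EPSILON_GRID, niter=5)

for value, acc in epsilon_curve:
    print(f"epsilon={value}\taccuracy={acc}")
```

Each returned list of (value, accuracy) pairs maps directly onto one plot: the first element is the x-axis and the second is the y-axis.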
Subtask II-2: Analyze the Impact of the Training Techniques You Use
One may think we can use some nice training techniques to reduce the effectiveness of white-box adversarial attacks. We plan to run some experiments to evaluate this claim. In particular, we’re interested in the following two families of techniques: data augmentations and regularizations.
(1) Data augmentations: In Task II of Homework 1, we examined two simple augmentations: rotation and horizontal flips. We also have DNNs trained with/without those augmentations. On the 2 DNNs (LeNet on MNIST and ResNet18 on CIFAR10) trained with/without each data augmentation, craft adversarial examples on the test-set samples and measure the classification accuracy on them.
(2) Regularizations: We also examine two techniques: Dropout [link] and weight decay. Let’s focus only on ResNet18 on CIFAR10.
1) To examine the impact of Dropout, we need to modify ResNet18’s network architecture. Add a Dropout layer before its penultimate layer and set the rate to 0.5. Train this modified ResNet18 (henceforth called ResNet18-Dropout). Craft adversarial examples on this model, measure the classification accuracy, and compare the accuracy to what we have with ResNet18 (w/o Dropout).
2) To examine the impact of weight decay, we will train ResNet18 with the Adam optimizer [link] on CIFAR10. You will train 5 ResNet18 models with different weight decay values: {1e-5, 1e-4, 1e-3, 1e-2, 1e-1}. Please don’t be surprised when you see bad accuracy with higher weight decay. Craft adversarial examples on those five DNN models and measure the accuracy on both the clean samples and the adversarial examples. Compare how much accuracy you can decrease on each model.
You may (or may not) find that each technique increases/decreases the accuracy degradation caused by adversarial examples. Please write down the accuracy degradations and explain your intuitions on why you observe them in your report.
Task III: Defend Your Models with Adversarial Training
One way to mitigate adversarial attacks is to train your models with adversarial training (AT). Here, we will examine the effectiveness of AT.
Let’s implement a script for AT. Make a copy of your train.py and name it adv_train.py. We will convert the normal training process into adversarial training. In train.py, we train a model on a batch of clean training samples at each iteration. In adv_train.py, you instead craft adversarial examples from each batch of clean samples and train your model on them. Note that this is slightly different from the work by Goodfellow et al.
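The change from train.py to adv_train.py can be sketched as a single extra step inside the batch loop. The function adv_train_epoch and its two callbacks are hypothetical: attack_fn stands for your PGD call against the current model, and train_step_fn stands for one forward/backward/optimizer update that returns the batch loss.

```python
def adv_train_epoch(batches, attack_fn, train_step_fn):
    """Run one epoch of adversarial training and return the mean batch loss.

    batches:       iterable of (x_batch, y_batch) with clean samples
    attack_fn:     (x_batch, y_batch) -> adversarial batch, crafted against
                   the model in its current state
    train_step_fn: (x_batch, y_batch) -> float loss after one update
    """
    losses = []
    for x_batch, y_batch in batches:
        # Craft adversarial examples from the clean batch.
        x_adv = attack_fn(x_batch, y_batch)
        # Update the model on the adversarial batch instead of the clean one.
        losses.append(train_step_fn(x_adv, y_batch))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage with stub callbacks (assumptions for illustration only).
seen = []


def stub_attack(x_batch, y_batch):
    """Pretend attack: shift every value by a fixed amount."""
    return [x + 0.1 for x in x_batch]


def stub_train_step(x_batch, y_batch):
    """Pretend update: record the inputs and return a dummy loss."""
    seen.append(list(x_batch))
    return 1.0


mean_loss = adv_train_epoch([([0.0, 0.5], [0, 1]), ([1.0], [1])],
                            stub_attack, stub_train_step)
print("mean loss:", mean_loss)
```

Because the attack runs inside the loop, the adversarial examples are regenerated against the model as it changes, rather than crafted once before training.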
Please train 2 DNN models (LeNet and ResNet18) adversarially on MNIST and CIFAR10, respectively. Once you have trained those robust models, you need to craft adversarial examples against them and compute the accuracy. Note that we use the same attack hyper-parameters as in Task I. Compare:
(1) How does your robust models’ accuracy on adversarial examples compare to that of your undefended models?
(2) How does your robust models’ accuracy on clean test-set examples compare to that of your undefended models?
(3) Let’s increase the PGD attack iterations from 5 to 7. How does your robust models’ accuracy change?
Please explain your intuitions on why you observe them in your report.
[Extra +3 pts]: Use Your Adversarial Examples to Attack Real-world DNNs
You may be curious how effective the adversarial examples that you crafted are against DNNs deployed in the real world. Here are some real-world image classification demos [a list of demos]. Please store 10 adversarial examples for each of the MNIST and CIFAR10 attacks (Task I) as .png files. Upload them to one of the image classification demos and see how the predicted labels differ from those of your DNNs.
Please show your adversarial examples, the classification of them on your DNNs, and the predicted labels on the demo you chose in your report.
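One detail worth checking when storing adversarial examples as images: assuming an 8-bit-per-channel PNG, each pixel is rounded to one of 256 levels, which slightly changes the perturbation. The sketch below does not write a file. It only mimics that rounding with hypothetical helpers (to_uint8, from_uint8, linf_distance) on made-up pixel values, so you can compare the stored example against the one you crafted.

```python
def to_uint8(pixels):
    """Map floats in [0, 1] to integers in [0, 255], rounding to nearest."""
    return [min(255, max(0, int(v * 255.0 + 0.5))) for v in pixels]


def from_uint8(values):
    """Map integers in [0, 255] back to floats in [0, 1]."""
    return [v / 255.0 for v in values]


def linf_distance(a, b):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


# Made-up pixel values for illustration only.
x_clean = [0.20, 0.50, 0.90]
x_adv = [0.23, 0.47, 0.93]

x_stored = from_uint8(to_uint8(x_adv))
print("rounding error:", linf_distance(x_stored, x_adv))
print("stored perturbation:", linf_distance(x_stored, x_clean))
```

The rounding error is at most half a level (0.5/255) per pixel, so it is small relative to the epsilon values used here, but it is a reason to re-check the prediction of your own DNN on the reloaded image before comparing against a demo.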
Submission Instructions
Use Canvas to submit your homework. You need to make a single compressed file (.tar.gz) that contains your code and a write-up as a PDF file. Put your write-up under the reports folder. Your PDF write-up should contain the following things:
Task I
The classification accuracy of clean test-set samples on 2 DNNs (LeNet and ResNet18).
The classification accuracy of your adversarial examples on 2 DNNs.
Your analysis: write down 2-3 sentences explaining why you see those results.
Subtask II-1
Your 4 plots: { # iterations } vs. { classification accuracy } and { epsilon } vs. { classification accuracy } on each of your 2 DNNs.
Your analysis: Provide 2-3 sentences for each case explaining why you observe the result.
Subtask II-2
Your analysis: Provide 2-3 sentences for each case explaining why you observe the result.
Task III
The classification accuracy of clean test-set samples on your robust DNNs.
The classification accuracy of your adversarial examples on your robust DNNs.
Your analysis: write down 2-3 sentences for each of the three questions above.
[Extra +3 pts]
Your adversarial examples shown as images.
Their classification results on your DNN models.
Their classification results on the real-world DNNs.