
Week 9: Adversarial Machine Learning – Vulnerabilities (Part I)
COMP90073 Security Analytics
CIS, Semester 2, 2021

Overview
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability


Definition
What is Adversarial Machine Learning (AML)?
“Adversarial machine learning is a technique employed in the field of machine learning which attempts to fool models through malicious input.” – Wikipedia

Examples
• Test-time attack

  – Image classifier C: maps a 100×100 input image (pixel values 0–255, flattened into an input vector) to one of the labels {0, 1, 2, …, 9} via an output vector of class probabilities P0–P9
  – [Figure: adding a small perturbation δ to every pixel of an image of a "9" changes the classifier's output vector; P9 drops from 0.85 to 0.32 while another class rises to 0.56, so the adversarial sample x + δ is misclassified]

Examples
• Test-time attack
  – [Figure: original images + perturbation = adversarial samples; the classifier outputs 6, 2, 4, 8, 9, 6, 0, 2, 5, 4 for the ten adversarial samples]

Examples
• Training-time attack
  – Insert extra training points to maximise the loss
  – [Figure: a two-class dataset, Class 1 (+) and Class 2 (−), whose decision boundary is shifted by the injected points]

Examples
• Huge amount of attention
  – Mission-critical tasks

Overview
• –
– –
Week 9: Adversarial Machine Learning – Vulnerabilities
Definition + examples
Classification
Evasion attacks
Gradient-descent based approaches Automatic differentiation
Real-world example
Poisoning attacks Transferability
– –
• • •
COMP90073 Security Analysis

Classification
• Classification [1]
  – Exploratory vs. Causative – influence
    • Exploratory/evasion: test time
    • Causative/poisoning: training time
  – Integrity vs. Availability – security violation
    • Integrity: harmful instances pass the filter
    • Availability: denial of service, benign instances are filtered out
  – Targeted vs. Indiscriminate/Untargeted – specificity
    • Targeted: misclassified as a specific class
    • Indiscriminate/untargeted: misclassified as any other class
  – White-box vs. Black-box – attacker information
    • White-box: full knowledge of the victim model
    • Black-box: no/minimal knowledge of the model


Evasion attacks (definition)
• Evasion attack
  – Aim: find the minimum perturbation δ to the input x that causes model C to misclassify:
      x → x + δ,   x, x + δ ∈ [0,1]^d
    such that (s.t.)
      C(x + δ) ≠ C(x)           Indiscriminate
      OR C(x + δ) = l_target    Targeted

Evasion attacks (definition)
• Evasion attack
  – Formulated as an optimisation problem:
      argmin_{δ ∈ [0,1]^d} ‖δ‖
      s.t. C(x + δ) ≠ C(x)  OR  C(x + δ) = l_target        (1)
    The constraint is highly non-linear (C is typically a deep network), which motivates the reformulation that follows.
  – p-norm: ‖δ‖_p = ( Σ_{i=1}^{d} |δ_i|^p )^{1/p}
  – ‖δ‖_1 = Σ_{i=1}^{d} |δ_i|,   ‖δ‖_2 = √( Σ_{i=1}^{d} δ_i² ),   ‖δ‖_∞ = max_i |δ_i|
    • E.g., δ = (1, 2, 3, −4): ‖δ‖_1 = 10, ‖δ‖_2 = √30, ‖δ‖_∞ = 4
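As a quick sanity check of the norms above, the following minimal NumPy sketch computes ‖δ‖_1, ‖δ‖_2 and ‖δ‖_∞ for the example perturbation δ = (1, 2, 3, −4); the array name delta is only illustrative.

import numpy as np

# Example perturbation from the slide
delta = np.array([1.0, 2.0, 3.0, -4.0])

l1 = np.sum(np.abs(delta))          # ||delta||_1 = 10
l2 = np.sqrt(np.sum(delta ** 2))    # ||delta||_2 = sqrt(30) ~ 5.48
linf = np.max(np.abs(delta))        # ||delta||_inf = 4

print(l1, l2, linf)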

Evasion attacks (definition)
Transform (1) into the following problem [2]:
  Indiscriminate:  argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  Targeted:        argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
Objective function f: measures how close the prediction and the target are, e.g., the cross-entropy loss function.

Evasion attacks (definition)
• Indiscriminate attack: argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  – Make the prediction as different from the ground truth as possible, e.g., a "9" misclassified as any class other than "9"
  – [Figure: the clean input x has one-hot ground truth (0, 0, …, 0, 1); for the adversarial input x + δ the classifier's predicted probability for class 9 drops from 0.85 to 0.32 while another class rises to 0.56]

Evasion attacks (definition)
• Targeted attack: argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
  – Make the prediction as close to the target as possible, e.g., misclassified as "8" (target class)
  – [Figure: for the adversarial input x + δ, the predicted probability of the target class 8 rises to 0.56, matching the one-hot target vector, while the probability of the true class 9 falls to 0.32]

Evasion attacks (definition)
Transform (1) into the following problem [2]:
  Indiscriminate:  argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  Targeted:        argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
How to find the minimum perturbation δ?

Evasion attacks (gradient descent)
Gradient descent
• Gradient: a vector that points in the direction of greatest increase of a function
https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html

Evasion attacks (gradient descent)
Let J denote the objective being minimised:
  Indiscriminate:  argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  Targeted:        argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
• Start with the initial input x_0
• Repeat x_i ← x_{i−1} − α · ∂J/∂x_{i−1}, i > 0
• Until (1) C(x_i) ≠ C(x_0) (or C(x_i) = l_target)  → success, or
        (2) ‖δ‖ = ‖x_i − x_0‖ > ε  → failure, or
        (3) i ≥ i_max  → failure, or
        (4) |J(x_i) − J(x_{i−1})| ≤ Δ  → failure
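A minimal TensorFlow 2 sketch of this loop, assuming a trained Keras classifier model that outputs class probabilities and an ℓ2 objective; the names model, x0 and the constant c are illustrative, and the stopping tests mirror conditions (1)–(3) above.

import tensorflow as tf

def gradient_descent_attack(model, x0, true_label, c=1.0, alpha=0.01,
                            eps=0.5, i_max=200):
    # Sketch of the iterative attack: minimise J = ||x - x0||_2 - c * loss_true
    x0 = tf.convert_to_tensor(x0, dtype=tf.float32)
    x = tf.Variable(x0)
    for i in range(i_max):
        with tf.GradientTape() as tape:
            probs = model(tf.expand_dims(x, 0))
            loss_true = tf.keras.losses.sparse_categorical_crossentropy(
                [true_label], probs)[0]
            J = tf.norm(x - x0) - c * loss_true           # indiscriminate objective
        grad = tape.gradient(J, x)
        x.assign_sub(alpha * grad)                        # x_i <- x_{i-1} - alpha * dJ/dx
        x.assign(tf.clip_by_value(x, 0.0, 1.0))           # keep a valid image
        if tf.argmax(model(tf.expand_dims(x, 0))[0]) != true_label:
            return x.numpy(), True                        # (1) success: misclassified
        if tf.norm(x - x0) > eps:
            return x.numpy(), False                       # (2) failure: perturbation too large
    return x.numpy(), False                               # (3) failure: iteration budget exhausted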

Evasion attacks (gradient descent)
  Indiscriminate:  argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  Targeted:        argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
• …
• Repeat x_i ← x_{i−1} − α · ∂J/∂x_{i−1}, i > 0
• …
How to design the objective function f?

Evasion attacks (FGSM)
• Fast gradient sign method (FGSM) [3]: take f to be the cross-entropy loss and drop the norm term:
    Indiscriminate:  argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)    →   argmin_{δ ∈ [0,1]^d} −loss_true(x + δ)
    Targeted:        argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)  →   argmin_{δ ∈ [0,1]^d} loss_target(x + δ)
• Single step of size ε: fast rather than optimal
    x′ ← x + ε · sign( ∂loss_true/∂x )         (indiscriminate)
    OR x′ ← x − ε · sign( ∂loss_target/∂x )    (targeted)
• Not meant to produce the minimal adversarial perturbation
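A minimal untargeted FGSM sketch in TensorFlow 2, again assuming a Keras classifier model that outputs probabilities; the helper name fgsm and the ε value are illustrative.

import tensorflow as tf

def fgsm(model, x, true_label, eps=0.1):
    # Single-step FGSM: x' = x + eps * sign(d loss_true / d x)
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        probs = model(tf.expand_dims(x, 0))
        loss = tf.keras.losses.sparse_categorical_crossentropy([true_label], probs)[0]
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)            # one step in the direction that increases the loss
    return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixel values valid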

Evasion attacks (Iterative Gradient Sign)
• Iterative gradient sign [4]
  – Replace the single step ε with multiple smaller steps α
  – Indiscriminate: x_i ← clip_ε( x_{i−1} + α · sign( ∂f_true/∂x_{i−1} ) )
    OR Targeted:    x_i ← clip_ε( x_{i−1} − α · sign( ∂f_target/∂x_{i−1} ) )
  – clip_ε: makes sure that each component x_{i,j} stays within [x_{0,j} − ε, x_{0,j} + ε], i.e., a projection onto the ε-box around x_0
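The same idea as a sketch, reusing the TensorFlow 2 setup from the FGSM example above; clip_ε is implemented as a clip onto the ε-box around x_0, and all names and step sizes are illustrative.

import tensorflow as tf

def iterative_gradient_sign(model, x0, true_label, eps=0.1, alpha=0.01, steps=20):
    # Iterative (untargeted) gradient sign with projection onto the eps-box around x0
    x0 = tf.convert_to_tensor(x0, dtype=tf.float32)
    x = tf.identity(x0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x)
            probs = model(tf.expand_dims(x, 0))
            loss = tf.keras.losses.sparse_categorical_crossentropy([true_label], probs)[0]
        grad = tape.gradient(loss, x)
        x = x + alpha * tf.sign(grad)
        x = tf.clip_by_value(x, x0 - eps, x0 + eps)   # clip_eps: stay within the eps-box
        x = tf.clip_by_value(x, 0.0, 1.0)             # stay a valid image
    return x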

Evasion attacks (Momentum Iterative FGSM)
• Momentum iterative fast gradient sign method [14]
  – s_i = μ · s_{i−1} + ∇_x J(x_{i−1}) / ‖∇_x J(x_{i−1})‖_1,    x_i ← x_{i−1} − α · sign(s_i)
  – Momentum overcomes two problems of vanilla gradient descent:
    • Getting stuck in local minima
    • Oscillation
  (Illustrations: https://medium.com/analytics-vidhya/momentum-rmsprop-and-adam-optimizer-5769721b4b19, https://eloquentarduino.github.io/2020/04/stochastic-gradient-descent-on-your-microcontroller/)
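A sketch of the momentum update under the same assumptions as the previous examples; here J is taken to be −loss_true (the FGSM-style untargeted objective), so minimising J maximises the loss on the true label. Names and hyper-parameters are illustrative.

import tensorflow as tf

def momentum_iterative_fgsm(model, x0, true_label, mu=0.9, alpha=0.01,
                            eps=0.1, steps=20):
    # Momentum iterative FGSM (untargeted): accumulate L1-normalised gradients
    x0 = tf.convert_to_tensor(x0, dtype=tf.float32)
    x = tf.identity(x0)
    s = tf.zeros_like(x0)                                 # momentum accumulator s_0 = 0
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x)
            probs = model(tf.expand_dims(x, 0))
            # J = -loss_true, so minimising J maximises the loss on the true label
            J = -tf.keras.losses.sparse_categorical_crossentropy([true_label], probs)[0]
        grad = tape.gradient(J, x)
        s = mu * s + grad / (tf.reduce_sum(tf.abs(grad)) + 1e-12)  # s_i = mu*s_{i-1} + grad/||grad||_1
        x = x - alpha * tf.sign(s)                        # x_i = x_{i-1} - alpha * sign(s_i)
        x = tf.clip_by_value(x, x0 - eps, x0 + eps)       # stay within the eps-box
        x = tf.clip_by_value(x, 0.0, 1.0)
    return x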

Evasion attacks (C&W)
C & W attack [2]
  argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f(x + δ)
Choose f such that:
  C(x + δ) = l_target  if and only if  f(x + δ) ≤ 0
  C(x + δ) ≠ l_target  ⇔  f(x + δ) > 0
Consistent with the definition of function f: how close the prediction and the target are.

Evasion attacks (C&W)
C(x + δ) = l_target  if and only if  f(x + δ) = f(x′) ≤ 0
• Option 1: f(x′) = max( max_{i≠t} F(x′)_i − F(x′)_t, 0 )
• Option 2: f(x′) = log( 1 + exp( max_{i≠t} F(x′)_i − F(x′)_t ) ) − log(2)
• Option 3: f(x′) = max( 0.5 − F(x′)_t, 0 )
F(x): output vector for x, i.e., the probabilities of the input x belonging to each class, e.g.
  F(x) = (F(x)_0, F(x)_1, F(x)_2, …, F(x)_9) = (0.01, 0.02, 0.01, …, 0.03, 0.85)
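A minimal NumPy sketch of the three surrogate losses above, taking a probability vector F_x and a target class t; the function names are illustrative. Options 1 and 2 become ≤ 0 exactly when the target class has the highest probability, and Option 3 when its probability exceeds 0.5.

import numpy as np

def cw_f_option1(F_x, t):
    # max( max_{i != t} F_i - F_t , 0 )
    others = np.delete(F_x, t)
    return max(np.max(others) - F_x[t], 0.0)

def cw_f_option2(F_x, t):
    # log(1 + exp( max_{i != t} F_i - F_t )) - log(2)
    others = np.delete(F_x, t)
    return np.log1p(np.exp(np.max(others) - F_x[t])) - np.log(2.0)

def cw_f_option3(F_x, t):
    # max( 0.5 - F_t , 0 )
    return max(0.5 - F_x[t], 0.0)

F_x = np.array([0.01, 0.02, 0.01, 0.01, 0.02, 0.01, 0.01, 0.03, 0.03, 0.85])
print(cw_f_option1(F_x, t=9), cw_f_option2(F_x, t=9), cw_f_option3(F_x, t=9))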

CleverHans
• CleverHans
  – Do not use the latest version
  – Download from: https://github.com/tensorflow/cleverhans/releases/tag/v.3.0.1
  – Prerequisites:
    • Python 3 (https://www.python.org/downloads/)
    • TensorFlow (https://www.tensorflow.org/install/)
    • Python 3.5/3.6/3.7 and TensorFlow {1.8, 1.12, 1.14}
  – Installation:
    • cd cleverhans
    • pip install -e .


Evasion attacks (automatic differentiation)
  Indiscriminate:  argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  Targeted:        argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
• Start with the initial input x_0
• Repeat x_i ← x_{i−1} − α · ∂J/∂x_{i−1}, i > 0
• Until C(x_i) ≠ C(x_0) (or C(x_i) = l_target)
How to calculate the partial derivatives?

Evasion attacks (automatic differentiation)
Derivative
• Definition: f′(x) = lim_{h→0} ( f(x + h) − f(x) ) / h
• Numerical differentiation: approximate with ( f(x + h) − f(x) ) / h or the central difference ( f(x + h) − f(x − h) ) / (2h)
  – Significant round-off errors
• Symbolic differentiation: apply the chain rule to symbolic expressions
  – Exponentially long results (expression swell)
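A small NumPy illustration of the two finite-difference approximations for f(x) = sin(x) at x = 1, where the true derivative is cos(1); the step size h is illustrative, and shrinking it too far lets round-off error dominate.

import numpy as np

f = np.sin
x, h = 1.0, 1e-5

forward = (f(x + h) - f(x)) / h            # forward difference
central = (f(x + h) - f(x - h)) / (2 * h)  # central difference
exact = np.cos(x)

print(abs(forward - exact))  # O(h) error
print(abs(central - exact))  # O(h^2) error, noticeably smaller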

Evasion attacks (automatic differentiation)
Automatic differentiation
• "A set of techniques to numerically evaluate the derivative of a function specified by a computer program" – Wikipedia
• Any complicated function f can be rewritten as the composition of a sequence of primitive functions:
    f = f_0 ∘ f_1 ∘ f_2 ∘ ⋯ ∘ f_n
• Apply the chain rule:
    ∂f/∂x = (∂f_0/∂f_1) · (∂f_1/∂f_2) · … · (∂f_{n−1}/∂f_n) · (∂f_n/∂x)
• Forward mode: accumulate the product from right to left, i.e., from the input x towards the output
• Reverse mode: accumulate the product from left to right, i.e., from the output back towards the input

Evasion attacks (automatic differentiation)
• Given y = f(x_1, x_2) = ln(x_1) + x_1·x_2 − sin(x_2), calculate ∂y/∂x_1 at (2, 5)
• Forward mode [5]
  – [Computational graph]
    Input variables:    v_{i−n} = x_i, i = 1, …, n   (here v_{−1} = x_1, v_0 = x_2)
    Working variables:  v_1 = ln(v_{−1}),  v_2 = v_{−1} × v_0,  v_3 = sin(v_0),  v_4 = v_1 + v_2,  v_5 = v_4 − v_3
    Output variables:   y_{m−i} = v_{l−i}, i = m−1, …, 0   (here y = v_5)

Evasion attacks (automatic differentiation)
• Forward mode propagates one tangent trace v̇_i = ∂v_i/∂x_j per chosen input x_j, alongside the primal trace
  – [Figure: the same computational graph annotated with the tangent trace for input x_2 (seeds v̇_{−1} = 0, v̇_0 = 1):
      v̇_1 = v̇_{−1}/v_{−1} = 0
      v̇_2 = v̇_{−1}·v_0 + v_{−1}·v̇_0 = 0×5 + 2×1 = 2
      v̇_3 = cos(v_0)·v̇_0 = cos 5 × 1
      v̇_4 = v̇_1 + v̇_2 = 0 + 2 = 2
      v̇_5 = v̇_4 − v̇_3 = 2 − cos 5   ⇒ ∂y/∂x_2 = 2 − cos 5
    A second forward pass with seeds v̇_{−1} = 1, v̇_0 = 0 is needed for ∂y/∂x_1]
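A minimal forward-mode sketch using dual numbers in plain Python (the Dual class is illustrative, not from any library); one pass per input reproduces ∂y/∂x_1 = 1/2 + 5 = 5.5 and ∂y/∂x_2 = 2 − cos 5 at (2, 5).

import math

class Dual:
    # Value together with its tangent (derivative w.r.t. one chosen input)
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o):
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

def d_ln(a):
    return Dual(math.log(a.val), a.dot / a.val)

def d_sin(a):
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)

def f(x1, x2):
    return d_ln(x1) + x1 * x2 - d_sin(x2)    # y = ln(x1) + x1*x2 - sin(x2)

# One forward pass per input: seed that input's tangent with 1
dy_dx1 = f(Dual(2.0, 1.0), Dual(5.0, 0.0)).dot   # 1/2 + 5 = 5.5
dy_dx2 = f(Dual(2.0, 0.0), Dual(5.0, 1.0)).dot   # 2 - cos(5)
print(dy_dx1, dy_dx2)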

Evasion attacks (automatic differentiation)
• Reverse mode [5]
  – [Figure: the same computational graph]
  – After one forward (primal) pass, adjoints v̄_i = ∂y/∂v_i are propagated from the output y = v_5 back towards the inputs v_{−1} = x_1 and v_0 = x_2
  – A single reverse pass yields the derivatives with respect to all inputs: ∂y/∂x_1 = v̄_{−1} and ∂y/∂x_2 = v̄_0

Evasion attacks (automatic differentiation)
• Example 1: y = ln(x_1) + x_1·x_2 − sin(x_2); calculate ∂y/∂x_1 and ∂y/∂x_2
  – Forward mode: __ time(s)
  – Reverse mode: __ time(s)
• Example 2: y_1 = ln(x) + x,  y_2 = x − sin(x); calculate ∂y_1/∂x and ∂y_2/∂x
  – Forward mode: __ time(s)
  – Reverse mode: __ time(s)

Evasion attacks (automatic differentiation)
Function f: R^n → R^m
• n independent x as inputs, m dependent y as outputs
• Reverse mode: preferable when n ≫ m (one reverse run calculates ∂y_j/∂x_i for all inputs i)
• Forward mode: preferable when m ≫ n (one forward run calculates ∂y_j/∂x_i for all outputs j)

TensorFlow (1.x) example:

# http://laid.delanover.com/gradients-in-tensorflow/
import tensorflow as tf

x = tf.Variable(1.)
y = tf.Variable(2.)
z = tf.subtract(2 * x, y)
grad = tf.gradients(z, [x, y])

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(grad))  # [2.0, -1.0]
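For comparison, a sketch of the same gradient computation in TensorFlow 2's eager style (assuming TensorFlow 2 is installed; note that the CleverHans release used in this subject targets TensorFlow 1.x).

import tensorflow as tf  # TensorFlow 2.x

x = tf.Variable(1.0)
y = tf.Variable(2.0)

with tf.GradientTape() as tape:
    z = 2 * x - y             # reverse-mode AD records the computation

grads = tape.gradient(z, [x, y])
print([g.numpy() for g in grads])  # [2.0, -1.0]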


Evasion attacks (real-world example)
• Robust Physical-World Attacks on Deep Learning Visual Classification [6]
  – Stop sign, Right Turn sign → Speed Limit 45
  – Drive-by (field) tests
  – Start from 250 ft away
  – Classify every 10th frame


Poisoning attacks
• Insert extra points to maximally decrease the accuracy [8]
  – [Figure: a two-class SVM, Class 1 (+) and Class 2 (−); the injected point x_c shifts the decision boundary]

Poisoning attacks
• Attacker's aim: maximise the hinge loss over the validation data D_val = {(x_i, y_i)}_{i=1}^{m}
• Optimisation problem:
    argmax_{x_c} L(x_c) = Σ_{i=1}^{m} ( 1 − y_i · f_{x_c}(x_i) )_+
  where f_{x_c} is the SVM trained on the training set poisoned with x_c
• To find the optimal poisoning point x_c:
  – Start from a random initial attack point x_c
  – Update: re-compute the SVM, then x_c^p ← x_c^{p−1} + α · ∂L/∂x_c^{p−1}, p > 0
  – Until L(x_c^p) − L(x_c^{p−1}) < ε
  – [Figure: the attack point x_c is moved by gradient ascent between Class 1 (+) and Class 2 (−)]
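A rough scikit-learn sketch of this loop under simplifying assumptions: labels are ±1, the gradient ∂L/∂x_c is estimated by finite differences rather than the analytic form derived by Biggio et al. [8], and the datasets, the poison label y_c and all hyper-parameters are illustrative.

import numpy as np
from sklearn.svm import SVC

def val_hinge_loss(X_tr, y_tr, x_c, y_c, X_val, y_val, C=1.0):
    # Hinge loss on validation data of an SVM trained on the poisoned training set
    X_p = np.vstack([X_tr, x_c.reshape(1, -1)])
    y_p = np.append(y_tr, y_c)                      # labels assumed to be in {-1, +1}
    clf = SVC(kernel="linear", C=C).fit(X_p, y_p)
    margins = y_val * clf.decision_function(X_val)
    return np.sum(np.maximum(0.0, 1.0 - margins))

def poison_point(X_tr, y_tr, X_val, y_val, y_c=-1, alpha=0.5, eps=1e-3,
                 max_iter=50, h=1e-2):
    rng = np.random.default_rng(0)
    x_c = X_tr[rng.integers(len(X_tr))] + 0.1 * rng.standard_normal(X_tr.shape[1])
    loss = val_hinge_loss(X_tr, y_tr, x_c, y_c, X_val, y_val)
    for _ in range(max_iter):
        # Finite-difference estimate of dL/dx_c (re-trains the SVM per coordinate)
        grad = np.zeros_like(x_c)
        for j in range(len(x_c)):
            e = np.zeros_like(x_c)
            e[j] = h
            grad[j] = (val_hinge_loss(X_tr, y_tr, x_c + e, y_c, X_val, y_val) - loss) / h
        x_c = x_c + alpha * grad                    # gradient ascent on the validation loss
        new_loss = val_hinge_loss(X_tr, y_tr, x_c, y_c, X_val, y_val)
        if new_loss - loss < eps:                   # stop when the loss no longer increases
            break
        loss = new_loss
    return x_c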
𝜕𝜕𝑥𝑥𝑐𝑐𝑝𝑝−1 • Until𝐿𝐿 𝑥𝑥𝑐𝑐𝑝𝑝 −𝐿𝐿 𝑥𝑥𝑐𝑐𝑝𝑝−1 <𝜀𝜀 Class 2 (−) COMP90073 Security Analysis Poisoning attacks As the attack point xc moves towards a local maximum, both the hinge loss and the classification error increase. COMP90073 Security Analysis Poisoning attacks • Poison frog attacks [10] – E.g.,addaseeminglyinnocuousimage(thatisproperlylabeled)toa training set, and control the identity of a chosen image at test time Target class Base class COMP90073 Security Analysis Poisoning attacks Target class Step 1: choose an instance from the target class – t (target instance) Step 2: sample an instance from the base class – b (base instance) Step 3: perturb b to create a poison instance – p Step 4: inject p into the training dataset The model is then re-trained. The attack succeeds if the poisoned model labels t as the base class t p b Base class COMP90073 Security Analysis Poisoning attacks • Generate poison data 𝑝𝑝 – Optimisationproblem:𝑝𝑝=argmin 𝑓𝑓 𝑥𝑥 −𝑓𝑓(𝑡𝑡) 2 +𝛽𝛽 𝑥𝑥−𝑏𝑏 2 𝑥𝑥 • 𝑓𝑓(𝑥𝑥): output of the second last layer of the neural network • 𝑓𝑓 𝑥𝑥 − 𝑓𝑓(𝑡𝑡) 2: makes p move toward the target instance •𝛽𝛽 𝑥𝑥−𝑏𝑏 :makespappearlikeabaseclassinstancetoa 2 in feature space and get embedded in the target class distribution 2 human labeller COMP90073 Security Analysis Poisoning attacks • Forward-backward-splitting iterative procedure [11] – Forward step: gradient descent update to minimise the L2 distance to the target instance in feature space – Backward step: proximal update that minimises the Euclidean distance from the base instance in input space COMP90073 Security Analysis Poisoning attacks • Results COMP90073 Security Analysis Poisoning attacks Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners [9] � � • Attacker’s objective : 𝑂𝑂𝐴𝐴 𝐷𝐷, 𝜃𝜃𝐷𝐷 = 𝜃𝜃𝐷𝐷 − 𝜃𝜃∗ + 𝐷𝐷 − 𝐷𝐷0 2 – 𝜃𝜃�𝐷𝐷: parameters of the poisoned model after the attack – 𝜃𝜃∗: parameters of the attacker’s target model, i.e., model that the attacker aims to obtain – 𝐷𝐷: poisoned training data – 𝐷𝐷0: original training data COMP90073 Security Analysis Overview • – – – Week 9: Adversarial Machine Learning – Vulnerabilities Definition + examples Classification Evasion attacks – – • • • Gradient-descent based approaches Automatic differentiation Real-world example Poisoning attacks Transferability COMP90073 Security Analysis Transferability & Black-box attacks • Implicit assumption: full knowledge of the target model • What if the target model is unknown to the attacker? • Transferability: for two models that perform the same task, trained on different datasets, adversarial samples generated against one model can often fool the other model as well [12][13] – Intra-technique: both the target and surrogate model use the same machine learning technique – Inter-technique: the target and surrogate model use different machine learning techniques Model 2 Model 1 COMP90073 Security Analysis Transferability & Black-box attacks • Verification on the MNIST dataset of handwritten digits – Grey-scale,0-255 – Size:28px*28px • DNN, SVM, LR, DT, kNN • Black-box attack – Step1:adversarytrainstheirownmodel–surrogate/source – Step2:generateadversarialsamplesagainstthesurrogate – Step3:applytheadversarialsamplesagainstthetargetmodel https://upload.wikimedia.org/wikipedia/ commons/2/27/MnistExamples.png COMP90073 Security Analysis Transferability & Black-box attacks • Intra-technique 71% adv. 
Transferability & Black-box attacks
• Intra-technique
  – [Figure: intra-technique transferability matrix; e.g., 71% of adversarial samples generated against the source model are also effective against the target model]

Transferability & Black-box attacks
• Inter-technique
  – [Figure: inter-technique transferability matrix across DNN, SVM, LR, DT and kNN]

Transferability & Black-box attacks
• Non-smoothness can hurt transferability [7]
  – A is the surrogate model; B is the target model
  – A smoothed loss surface contributes to transferability
  – Replace the gradient with a smoothed (averaged) gradient:
      x_i ← x_{i−1} − α · ∂J(x_{i−1})/∂x   →   x_i ← x_{i−1} − α · (1/m) Σ_{j=1}^{m} ∂J(x_{i−1} + ξ_j)/∂x,   ξ_j ~ N(0, σ²)

Transferability & Black-box attacks
• Input diversity improves transferability [15]
  – Adversarial samples may overfit to the surrogate model
  – Data augmentation:
    • Random resizing: resize the input image to a random size
    • Random padding: pad zeros around the image in a random manner
  – Diverse Inputs Iterative Fast Gradient Sign Method (DI-FGSM): apply a random transformation before each gradient step, where T(x_{i−1}; p) = T(x_{i−1}) with probability p and x_{i−1} with probability 1 − p:
      x_i ← x_{i−1} − α · sign( ∂J(x_{i−1})/∂x )   →   x_i ← x_{i−1} − α · sign( ∂J(T(x_{i−1}; p))/∂x )
  – Momentum Diverse Inputs Iterative Fast Gradient Sign Method (M-DI-FGSM): combine with the momentum update:
      s_i = μ · s_{i−1} + ∇_x J(x_{i−1}) / ‖∇_x J(x_{i−1})‖_1   →   s_i = μ · s_{i−1} + ∇_x J(T(x_{i−1}; p)) / ‖∇_x J(T(x_{i−1}; p))‖_1

Transferability & Black-box attacks
• Backpropagation smoothness [16], backpropagation linearity [17]
  – Non-linear activation functions, e.g., ReLU, sigmoid
  – ReLU has a non-continuous derivative at zero during backpropagation
  – A continuous derivative can improve transferability
  – Keep the ReLU function in the forward pass, but during backpropagation approximate the ReLU derivative with a continuous one, e.g., using the softplus function log(1 + e^x)

Summary
• Evasion attacks
  – Indiscriminate: argmin_{δ ∈ [0,1]^d} ‖δ‖ − c · f_true(x + δ)
  – Targeted:       argmin_{δ ∈ [0,1]^d} ‖δ‖ + c · f_target(x + δ)
• Poisoning attacks
  – Attacker's objective: O_A(D, θ̂_D) = ‖θ̂_D − θ*‖ + ‖D − D_0‖
    • θ̂_D: poisoned model after the attack
    • θ*: attacker's target, i.e., the model that the attacker aims to obtain
    • D: poisoned training data
    • D_0: original training data
• Transferability
  – Intra-technique, inter-technique
  – Black-box attacks

References
• [1] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar, "The Security of Machine Learning," Machine Learning, vol. 81, no. 2, pp. 121–148, Nov. 2010.
• [2] N. Carlini and D. Wagner, "Towards Evaluating the Robustness of Neural Networks," arXiv:1608.04644, 2016.
• [3] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and Harnessing Adversarial Examples," arXiv:1412.6572, 2014.
• [4] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial Examples in the Physical World," arXiv:1607.02533, 2016.
• [5] A. G. Baydin and B. A. Pearlmutter, "Automatic Differentiation of Algorithms for Machine Learning," arXiv:1404.7456, 2014.
• [6] I. Evtimov et al., "Robust Physical-World Attacks on Machine Learning Models," arXiv:1707.08945, 2017.
• [7] L. Wu and Z. Zhu, "Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks," in Proceedings of the 12th Asian Conference on Machine Learning, 2020, pp. 837–850. Available: https://proceedings.mlr.press/v129/wu20a.html.
• [8] B. Biggio, B. Nelson, and P. Laskov, "Poisoning Attacks against Support Vector Machines," in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, 2012, pp. 1467–1474.
• [9] S. Mei and X. Zhu, "Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, 2015, pp. 2871–2877.
• [10] A. Shafahi et al., "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks," arXiv:1804.00792, 2018.
• [11] T. Goldstein, C. Studer, and R. Baraniuk, "A Field Guide to Forward-Backward Splitting with a FASTA Implementation," arXiv:1411.3406, 2014.
• [12] N. Papernot, P. McDaniel, and I. Goodfellow, "Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples," arXiv:1605.07277, 2016.
• [13] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples," arXiv:1602.02697, 2016.
• [14] Y. Dong et al., "Boosting Adversarial Attacks with Momentum," in CVPR, 2018.
• [15] C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, and A. Yuille, "Improving Transferability of Adversarial Examples with Input Diversity," in CVPR, 2019.
• [16] S. Ham, I. S. Kweon, et al., "Backpropagating Smoothly Improves Transferability of Adversarial Examples," in IEEE/CVF CVPR Workshop on Adversarial Machine Learning, 2021.
• [17] Y. Guo, Q. Li, and H. Chen, "Backpropagating Linearly Improves Transferability of Adversarial Examples," arXiv:2012.03528, 2020.