Week 9: Adversarial Machine Learning – Vulnerabilities (Part I)
COMP90073 Security Analytics
CIS, Semester 2, 2021
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Definition
What is Adversarial Machine Learning (AML)?
“Adversarial machine learning is a technique employed in the field of machine learning which attempts to fool models through malicious input.” – Wikipedia
• Test-time attack
  – Example: an image classifier C maps 100×100 pixel input images x to the labels {0, 1, 2, …, 9}.
  – Adding a small perturbation δ to each pixel (x → x + δ) changes the output probability vector: the score of the true class "9" drops (e.g., 0.85 → 0.32) while another class (e.g., "8": 0.03 → 0.56) becomes the most likely prediction, so the adversarial sample x + δ is misclassified.
  (Figure: 100×100 input image, classifier, and output vectors P0–P9 before and after the perturbation.)
• Test-time attack
  – Original images + perturbation → adversarial samples
  – Classifier output on the ten adversarial samples: 6, 2, 4, 8, 9, 6, 0, 2, 5, 4
  (Figure: a row of original digit images, the added perturbations, and the resulting adversarial samples.)
• Training-time attack
– Insert extra training points to maximise the loss
(Figure: two-class scatter plot, Class 1 (+) vs. Class 2 (−), showing how the inserted points shift the decision boundary.)
• Huge amount of attention
  – Mission-critical tasks
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Classification
• Classification [1]
  – Exploratory vs. Causative – influence
    • Exploratory/evasion: test time
    • Causative/poisoning: training time
  – Integrity vs. Availability – security violation
    • Integrity: harmful instances to pass filters
    • Availability: denial of service, benign instances to be filtered
  – Targeted vs. Indiscriminate/Untargeted – specificity
    • Targeted: misclassified as a specific class
    • Indiscriminate/untargeted: misclassified as any other class
  – White-box vs. Black-box – attacker information
    • White-box: full knowledge of the victim model
    • Black-box: no/minimum knowledge of the model
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Evasion attacks (definition)
• Evasion attack
– Aim: find the minimum perturbation δ to the input x, in order to cause model C to misclassify:
  $x \to x + \delta$ (with $x, \delta \in [0,1]^d$)
  such that (s.t.)
  $C(x + \delta) \neq C(x)$ – Indiscriminate
  OR $C(x + \delta) = l_{target}$ – Targeted
Evasion attacks (definition)
• Evasion attack
  – Formulated as an optimisation problem:
    $\arg\min_{\delta \in [0,1]^d} \|\delta\|$    … (1)
    s.t. $C(x + \delta) \neq C(x)$  OR  $C(x + \delta) = l_{target}$   (highly non-linear constraint)
  – p-norm: $\|\delta\|_p = \left( \sum_{i=1}^{d} |\delta_i|^p \right)^{1/p}$
    $\|\delta\|_1 = \sum_{i=1}^{d} |\delta_i|$,  $\|\delta\|_2 = \sqrt{\sum_{i=1}^{d} \delta_i^2}$,  $\|\delta\|_\infty = \max_i |\delta_i|$
    • E.g., $\delta = (1, 2, 3, -4)$: $\|\delta\|_1 = 10$, $\|\delta\|_2 = \sqrt{30}$, $\|\delta\|_\infty = 4$
Evasion attacks (definition)
Transform (1) into the following unconstrained problem [2]:
  Indiscriminate: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$
  Targeted: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$
Objective function f: measures how close the prediction and the target are, e.g., the cross-entropy loss function
Evasion attacks (definition)
• Indiscriminate attack: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$
  – Make the prediction as different from the ground truth as possible, i.e., misclassified as any class other than "9"
  – Ground truth (one-hot vector): (0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
  – Prediction after the attack: the probability of class 9 falls (e.g., to 0.32) while another class, e.g., class 8, rises (e.g., to 0.56)
Evasion attacks (definition)
• Targeted attack: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$
  – Make the prediction as close to the target as possible, i.e., misclassified as "8" (the target class)
  – Target (one-hot vector): (0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
  – Prediction after the attack: the probability of the target class 8 becomes the highest (e.g., 0.56)
Evasion attacks (definition)
Transform (1) into the following problem [2]:
  Indiscriminate: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$
  Targeted: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$
How to find the minimum perturbation δ?
Evasion attacks (gradient descent)
Gradient descent
• Gradient: a vector that points in the direction of greatest increase of a function
https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
Evasion attacks (gradient descent)
  Indiscriminate: $J(\delta) = \|\delta\| - c \cdot f_{true}(x + \delta)$,  Targeted: $J(\delta) = \|\delta\| + c \cdot f_{target}(x + \delta)$,  $\delta \in [0,1]^d$
• Start with the initial input $x_0$
• Repeat $x_i \leftarrow x_{i-1} - \alpha \frac{\partial J}{\partial x_{i-1}}$, $i > 0$
• Until (1) $C(x_i) \neq C(x_0)$ (or $C(x_i) = l_{target}$) → success, or
        (2) $\|\delta\| = \|x_i - x_0\| > \epsilon$ → failure, or
        (3) $i \geq i_{max}$ → failure, or
        (4) $J(x_i) - J(x_{i-1}) \leq \Delta$ → failure
(A code sketch of this loop follows below.)
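To make the loop concrete, here is a minimal sketch in Python/NumPy. The 10-class linear model (W, b), the helpers predict and grad_ce_loss, and all constants are illustrative assumptions rather than lecture code; f_true is instantiated as the cross-entropy loss of the true class and the norm is the L2 norm.

import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 10                                       # toy input dimension and number of classes
W, b = rng.normal(size=(k, d)), rng.normal(size=k)   # hypothetical linear "victim" classifier

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(x):                                      # C(x): predicted class label
    return int(np.argmax(W @ x + b))

def grad_ce_loss(x, y):                              # gradient w.r.t. x of the cross-entropy loss of class y
    p = softmax(W @ x + b)
    return W.T @ (p - np.eye(k)[y])

# Indiscriminate evasion: J(x) = ||x - x0||_2 - c * loss_true(x), minimised by gradient descent
x0 = rng.uniform(0, 1, size=d)
y0 = predict(x0)
x, c, alpha, eps, i_max = x0.copy(), 10.0, 0.05, 5.0, 200

for i in range(1, i_max + 1):
    delta = x - x0
    grad_norm = delta / (np.linalg.norm(delta) + 1e-12)   # d||delta||_2 / dx
    grad_J = grad_norm - c * grad_ce_loss(x, y0)          # descending J = ascending the true-class loss
    x = np.clip(x - alpha * grad_J, 0.0, 1.0)             # keep the sample in [0,1]^d
    if predict(x) != y0:                                  # stopping condition (1): success
        print(f"misclassified after {i} steps: {y0} -> {predict(x)}")
        break
    if np.linalg.norm(x - x0) > eps or i == i_max:        # conditions (2)/(3): failure
        print("attack failed within the budget")
        break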
Evasion attacks (gradient descent)
  Indiscriminate: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$
  Targeted: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$
• Repeat $x_i \leftarrow x_{i-1} - \alpha \frac{\partial J}{\partial x_{i-1}}$, $i > 0$
• …
How to design the objective function f?
Evasion attacks (FGSM)
• Fast gradient sign method (FGSM) [3]:
  – Set f to the cross-entropy loss and drop the norm term:
    Indiscriminate: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$  →  $\arg\min_{\delta \in [0,1]^d} -loss_{true}(x + \delta)$
    Targeted: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$  →  $\arg\min_{\delta \in [0,1]^d} loss_{target}(x + \delta)$
• Single step of size ε: fast rather than optimal
    Indiscriminate: $x' \leftarrow x + \epsilon \cdot \mathrm{sign}\left(\frac{\partial loss_{true}}{\partial x}\right)$
    OR Targeted: $x' \leftarrow x - \epsilon \cdot \mathrm{sign}\left(\frac{\partial loss_{target}}{\partial x}\right)$
• Not meant to produce the minimal adversarial perturbations
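A hedged one-step FGSM sketch, reusing the same kind of toy linear model as in the earlier gradient-descent sketch (W, b, grad_ce_loss and the chosen target class are illustrative stand-ins, not lecture code):

import numpy as np

rng = np.random.default_rng(1)
d, k = 100, 10
W, b = rng.normal(size=(k, d)), rng.normal(size=k)   # hypothetical linear victim model

def grad_ce_loss(x, y):                              # gradient of the cross-entropy loss of class y w.r.t. x
    z = W @ x + b
    p = np.exp(z - z.max()); p /= p.sum()
    return W.T @ (p - np.eye(k)[y])

x = rng.uniform(0, 1, size=d)
y_true = int(np.argmax(W @ x + b))
y_target = (y_true + 1) % k                          # arbitrary target class for the targeted variant
eps = 0.1

# Indiscriminate: one step that increases the true-class loss
x_adv = np.clip(x + eps * np.sign(grad_ce_loss(x, y_true)), 0, 1)

# Targeted: one step that decreases the loss towards the target class
x_adv_targeted = np.clip(x - eps * np.sign(grad_ce_loss(x, y_target)), 0, 1)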
Evasion attacks (Iterative Gradient Sign)
• Iterative gradient sign [4]
  – Single step ε → multiple smaller steps α
  – Indiscriminate: $x_i \leftarrow \mathrm{clip}_\epsilon\!\left(x_{i-1} + \alpha \cdot \mathrm{sign}\left(\frac{\partial f_{true}}{\partial x_{i-1}}\right)\right)$
    OR Targeted: $x_i \leftarrow \mathrm{clip}_\epsilon\!\left(x_{i-1} - \alpha \cdot \mathrm{sign}\left(\frac{\partial f_{target}}{\partial x_{i-1}}\right)\right)$
  – $\mathrm{clip}_\epsilon$: make sure that each component $x_i^j$ stays within the range $[x_0^j - \epsilon, x_0^j + \epsilon]$ (a projection)
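A minimal sketch of the iterative variant, assuming a caller-supplied grad_loss(x, y) gradient function (e.g., the toy one above); clip_eps implements the projection described on this slide:

import numpy as np

def clip_eps(x, x0, eps):
    # Project x back into the L-infinity ball of radius eps around x0, and into the valid range [0, 1]
    return np.clip(np.clip(x, x0 - eps, x0 + eps), 0.0, 1.0)

def iterative_gradient_sign(x0, y_true, grad_loss, eps=0.1, alpha=0.01, steps=20):
    # Indiscriminate variant: repeatedly step up the true-class loss, projecting after every step
    x = x0.copy()
    for _ in range(steps):
        x = clip_eps(x + alpha * np.sign(grad_loss(x, y_true)), x0, eps)
    return x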
Evasion attacks (Momentum Iterative FGSM)
• Momentum iterative fast gradient sign method
  – $g_i = \mu \cdot g_{i-1} + \frac{\nabla_x J(x_{i-1})}{\|\nabla_x J(x_{i-1})\|_1}$,  $x_i \leftarrow x_{i-1} - \alpha \cdot \mathrm{sign}(g_i)$
  – Momentum overcomes two problems of vanilla gradient descent:
    • Getting stuck in local minima
    • Oscillation
https://medium.com/analytics-vidhya/momentum-rmsprop-and-adam-optimizer-5769721b4b19, https://eloquentarduino.github.io/2020/04/stochastic-gradient-descent-on-your-microcontroller/
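A sketch of the momentum variant under the same assumptions (grad_loss is a caller-supplied gradient of the loss being increased, so the sign of the step is flipped relative to the slide's minimisation form):

import numpy as np

def momentum_ifgsm(x0, y_true, grad_loss, eps=0.1, alpha=0.01, mu=0.9, steps=20):
    # Accumulate an L1-normalised gradient with momentum, then take sign steps (indiscriminate variant)
    x, g = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        grad = grad_loss(x, y_true)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x = np.clip(np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps), 0.0, 1.0)
    return x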
Evasion attacks (C&W)
C&W attack [2]
  $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f(x + \delta)$
  where f is chosen such that
  $C(x + \delta) = l_{target}$ if and only if $f(x + \delta) \leq 0$
  $C(x + \delta) \neq l_{target} \Leftrightarrow f(x + \delta) > 0$
Consistent with the definition of function f: how close the prediction and the target are
Evasion attacks (C&W)
$C(x + \delta) = l_{target}$ if and only if $f(x + \delta) = f(x') \leq 0$
• Option 1: $f(x') = \max\left(\max_{i \neq t} F(x')_i - F(x')_t,\ 0\right)$
• Option 2: $f(x') = \log\left(1 + \exp\left(\max_{i \neq t} F(x')_i - F(x')_t\right)\right) - \log(2)$
• Option 3: $f(x') = \max\left(0.5 - F(x')_t,\ 0\right)$
F(x): output vector for x, i.e., the probabilities of the input x belonging to each class, e.g., F(x)_0 = 0.01, F(x)_1 = 0.01, …, F(x)_9 = 0.85 for the earlier digit example.
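For instance, option 1 can be written directly from the formula (a small illustrative helper, not the authors' code):

import numpy as np

def cw_f_option1(F_x_prime, t):
    # max( max_{i != t} F(x')_i - F(x')_t , 0 ): non-positive exactly when class t already has the top score
    others = np.delete(F_x_prime, t)
    return max(float(others.max() - F_x_prime[t]), 0.0)

print(cw_f_option1(np.array([0.1, 0.7, 0.2]), t=1))   # 0.0 -> x' is classified as the target
print(cw_f_option1(np.array([0.6, 0.3, 0.1]), t=1))   # 0.3 -> not yet the target class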
CleverHans
• CleverHans
• Do not use the latest version
• Download from: https://github.com/tensorflow/cleverhans/releases/tag/v.3.0.1
• Prerequisites:
  – Python 3 (https://www.python.org/downloads/)
  – TensorFlow (https://www.tensorflow.org/install/)
  – Python 3.5/3.6/3.7 and TensorFlow {1.8, 1.12, 1.14}
• Installation:
  – cd cleverhans
  – pip install -e .
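For orientation only, a hedged sketch of how FGSM was typically invoked with the CleverHans 3.x / TensorFlow 1.x API; the class names, argument names and the model file below are recalled from that release and should be checked against the downloaded documentation:

# Hypothetical usage sketch for CleverHans v3.0.1 with a Keras classifier (verify against the release docs)
import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

sess = tf.Session()
tf.keras.backend.set_session(sess)

model = tf.keras.models.load_model('mnist_cnn.h5')     # assumed pre-trained classifier (placeholder path)
fgsm = FastGradientMethod(KerasModelWrapper(model), sess=sess)
x_adv = fgsm.generate_np(x_test, eps=0.3, clip_min=0.0, clip_max=1.0)   # x_test: your evaluation images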
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Evasion attacks (automatic differentiation)
  Indiscriminate: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$
  Targeted: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$
• Start with the initial input $x_0$
• Repeat $x_i \leftarrow x_{i-1} - \alpha \frac{\partial J}{\partial x_{i-1}}$, $i > 0$
• Until $C(x_i) \neq C(x_0)$ (or $C(x_i) = l_{target}$)
How to calculate the partial derivatives?
Evasion attacks (automatic differentiation)
Derivative
• Definition: $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
• Numerical differentiation
  – Approximate the limit with $\frac{f(x + h) - f(x)}{h}$ (forward difference) or $\frac{f(x + h) - f(x - h)}{2h}$ (central difference) for a small h
  – Significant round-off errors
• Symbolic differentiation: apply chain rules to symbolic expressions
  – Exponentially-long results
Evasion attacks (automatic differentiation)
Automatic differentiation
• A set of techniques to numerically evaluate the derivative of a function specified by a computer program – Wikipedia
• Any complicated function f can be rewritten as the composition of a sequence of primitive functions:
  $f = f_0 \circ f_1 \circ f_2 \circ \dots \circ f_n$
• Apply the chain rule:
  $\frac{\partial f}{\partial x} = \frac{\partial f_0}{\partial f_1} \cdot \frac{\partial f_1}{\partial f_2} \cdots \frac{\partial f_{n-1}}{\partial f_n} \cdot \frac{\partial f_n}{\partial x}$
• Forward mode: evaluate the product from right to left, starting from $\frac{\partial f_n}{\partial x}$ (from the input towards the output)
• Reverse mode: evaluate the product from left to right, starting from $\frac{\partial f_0}{\partial f_1}$ (from the output back towards the input)
Evasion attacks (automatic differentiation)
• Given $y = f(x_1, x_2) = \ln(x_1) + x_1 x_2 - \sin(x_2)$, calculate $\frac{\partial y}{\partial x_1}$ at $(2, 5)$
• Forward mode [5] – computational graph:
  – Input variables: $v_{i-n} = x_i$, $i = 1, \dots, n$:  $v_{-1} = x_1$, $v_0 = x_2$
  – Working variables: $v_i$, $i = 1, \dots, l$:  $v_1 = \ln(v_{-1})$, $v_2 = v_{-1} \times v_0$, $v_3 = \sin(v_0)$, $v_4 = v_1 + v_2$, $v_5 = v_4 - v_3$
  – Output variables: $y_{m-i} = v_{l-i}$, $i = m-1, \dots, 0$:  here $y = v_5$
Evasion attacks (automatic differentiation)
• Forward mode propagates tangents $\dot{v}_i = \frac{\partial v_i}{\partial x_j}$ alongside the values, one input $x_j$ per pass:
  – Seeding $\dot{v}_{-1} = 1$, $\dot{v}_0 = 0$ gives $\frac{\partial y}{\partial x_1}$: $\dot{v}_1 = \frac{\dot{v}_{-1}}{v_{-1}} = 0.5$, $\dot{v}_2 = \dot{v}_{-1} v_0 + v_{-1} \dot{v}_0 = 5$, $\dot{v}_3 = \cos(v_0)\,\dot{v}_0 = 0$, $\dot{v}_4 = \dot{v}_1 + \dot{v}_2 = 5.5$, $\dot{v}_5 = \dot{v}_4 - \dot{v}_3 = 5.5$
  – Seeding $\dot{v}_{-1} = 0$, $\dot{v}_0 = 1$ gives $\frac{\partial y}{\partial x_2}$: $\dot{v}_3 = \cos 5 \times 1$, $\dot{v}_4 = 0 + 2 = 2$, $\dot{v}_5 = 2 - \cos 5$
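A compact sketch of forward-mode AD using dual numbers, evaluated on this example at (2, 5); the Dual class is purely illustrative, not from any library:

import math

class Dual:
    # Dual number (value, tangent) for forward-mode automatic differentiation
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __sub__(self, o): return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o): return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

def ln(a):  return Dual(math.log(a.val), a.dot / a.val)
def sin(a): return Dual(math.sin(a.val), a.dot * math.cos(a.val))

# Seed (dx1, dx2) = (1, 0) to obtain dy/dx1 at (2, 5)
x1, x2 = Dual(2.0, 1.0), Dual(5.0, 0.0)
y = ln(x1) + x1 * x2 - sin(x2)
print(y.val, y.dot)    # y ~= 11.652, dy/dx1 = 1/x1 + x2 = 5.5

# Re-seed (0, 1) for dy/dx2: a second forward pass is needed for the second input
x1, x2 = Dual(2.0, 0.0), Dual(5.0, 1.0)
print((ln(x1) + x1 * x2 - sin(x2)).dot)   # dy/dx2 = x1 - cos(x2) = 2 - cos(5)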
Evasion attacks (automatic differentiation)
• Reverse mode [5] – same computational graph ($v_{-1} = x_1$, $v_0 = x_2$, $v_1 = \ln(v_{-1})$, $v_2 = v_{-1} \times v_0$, $v_3 = \sin(v_0)$, $v_4 = v_1 + v_2$, $v_5 = v_4 - v_3$)
• One forward pass evaluates the values; adjoints $\bar{v}_i = \frac{\partial y_j}{\partial v_i}$ are then propagated backwards from the output, yielding $\frac{\partial y}{\partial x_1}$ and $\frac{\partial y}{\partial x_2}$ in a single reverse pass
Evasion attacks (automatic differentiation)
• Example 1: $y = \ln(x_1) + x_1 x_2 - \sin(x_2)$; calculate $\frac{\partial y}{\partial x_1}$, $\frac{\partial y}{\partial x_2}$
  – Forward mode: __ time(s)   – Reverse mode: __ time(s)
• Example 2: $y_1 = \ln(x) + x$, $y_2 = x - \sin(x)$; calculate $\frac{\partial y_1}{\partial x}$, $\frac{\partial y_2}{\partial x}$
  – Forward mode: __ time(s)   – Reverse mode: __ time(s)
Evasion attacks (automatic differentiation)
Function $f: \mathbb{R}^n \to \mathbb{R}^m$
• n independent inputs x, m dependent outputs y
• Reverse mode suits $n \gg m$: one reverse run calculates $\frac{\partial y_j}{\partial x_i}$ for all inputs $x_i$ of a single output $y_j$
• Forward mode suits $m \gg n$: one forward run calculates $\frac{\partial y_j}{\partial x_i}$ for all outputs $y_j$ with respect to a single input $x_i$
Tensorflow example
import tensorflow as tf

x = tf.Variable(1.)
y = tf.Variable(2.)
z = tf.subtract(2*x, y)
grad = tf.gradients(z, [x, y])   # dz/dx = 2, dz/dy = -1

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(grad))  # [2.0, -1.0]
# http://laid.delanover.com/gradients-in-tensorflow/
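The same gradient in TensorFlow 2.x eager mode, as an alternative sketch in case the TF1 session API above is unavailable:

# TensorFlow 2.x equivalent using GradientTape
import tensorflow as tf

x = tf.Variable(1.)
y = tf.Variable(2.)
with tf.GradientTape() as tape:
    z = 2 * x - y
print(tape.gradient(z, [x, y]))   # [2.0, -1.0]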
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Evasion attacks (real-world example)
• Robust Physical-World Attacks on Deep Learning Visual Classification [6]
  – Stop sign, Right Turn sign → Speed Limit 45
  – Drive-by (field) tests
  – Start from 250 ft away
  – Classify every 10th frame
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Poisoning attacks
• Insert extra points to maximally decrease the accuracy [8]
  (Figure: two-class scatter plot, Class 1 (+) vs. Class 2 (−), before and after poisoning.)
Poisoning attacks
• Attacker's aim: maximise the hinge loss over the m validation data points
• Optimisation problem:
  $\arg\max_{x_c} L(x_c) = \sum_{i=1}^{m} \left(1 - y_i f_{x_c}(x_i)\right)_+$
To find the optimal poisoning data $x_c$:
• Start from a random initial attack point $x_c^{(0)}$
• Update: re-compute the SVM with the current poison point, then take a gradient-ascent step
  $x_c^{(p)} \leftarrow x_c^{(p-1)} + \alpha \frac{\partial L}{\partial x_c^{(p-1)}}$, $p > 0$
• Until $L(x_c^{(p)}) - L(x_c^{(p-1)}) < \varepsilon$
(Figure: Class 1 (+) vs. Class 2 (−) scatter plot with the attack point's trajectory; a code sketch follows below.)
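An illustrative sketch of this loop using scikit-learn; it replaces the paper's analytic gradient with a finite-difference estimate and uses synthetic data, so it only mirrors the structure of the attack in [8]:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(-1, 1, (20, 2)), rng.normal(1, 1, (20, 2))])   # synthetic 2-class data
y_tr = np.array([-1] * 20 + [1] * 20)
X_val, y_val = X_tr.copy(), y_tr.copy()            # stand-in validation set

def val_hinge_loss(x_c, y_c=1):
    # Retrain the SVM with the poison point (x_c, y_c) and return the validation hinge loss L(x_c)
    clf = SVC(kernel='linear', C=1.0).fit(np.vstack([X_tr, x_c]), np.append(y_tr, y_c))
    return np.maximum(0, 1 - y_val * clf.decision_function(X_val)).sum()

x_c, alpha, eps = rng.normal(0, 1, 2), 0.1, 1e-3   # random initial attack point
for _ in range(50):
    # finite-difference gradient of L w.r.t. the poison point
    g = np.array([(val_hinge_loss(x_c + h) - val_hinge_loss(x_c - h)) / 2e-3
                  for h in 1e-3 * np.eye(2)])
    x_new = x_c + alpha * g                        # gradient *ascent*: maximise the validation loss
    if val_hinge_loss(x_new) - val_hinge_loss(x_c) < eps:
        break
    x_c = x_new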
Poisoning attacks
As the attack point xc moves towards a local maximum, both the hinge loss and the classification error increase.
Poisoning attacks
• Poison frog attacks [10]
  – E.g., add a seemingly innocuous image (that is properly labeled) to a training set, and control the identity of a chosen image at test time
  (Figure: example images from the target class and the base class.)
Poisoning attacks
Step 1: choose an instance from the target class – t (target instance)
Step 2: sample an instance from the base class – b (base instance)
Step 3: perturb b to create a poison instance – p
Step 4: inject p into the training dataset
The model is then re-trained. The attack succeeds if the poisoned model labels t as the base class.
Poisoning attacks
• Generate poison data 𝑝𝑝
– Optimisation problem: $p = \arg\min_x \| f(x) - f(t) \|_2^2 + \beta \| x - b \|_2^2$
  • $f(x)$: output of the second-last layer of the neural network (i.e., the feature-space representation)
  • $\| f(x) - f(t) \|_2^2$: makes p move toward the target instance in feature space and get embedded in the target class distribution
  • $\beta \| x - b \|_2^2$: makes p appear like a base-class instance to a human labeller
Poisoning attacks
• Forward-backward-splitting iterative procedure [11]
– Forward step: gradient descent update to minimise the L2
distance to the target instance in feature space
– Backward step: proximal update that minimises the Euclidean distance from the base instance in input space
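A hedged sketch of the forward-backward-splitting loop; feature_grad stands for the gradient of the squared feature distance through the attacker's surrogate network, which is assumed to be supplied elsewhere (e.g., by automatic differentiation), and the step sizes are illustrative:

import numpy as np

def poison_frog(t_feat, b, feature_grad, beta=0.1, lam=0.01, steps=1000):
    # t_feat: f(t), the target instance's feature vector
    # b: base instance in input space
    # feature_grad(x, t_feat): gradient of ||f(x) - t_feat||^2 w.r.t. x (assumed helper)
    x = b.copy()
    for _ in range(steps):
        x = x - lam * feature_grad(x, t_feat)          # forward step: move towards t in feature space
        x = (x + lam * beta * b) / (1 + lam * beta)    # backward step: proximal pull towards b in input space
    return x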
Poisoning attacks
(Figure only – no recoverable text.)
Poisoning attacks
Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners [9] � �
• Attacker's objective: $O_A(D, \hat{\theta}_D) = \|\hat{\theta}_D - \theta^*\| + \|D - D_0\|$
  – $\hat{\theta}_D$: parameters of the poisoned model after the attack
  – $\theta^*$: parameters of the attacker's target model, i.e., the model that the attacker aims to obtain
  – $D$: poisoned training data
  – $D_0$: original training data
Week 9: Adversarial Machine Learning – Vulnerabilities
• Definition + examples
• Classification
• Evasion attacks
  – Gradient-descent based approaches
  – Automatic differentiation
  – Real-world example
• Poisoning attacks
• Transferability
Transferability & Black-box attacks
• Implicit assumption: full knowledge of the target model
• What if the target model is unknown to the attacker?
• Transferability: for two models that perform the same task, trained on different datasets, adversarial samples generated against one model can often fool the other model as well [12][13]
– Intra-technique: both the target and surrogate model use the same machine learning technique
– Inter-technique: the target and surrogate model use different
machine learning techniques
Transferability & Black-box attacks
• Verification on the MNIST dataset of handwritten digits
  – Grey-scale, 0-255
  – Size: 28px × 28px
• DNN, SVM, LR, DT, kNN
• Black-box attack
  – Step 1: the adversary trains their own model – the surrogate/source
  – Step 2: generate adversarial samples against the surrogate
  – Step 3: apply the adversarial samples against the target model
https://upload.wikimedia.org/wikipedia/ commons/2/27/MnistExamples.png
Transferability & Black-box attacks
• Intra-technique
71% adv. samples against the source are also effective against the target
Transferability & Black-box attacks
• Inter-technique
Transferability & Black-box attacks
• Non-smoothness can hurt the transferability [7]
  – A is the surrogate model; B is the target model
  – A smoothed loss surface contributes to transferability
  – Replace the raw gradient with a gradient averaged over Gaussian-perturbed copies of the input (see the sketch below):
    $x_i \leftarrow x_{i-1} - \alpha \frac{\partial J(x_{i-1})}{\partial x}$  →  $x_i \leftarrow x_{i-1} - \alpha \frac{1}{m} \sum_{j=1}^{m} \frac{\partial J(x_{i-1} + \xi_j)}{\partial x}$,  $\xi_j \sim \mathcal{N}(0, \sigma^2)$
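A small sketch of the gradient-smoothing step (grad_J is a caller-supplied gradient function; the noise parameters are illustrative):

import numpy as np

def smoothed_grad(x, grad_J, m=20, sigma=0.05, rng=np.random.default_rng(0)):
    # Average the gradient over m Gaussian-perturbed copies of x to smooth the loss surface
    return np.mean([grad_J(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(m)], axis=0)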
Transferability & Black-box attacks
• Input diversity improves transferability [15]
  – Adversarial samples may overfit to the surrogate model
  – Data augmentation:
    • Random resizing: resize an input image to a random size
    • Random padding: pad zeros around an image in a random manner
  – Diverse Inputs Iterative Fast Gradient Sign Method (DI-FGSM): apply a random transform T before each gradient step (a sketch of T follows below)
    $x_i \leftarrow x_{i-1} - \alpha \cdot \mathrm{sign}\left(\frac{\partial J(x_{i-1})}{\partial x}\right)$  →  $x_i \leftarrow x_{i-1} - \alpha \cdot \mathrm{sign}\left(\frac{\partial J(T(x_{i-1}; p))}{\partial x}\right)$,
    where $T(x_{i-1}; p) = T(x_{i-1})$ with prob. $p$, and $T(x_{i-1}; p) = x_{i-1}$ with prob. $1 - p$
  – Momentum Diverse Inputs Iterative Fast Gradient Sign Method (M-DI-FGSM):
    $g_i = \mu \cdot g_{i-1} + \frac{\nabla_x J(x_{i-1})}{\|\nabla_x J(x_{i-1})\|_1}$  →  $g_i = \mu \cdot g_{i-1} + \frac{\nabla_x J(T(x_{i-1}; p))}{\|\nabla_x J(T(x_{i-1}; p))\|_1}$
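A sketch of the random resize-and-pad transform T(x; p) on a single-channel image (nearest-neighbour resize; sizes and probabilities are illustrative, and in the real attack the transform is applied inside the gradient computation of the surrogate network):

import numpy as np

def diverse_input(x, p=0.5, out_size=32, rng=np.random.default_rng()):
    # x: out_size x out_size single-channel image
    # With probability 1-p return x unchanged; otherwise resize to a random smaller size and zero-pad back
    if rng.random() >= p:
        return x
    new_size = int(rng.integers(out_size // 2, out_size))
    idx = (np.arange(new_size) * x.shape[0] / new_size).astype(int)   # nearest-neighbour resize indices
    small = x[np.ix_(idx, idx)]
    top = int(rng.integers(0, out_size - new_size + 1))
    left = int(rng.integers(0, out_size - new_size + 1))
    padded = np.zeros((out_size, out_size), dtype=x.dtype)
    padded[top:top + new_size, left:left + new_size] = small
    return padded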
Transferability & Black-box attacks
• Backpropagation smoothness [16], backpropagation linearity [17]
  – Non-linear activation functions, e.g., ReLU, sigmoid
  – ReLU has a non-continuous derivative at zero during backpropagation
  – A continuous-derivative property can improve transferability
  – Keep the ReLU function in the forward pass, but during backpropagation approximate the ReLU derivative with a continuous derivative, e.g., using the softplus function $\log(1 + e^x)$ (see the sketch below)
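A sketch of the backward-pass substitution: the forward pass still uses ReLU, but the gradient routed backwards uses the softplus derivative sigmoid(z) instead of the 0/1 ReLU derivative (function names are illustrative):

import numpy as np

def relu_forward(z):
    return np.maximum(z, 0.0)                       # unchanged forward pass

def relu_backward_softplus(z, upstream_grad):
    # Softplus log(1 + e^z) has derivative sigmoid(z), which is continuous at zero
    return upstream_grad * (1.0 / (1.0 + np.exp(-z)))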
Summary
• Evasion attacks
  – Indiscriminate: $\arg\min_{\delta \in [0,1]^d} \|\delta\| - c \cdot f_{true}(x + \delta)$
  – Targeted: $\arg\min_{\delta \in [0,1]^d} \|\delta\| + c \cdot f_{target}(x + \delta)$
• Poisoning attacks
  – Attacker's objective: $O_A(D, \hat{\theta}_D) = \|\hat{\theta}_D - \theta^*\| + \|D - D_0\|$
    • $\hat{\theta}_D$: poisoned model after the attack
    • $\theta^*$: attacker's target, i.e., the model that the attacker aims to obtain
    • $D$: poisoned training data
    • $D_0$: original training data
• Transferability
  – Intra-, inter-technique
  – Black-box attacks
References
• [1] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar, “The Security of Machine Learning,” Machine Learning, vol. 81, no. 2, pp. 121–148, Nov. 2010.
• [2] N. Carlini and D. Wagner, “Towards Evaluating the Robustness of Neural Networks,” eprint arXiv:1608.04644, 2016.
• [3] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and Harnessing Adversarial Examples,” eprint arXiv:1412.6572, 2014.
• [4] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint arXiv:1607.02533, 2016.
• [5] A. G. Baydin and B. A. Pearlmutter, “Automatic Differentiation of Algorithms for Machine Learning,” arXiv:1404.7456 [cs, stat], Apr. 2014.
• [6] I. Evtimov et al., “Robust Physical-World Attacks on Machine Learning Models,” arXiv preprint arXiv:1707.08945, 2017.
• [7] Wu, L. and Zhu, Z., “Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks.” Proceedings of The 12th Asian Conference on Machine Learning:837-850. Available from https://proceedings.mlr.press/v129/wu20a.html.