Week 10: Adversarial Machine Learning – Vulnerabilities (Part II): Explanation, Detection & Defence
COMP90073 Security Analytics
School of Computing and Information Systems (CIS), Semester 2, 2021
• Adversarial machine learning beyond computer vision
– Audio
– Natural language processing (NLP)
– Malware detection
• Why are machine learning models vulnerable?
– Insufficient training data
– Unnecessary features
• How to defend against adversarial machine learning?
– Data-driven defences
– Learner robustification
• Challenges
Audio Adversarial Examples
• Speech recognition system
– Recurrent Neural Networks (RNNs)
– Audio waveform → a sequence of probability distributions over individual characters
https://distill.pub/2017/ctc/
– Challenge: alignment between the input and the output
• Exact location of each character in the audio file
• E.g., the frame-level outputs “HEEELLLLOO” should map to the word “HELLO”
Audio Adversarial Examples
• Connectionist Temporal Classification (CTC)
– Encoding
• Introduce a special character called blank, denoted “–”
• Y′ ∈ B(Y): the set of sequences obtained by modifying the ground-truth text Y by (1) inserting “–” and (2) repeating characters, in all possible ways
• A blank character must be inserted between duplicate characters
• E.g.,
– Input X has a length of 10, and Y = [h, e, l, l, o]
– Valid: heeell–llo, hhhh–el–lo, heell–looo
– Invalid: hhee–llo–o, heel–lo
Audio Adversarial Examples
• Connectionist Temporal Classification (CTC)
– Loss function
• Calculate the score for each Y′ and sum them up:
p(Y|X) = Σ_{Y′} ∏_{i=1}^{|X|} p_i(y′_i | X), where p_i(y′_i | X) is the per-time-step probability
• Loss = negative log likelihood of the sum:
CTC-Loss = −log Σ_{Y′} ∏_{i=1}^{|X|} p_i(y′_i | X)
– Decoding
• Pick the character with the highest score at each time step
• Remove duplicate characters, then remove blanks
• E.g., HEE–LL–LOO → HE–L–LO → HELLO
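To make the decoding step concrete, here is a minimal Python sketch (not from the lecture) of greedy CTC decoding; the alphabet and the toy probability matrix are illustrative only:

import numpy as np

BLANK = "-"
ALPHABET = [BLANK, "h", "e", "l", "o"]                 # index 0 is the blank

def greedy_ctc_decode(probs):
    """probs: array of shape (T, len(ALPHABET)) of per-time-step probabilities."""
    best = probs.argmax(axis=1)                        # best character per time step
    collapsed = [best[0]] + [c for prev, c in zip(best, best[1:]) if c != prev]
    return "".join(ALPHABET[c] for c in collapsed if ALPHABET[c] != BLANK)

steps = [1, 1, 2, 0, 3, 3, 0, 3, 4]                    # spells "hhe-ll-lo"
toy_probs = np.eye(len(ALPHABET))[steps]
print(greedy_ctc_decode(toy_probs))                    # -> "hello"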
Audio Adversarial Examples
• Computer vision domain (C&W attack):
– argmin ‖δ‖ + c · f(x + δ), s.t. x + δ ∈ [0,1]^d
– C(x + δ) = t_target ⇔ f(x + δ) = f(x′) ≤ 0
• Audio adversarial examples against a speech recognition system [14]
– How to measure the perturbation δ?
• Measure δ in decibels (dB): dB(x) = max_i 20·log₁₀(x_i)
• dB_x(δ) = dB(δ) − dB(x)
– How to construct the objective function?
• Choose CTC-Loss(x′; y_target) as the function f
• C(x + δ) = t_target ⇐ f(x + δ) = f(x′) ≤ 0
• C(x + δ) = t_target ⇏ f(x + δ) = f(x′) ≤ 0
• The solution will still be adversarial, but may not be minimally perturbed
– Examples: https://nicholas.carlini.com/code/audio_adversarial_examples
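A minimal PyTorch sketch of the optimisation behind this attack, assuming a placeholder model that returns per-frame log-probabilities; the real attack in [14] targets DeepSpeech and additionally bounds the perturbation in dB, which is omitted here:

import torch
import torch.nn as nn

def audio_attack(model, x, target, c=1.0, eps=0.05, steps=1000, lr=1e-3):
    # x: waveform tensor of shape (1, num_samples) scaled to [-1, 1]
    # target: tensor of target character indices (long), shape (target_len,)
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ctc = nn.CTCLoss(blank=0)

    for _ in range(steps):
        log_probs = model(x + delta)                     # (T, 1, num_chars), log-softmax
        input_len = torch.tensor([log_probs.shape[0]])
        target_len = torch.tensor([target.shape[0]])
        loss = (delta ** 2).sum() + c * ctc(log_probs, target.unsqueeze(0),
                                            input_len, target_len)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                            # keep the perturbation small
            delta.clamp_(-eps, eps)
    return (x + delta).detach()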
• Deep text classification
– Character level [15]
• Every character is represented using one-hot encoding
• 6 convolutional layers + 3 fully-connected layers
• Deep Text Classification Can be Fooled [16]
– Identify the text items that contribute most to the classification
– Contribution measured by the gradient ∂f/∂x, where x is a training sample
– Hot character: contains the dimensions with the highest gradient magnitude
– Hot word: contains ≥ 3 hot characters
– Hot phrase: a single hot word, or adjacent hot words
– Hot Training/Sample Phrase (HTP/HSP): a hot phrase that occurs most frequently in the training data / test sample
• Deep Text Classification Can be Fooled [16]
– Given text x, the attacker wants C(x + δ) = t_target
– Insertion
• What to insert: Hot Training Phrases of the target class
• Where to insert: near Hot Sample Phrases of the original class
• Deep Text Classification Can be Fooled [16]
– Modification: replace the characters in Hot Sample Phrases with
• common misspellings, or
• visually similar characters
• Deep Text Classification Can be Fooled [16]
– Removal: remove the inessential adjectives or adverbs in HSPs
• Less effective
• Only downgrades the confidence of the original class
• Deep Text Classification Can be Fooled [16]
– Combination of the three strategies
– Limitation: all perturbations are crafted manually
Malware Detection
• Attacking a malware classifier for mobile phones [7]
– An application is represented by a binary vector X ∈ {0, 1}^d
• 1: the app has the feature; 0: the app doesn’t have the feature
• E.g., a chat app that uses contacts and storage but not the calendar → [1, 1, 0]
– Classifier: feed-forward neural network
• F(X) = [F₀(X), F₁(X)], F₀(X) + F₁(X) = 1; class 0: benign, class 1: malicious
• Benign if F₀(X) > F₁(X), malicious otherwise
Evasion attacks (application)
• Attacking a malware classifier for mobile phones [7]
– Attack goal: make a malicious application be classified as benign
– Limitation: only add features, to avoid destroying app functionality
– For each iteration:
• Step 1: compute the gradient of F w.r.t. X: ∂F_k(X)/∂X_j, k ∈ {0, 1}, j ∈ [1, d]
• Step 2: flip a feature X_i from 0 to 1, choosing the one with (1) X_i = 0 and (2) the maximal positive gradient → maximise the change towards the target class 0:
i = argmax_{j ∈ [1, d], X_j = 0} ∂F₀(X)/∂X_j
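A hedged PyTorch sketch of this add-only attack, with model standing in for the feed-forward classifier that outputs [F₀(X), F₁(X)]; feature semantics and the stopping rule are illustrative:

import torch

def evade(model, x, max_changes=20):
    x = x.clone().float()                          # binary feature vector, shape (d,)
    for _ in range(max_changes):
        x_var = x.clone().requires_grad_(True)
        benign_score = model(x_var)[0]             # F_0(X)
        if benign_score > 0.5:                     # already classified benign
            break
        benign_score.backward()
        grad = x_var.grad.clone()
        grad[x == 1] = float("-inf")               # only features we can add (X_j = 0)
        i = torch.argmax(grad)                     # maximal positive gradient
        if grad[i] <= 0:                           # no remaining feature helps
            break
        x[i] = 1.0                                 # add the feature
    return x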
Evasion attacks (application)
– Results (table omitted): MWR = malware ratio, MR = misclassification rate
Locations of Adversarial Samples
• Locations of adversarial samples
– Off the data manifold of the legitimate instances
– Three scenarios [1]:
• Near the decision boundary, but far from the “+” manifold
• Away from the boundary, but near the manifold – in a “pocket” of the “+” manifold
• Close to the boundary and the “−” manifold
Locations of Adversarial Samples
• Images that are unrecognisable to human eyes, but are identified by DNNs with near certainty [2]
– (Figure) DNNs believe with 99.99% confidence that the shown images are the digits 0–9
Explanation 1: Insufficient Training Data
• Potential reason 1: insufficient training data
• An illustrative example
– x ∈ [−1, 1], y ∈ [−1, 1], z ∈ [−1, 2]
– Binary classification
• Class 1: z < x² + y³
• Class 2: z ≥ x² + y³
– x, y, z are incremented by 0.01 → a total of 200 × 200 × 300 = 1.2 × 10⁷ points
– How many points are needed to reconstruct the decision boundary?
Explanation 1: Insufficient Training Data
– Randomly choose the training and test datasets from these points
– (Figure: scatter plots of the training dataset and the test dataset)
– Boundary dataset (where adversarial samples are likely to be located): x² + y³ − 0.1 < z < x² + y³ + 0.1
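A small NumPy sketch of this setup (sampling continuously rather than on the 0.01 grid, for brevity); the dataset sizes and the downstream classifier are placeholders:

import numpy as np

rng = np.random.default_rng(0)

def sample_points(n):
    x = rng.uniform(-1, 1, n)
    y = rng.uniform(-1, 1, n)
    z = rng.uniform(-1, 2, n)
    labels = (z >= x**2 + y**3).astype(int)            # class 2 iff z >= x^2 + y^3
    return np.column_stack([x, y, z]), labels

def sample_boundary_points(n):
    x = rng.uniform(-1, 1, n)
    y = rng.uniform(-1, 1, n)
    z = np.clip(x**2 + y**3 + rng.uniform(-0.1, 0.1, n), -1, 2)   # the +-0.1 band
    labels = (z >= x**2 + y**3).astype(int)
    return np.column_stack([x, y, z]), labels

X_train, y_train = sample_points(8000)
X_bound, y_bound = sample_boundary_points(40000)
# e.g., fit sklearn.svm.LinearSVC on (X_train, y_train) and compare its accuracy on a
# random test set with its accuracy on (X_bound, y_bound) near the decision surface.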
Explanation 1: Insufficient Training Data
• Test result
– (Tables omitted) For each size of the training dataset: accuracy on its own test dataset, accuracy on the test dataset with 4 × 10⁴ points, and accuracy on the boundary dataset
– Linear SVMs: same comparison
• 8,000 training points are only 0.067% of the 1.2 × 10⁷ grid points
• MNIST: 28 × 28 8-bit greyscale images → (2⁸)^(28×28) ≈ 1.1 × 10^1888 possible images
• 1.1 × 10^1888 × 0.067% ≫ 6 × 10⁴ (the size of the MNIST training set)
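The arithmetic can be checked in a few lines (working in logarithms, since the numbers do not fit in a float):

import math

log10_images = 784 * math.log10(256)                    # ~1888.1, i.e. ~1.1e1888 images
log10_needed = log10_images + math.log10(0.067 / 100)   # 0.067% of that space
print(round(log10_images, 1), round(log10_needed, 1))   # 1888.1 1884.9
print(log10_needed > math.log10(6e4))                    # True: far beyond MNIST's ~6e4 images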
Poisoning attacks
• Poison frog attacks [10]
– E.g., add a seemingly innocuous image (that is properly labelled) to the training set, and thereby control the identity of a chosen image at test time
– (Figure: examples from the target class and the base class)
Poisoning attacks
• Generate poison data
– f(x): the function that propagates an input x through the network to the penultimate layer (before the softmax layer)
– p = argmin_x ‖f(x) − f(t)‖₂² + β‖x − b‖₂²
• ‖f(x) − f(t)‖₂²: makes p move toward the target instance t in feature space, so that it gets embedded in the target class distribution
• β‖x − b‖₂²: makes p appear like a base-class instance b to a human labeller
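A minimal sketch of this optimisation in PyTorch; feature_extractor, target and base are placeholders, and the paper itself uses a forward-backward splitting procedure rather than plain Adam:

import torch

def make_poison(feature_extractor, target, base, beta=0.1, steps=1000, lr=0.01):
    x = base.clone().requires_grad_(True)          # start from the base-class image
    opt = torch.optim.Adam([x], lr=lr)
    with torch.no_grad():
        f_t = feature_extractor(target)            # target's penultimate-layer features
    for _ in range(steps):
        loss = ((feature_extractor(x) - f_t) ** 2).sum() \
               + beta * ((x - base) ** 2).sum()    # stay visually close to the base
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)                         # keep a valid image
    return x.detach()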
Explanation 2: Unnecessary Features
• Potential reason 2: unnecessary/redundant features [3]
– Classifier f = c ∘ g, where g: feature extraction, c: classification
– d: similarity measure in the feature space
– Features extracted by the ML classifier (X₁) ≠ features extracted by a human (X₂)
Explanation 2: Unnecessary Features
• Potential reason 2: unnecessary/redundant features [3]
– Previous definition of adversarial attacks:
Find x′ s.t. f₁(x) ≠ f₁(x′) and Δ(x, x′) < ε
– New definition:
Find x′ s.t. f₁(x) ≠ f₁(x′), d₂(g₂(x), g₂(x′)) < δ₂ and f₂(x) = f₂(x′)
– {δ₂, η}-strong-robustness:
f₁ is {δ₂, η}-strong-robust if, for almost every pair (x, x′) ∈ X,
P( f₁(x) = f₁(x′) | f₂(x) = f₂(x′), d₂(g₂(x), g₂(x′)) < δ₂ ) > 1 − η
i.e., f₁ agrees with f₂ (the oracle) on pairs that the oracle regards as the same
Explanation 2: Unnecessary Features
• Unnecessary features ruin strong-robustness
– If f₁ uses unnecessary features → not strong-robust
– If f₁ misses necessary features used by f₂ → not accurate
– If f₁ uses the same set of features as f₂ → strong-robust, and can be accurate
– (Figure) Each adversarial sample is close to the original instance in the oracle’s feature space, but can be far from it in the trained classifier’s feature space – and on the other side of the boundary
Data-driven Defence
• Data-driven defence
– Filtering instances: poisoning data in the training dataset, or adversarial samples presented at test time, either exhibit different statistical features or follow a different distribution → detect and filter them out
– Injecting data: add adversarial samples into training → adversarial training
– Projecting data: project data into a lower-dimensional space; move adversarial samples closer to the manifold of legitimate samples
Data-driven Defence: Filtering Instances
• Filtering instances
• On Detecting Adversarial Perturbations [4]
– Adversary detection network: branches off the main network at some layer
– Each detector produces p_adv: the probability of the input being adversarial
– Step 1: train the main network regularly, and freeze its weights
– Step 2: generate an adversarial sample for each training data point
– Step 3: train the detectors on the resulting balanced, binary dataset
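A simplified PyTorch sketch of these three steps, using FGSM as the attack and a single intermediate layer; classifier, features, detector and train_loader are placeholders (the paper attaches convolutional detector subnetworks at several layers and evaluates several attacks):

import torch
import torch.nn.functional as F

def fgsm(classifier, x, y, eps=0.03):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def train_detector(classifier, features, detector, train_loader, epochs=5):
    classifier.eval()                                     # step 1: classifier is frozen
    for p in classifier.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in train_loader:
            x_adv = fgsm(classifier, x, y)                # step 2: one adversarial twin per input
            inputs = torch.cat([x, x_adv])                # step 3: balanced clean/adversarial batch
            labels = torch.cat([torch.zeros(len(x)), torch.ones(len(x))]).long()
            logits = detector(features(inputs).detach())  # detector sees intermediate features
            loss = F.cross_entropy(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()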
Data-driven Defence: Filtering Instances
• Adaptive/dynamic attacker: an attacker that is aware of the detection
– Maximises a combination of two terms:
• the cross-entropy loss of the classifier – letting the classifier mis-label the input x
• the cross-entropy loss of the detector – making the detector output p_adv as small as possible
• Dynamic adversary training
– Static: train the classifier, freeze its weights, precompute adversarial samples; x is modified only to maximise the classifier’s cross-entropy loss
– Dynamic: compute adversarial examples on-the-fly for each mini-batch; x is modified to fool both the classifier and the detector, so detector and attacker adapt to each other
Data-driven Defence: Filtering Instances
• Test on CIFAR-10
Data-driven Defence: Injecting Data
• Adversarial training: add adversarial samples into the training data
• Towards Deep Learning Models Resistant to Adversarial Attacks [5]
– How a classification problem is normally formalised (not robust):
θ* = argmin_θ E_{(x,y)~D} [ L(x; y; θ) ]
– Augment: redefine the loss by incorporating the adversary:
θ* = argmin_θ E_{(x,y)~D} [ max_{δ ∈ [−ε, ε]^d} L(x + δ; y; θ) ]
• Inner max – adversary: perturb x to maximise the loss
• Outer min – defender: find model parameters θ* that minimise the “adversarial loss”
Data-driven Defence: Injecting Data
• Towards Deep Learning Models Resistant to Adversarial Attacks [5]
– Step 1: fix θ, generate adversarial samples using strong attacks (e.g., projected gradient descent, C&W):
xⁱ ← clip_ε( xⁱ⁻¹ + α · sign(∂L/∂xⁱ⁻¹) )
– Step 2: update θ: train the network on the augmented dataset // only one epoch
– Inner maximisation: find adversarial examples; outer minimisation: optimise θ
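A minimal PyTorch sketch of this alternation (in practice the two steps are interleaved per mini-batch); model, train_loader and all hyperparameters are illustrative:

import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training(model, train_loader, epochs=10):
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in train_loader:
            model.eval()
            x_adv = pgd(model, x, y)                    # step 1: inner maximisation (theta fixed)
            model.train()
            loss = F.cross_entropy(model(x_adv), y)     # step 2: outer minimisation
            opt.zero_grad()
            loss.backward()
            opt.step()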
Data-driven Defence: Injecting Data
• Towards Deep Learning Models Resistant to Adversarial Attacks [5]
Potential problem?
Data-driven Defence: Injecting Data
• Curriculum Adversarial Training (CAT) [17]
– Adversarial training overfits to the specific attack used in training
– Training curriculum: train the model from weaker attacks to stronger attacks
– Attack strength: PGD(k), where k is the number of iterations
– Number of mini-batches per epoch: n = |D| / batch size
Data-driven Defence: Injecting Data
• Curriculum Adversarial Training (CAT) [17]
– Batch mixing
• Catastrophic forgetting [19]: a neural network tends to forget the information learned in previous tasks when training on new tasks
• Generate adversarial examples using PGD(i), i ∈ {0, 1, …, k}, and combine them to form a batch, i.e., batch mixing
Data-driven Defence: Injecting Data
• Curriculum Adversarial Training (CAT) [17]
– Quantization
• Attack generalisation: the model trained with CAT may not defend against stronger attacks
• Quantization: real value → b-bit integer
• Each input x: real values in [0, 1]^d → integer values in [0, 2^b − 1]^d
Data-driven Defence: Injecting Data
• Adversarial training for free [20]
– Train on each mini-batch m times; the number of epochs is reduced from N_ep to N_ep / m
– FGSM is used, but perturbations are not reset between mini-batches
– A single backward pass updates both the model weights and the perturbation
Data-driven Defence: Injecting Data
• Fast Adversarial Training [18]
– FGSM adversarial training with random initialisation
– A non-zero initial perturbation is the primary driver of its success
Data-driven Defence: Projecting Data
• Projecting data
– Adversarial samples come from low-density regions
– Move adversarial samples back to the data manifold before classification
– Use auto-encoders, GANs, or PixelCNN to reform/purify the input
Data-driven Defence: Projecting Data
• Auto-encoder: trained to produce an output (nearly) identical to its input
– input → code → output ≈ input
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
Data-driven Defence: Projecting Data
• MagNet: a Two-Pronged Defense against Adversarial Examples [6]
– Use auto-encoders to detect and reform adversarial samples
– Detector
• Reconstruction error (RE)
– Normal examples → small RE
– Adversarial samples → large RE
– Threshold: reject no more than 0.1% of the examples in the validation set
• Probability divergence
– Normal examples → small divergence between f(x) and f(AE(x))
– Adversarial samples → large divergence between f(x′) and f(AE(x′))
– AE(x): output of the auto-encoder
– f(x): output of the last layer (i.e., softmax) of the neural network f on the input x
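A sketch of the reconstruction-error detector, assuming a placeholder autoencoder trained on normal data; the full MagNet defence additionally uses the probability-divergence detector and the reformer:

import torch

def fit_threshold(autoencoder, val_loader, reject_rate=0.001):
    scores = []
    with torch.no_grad():
        for x, _ in val_loader:
            re = ((x - autoencoder(x)) ** 2).flatten(1).mean(dim=1)   # per-example RE
            scores.append(re)
    scores = torch.cat(scores)
    return torch.quantile(scores, 1 - reject_rate)      # reject at most 0.1% of clean data

def is_adversarial(autoencoder, x, threshold):
    with torch.no_grad():
        re = ((x - autoencoder(x)) ** 2).flatten(1).mean(dim=1)
    return re > threshold                               # large RE -> flag as adversarial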
Data-driven Defence: Projecting Data
– Reformer
• Normal examples: the auto-encoder outputs a very similar example
• Adversarial samples: the auto-encoder outputs an example that is closer to the manifold of the normal examples
• (Figure: the manifold of normal examples, with normal examples lying on it and adversarial samples being pulled back towards it)
Data-driven Defence: Projecting Data
• Can you think of a way to break “MagNet”?
– Hint: an adaptive attacker that attacks not only the classifier, but also the detector (suppose there is only one detector) and the reformer.
– argmin ‖δ‖ + c · f(x + δ) + ??, s.t. x + δ ∈ [0,1]^d
Learner Robustification: Distillation
• Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks [8]
– Distillation: transfer knowledge from one neural network to another – given a trained DNN, the probabilities produced by its final softmax layer, instead of the (hard) class labels, are used to train a second DNN
– These soft labels provide richer information about each class
Learner Robustification: Distillation
• Distillation as a defense to adversarial perturbations against deep neural networks [8] (N. Papernot et al.)
– Modification to the final softmax layer:
F_i(X) = exp(z_i(X)) / Σ_{j=1}^{N} exp(z_j(X))  →  F_i(X) = exp(z_i(X)/T) / Σ_{j=1}^{N} exp(z_j(X)/T)
– Z(X): output of the last hidden layer (the logits)
– T: distillation temperature
Learner Robustification: Distillation
• Distillation as a defense to adversarial perturbations against deep neural networks [8]
– Given a training set {(X, Y(X))}, train a DNN F with a softmax layer at temperature T
– Form a new training set {(X, F(X))} using the probability vectors F(X) as soft labels, and train another DNN F_D with the same network architecture, also at temperature T
– Test at temperature T = 1
– A high empirical value of T at training time gives better performance (with T = 1 at test time)
– F_D provides a smoother loss function – more generalised for an unknown dataset
Learner Robustification: Distillation
– Results on MNIST and CIFAR-10 (figures omitted): the effect against adversarial samples, and the influence of distillation on clean data
Learner Robustification: Distillation
• Why does “a large temperature at training time (e.g., T = 100) + a low temperature at test time (e.g., T = 1)” make the model more secure?
– Softmax at temperature T: F_i(X) = exp(z_i(X)/T) / Σ_{j=1}^{N} exp(z_j(X)/T)
– E.g., logits Z(X) = (100, 200, 100):
• Training (T = 100): probabilities ∝ (e¹, e², e¹), so the largest is e²/(e + e² + e) = e/(1 + e + 1) ≈ 0.58
• Test (T = 1): probabilities ∝ (e¹⁰⁰, e²⁰⁰, e¹⁰⁰), so the largest is e²⁰⁰/(e¹⁰⁰ + e²⁰⁰ + e¹⁰⁰) = e¹⁰⁰/(1 + e¹⁰⁰ + 1) ≈ 1
– Training at a high T encourages large logits; at test time with T = 1 the softmax saturates, so its gradients w.r.t. the input are close to zero, which hinders gradient-based attacks
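A quick numeric check of this example:

import numpy as np

def softmax_T(z, T):
    e = np.exp((z - z.max()) / T)        # subtract the max for numerical stability
    return e / e.sum()

z = np.array([100.0, 200.0, 100.0])
print(softmax_T(z, T=100))               # ~[0.21, 0.58, 0.21]
print(softmax_T(z, T=1))                 # ~[0., 1., 0.]  (saturated)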