COMP4337/9337: Securing Fixed and Wireless Networks
Real-time Detection of DNS Exfiltration from Enterprise Networks
Domain Name System (DNS)
• People: many identifiers:
• Name, passport #
• Internet hosts:
• IP address (32 bits)
• “name”, e.g., www.yahoo.com – used by humans
Q: How do we map between IP addresses and names?
Domain Name System (DNS)
• Used as a covert channel
– Firewalls generally let DNS traffic through, so malicious DNS traffic is not blocked
• DNS Attacks
– Reputation-based: DNS exfiltration
– Targeting the enterprise servers: DDoS attacks, reflection attacks, etc.
• DNS attacks are real and growing
– IDC 2019 Global DNS Threat Report: DNS attacks cost organizations an average of $1.07 million
– A jump of 49% from last year
– 34% increase in DNS attacks since 2018
– Observed shift from volumetric to low-signal attacks, including phishing and malware-based attacks
High-Profile DNS Related Breaches/Malware
• breach
• FrameworkPOS malware
• BernhardPOS malware
• MULTIGRAIN malware
• Win32.Backdoor.Denis
• UDPoS malware
DNS Exfiltration
Figure: an infected endpoint inside the enterprise sends DNS queries via the enterprise DNS server and the Internet to the attacker-controlled server for thief.com (C&C), which returns C&C commands.
Example exfiltration queries:
• NameMarySmith.foo.thief.com
• MRN100045429886.foo.thief.com
• DOB10191952.foo.thief.com
DNS Exfiltration (Data Breach) Setup
Components: compromised laptop, recursive DNS, authoritative DNS server for askdj.party
Data exfiltration method:
1. Start with a simple file or an encrypted file
2. Encode it with Base64 and pipe the individual lines to a domain name
3. Perform queries for these new, non-existent domain names
4. The askdj.party name server logs all the queries sent to it and converts them back to a file
Example:
54686520446f6d61696e204e616d652053797374656d2028444e53292069.askdj.party
732074686520496e7465726e6574e2809973207374616e64617264206e61.askdj.party
Result: DNS exfiltration success
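The encode, query, and reassemble steps can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the lecture: the function names and the 30-bytes-per-label chunk size are assumptions, and hex encoding is used because the example query names are hex strings (the slide text names Base64). No DNS traffic is sent; the sketch only builds and parses query names, and it assumes the name server sees the queries in order.

```python
from typing import List

MAX_LABEL = 63  # DNS limit on the length of a single label


def encode_to_queries(data: bytes, domain: str, bytes_per_label: int = 30) -> List[str]:
    """Hex-encode data in chunks, one chunk per sub-domain label (hypothetical helper)."""
    if 2 * bytes_per_label > MAX_LABEL:
        raise ValueError("hex-encoded chunk would exceed the 63-character label limit")
    queries = []
    for start in range(0, len(data), bytes_per_label):
        chunk = data[start:start + bytes_per_label]
        queries.append(f"{chunk.hex()}.{domain}")
    return queries


def decode_from_queries(queries: List[str], domain: str) -> bytes:
    """What the attacker's name server would do with its query log (hypothetical helper)."""
    suffix = "." + domain
    out = bytearray()
    for name in queries:
        if name.endswith(suffix):
            # Strip the attacker's domain, then turn the hex label back into bytes.
            out += bytes.fromhex(name[:-len(suffix)].replace(".", ""))
    return bytes(out)


if __name__ == "__main__":
    secret = b"The Domain Name System (DNS) is a test"
    names = encode_to_queries(secret, "askdj.party")
    for name in names:
        print(name)
    print(decode_from_queries(names, "askdj.party"))
```

A real tool would also need sequence numbers or similar, since resolvers may retry, cache, or reorder queries.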
Gaps in previous works
• Detection algorithms are based on analysis of the last n hours of traffic
• No insight into the machine learning algorithms developed or the significance of the features used
Novelty in our approach:
• Analyze real DNS traffic from two organizations over 7 days and extract stateless attributes such as length, entropy, and number of dots
• Tune and train a machine learning algorithm to detect anomalous DNS queries, using a known dataset of benign domains as ground truth
• Implement our scheme on live 10 Gbps traffic streams from the borders of the two organizations, and test our methods by injecting more than a million malicious DNS queries
DNS Exfiltration
Exfiltration query name:
708001701462b7fae70d0a28432920436f70797269676874.20313938352d32303031204d696372.6f736f667420436f72702e0d0a0d0a0.433a5c54454d503e.cspg.pw
Typical query name, for comparison:
www.google.com
Our Proposed Model
• Total count of characters in FQDN = 1346
• Count of characters in sub-domain = 1329
• Count of uppercase characters = 0
• Count of numerical characters = 1004
• Entropy = 32.9814
• Number of labels = 63
• Maximum label length = 468
• Average label length = 241
Anomaly score = 0.2.795
Our Dataset
• Data collected from the UNSW and CSIRO networks
• Ethics approval has been granted
• Currently collected daily
• Collected at the edge of the network (where the network connects to the Internet)
• CSV format
Summary of our Dataset
Key Features
• Total length of the domain name
• Length of the subdomain
• Uppercase characters in the domain name
• Numerical characters in the domain name
• Dots in the domain name
• Longest label in the domain name
• Average label length in the domain name
• Entropy of the domain name
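The eight stateless features can be computed from the query name alone. The sketch below is one possible reading, under stated assumptions: the function and key names are hypothetical, the subdomain is taken to be everything left of the last two labels (a real system would consult a public-suffix list), and entropy is Shannon entropy in bits per character. The slides do not say which entropy definition was used.

```python
import math
from collections import Counter
from typing import Dict


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(fqdn: str, registered_labels: int = 2) -> Dict[str, float]:
    """Stateless features of one query name (assumed definitions, see lead-in)."""
    name = fqdn.rstrip(".")          # ignore the optional trailing root dot
    labels = name.split(".")
    # Assumption: the registered domain is the last `registered_labels` labels.
    sub_labels = labels[:-registered_labels] if len(labels) > registered_labels else []
    subdomain = ".".join(sub_labels)
    return {
        "total_length": len(name),
        "subdomain_length": len(subdomain),
        "uppercase_count": sum(ch.isupper() for ch in name),
        "numeric_count": sum(ch.isdigit() for ch in name),
        "dot_count": name.count("."),
        "max_label_length": max(len(label) for label in labels),
        "avg_label_length": sum(len(label) for label in labels) / len(labels),
        "entropy": shannon_entropy(name),
    }


if __name__ == "__main__":
    for query in ("www.google.com", "DOB10191952.foo.thief.com"):
        print(query, extract_features(query))
```

Because every feature depends only on the single query name, no per-host or per-time-window state has to be kept, which is what makes the attributes "stateless".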
Sample of DNS Queries with Unusual Length
Number of Characters in Query Name (FQDN)
Entropy of Query Name
Entropy of sample FQDNs
DNS Exfiltration Attack Generation
• Data Exfiltration Toolbox
• Python script with two variable parameters:
– Total length of the query
– Maximum label length within a query
• Generated DNS queries with total length varied from 50 to 210 and maximum label length from 30 to 63
• Exfiltrated data was a CSV file containing the details of 1000 (fake) credit cards
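A generator with these two knobs might look like the following sketch. It is not the script used in the study: the function names, the hex encoding, and the domain exfil.example are all assumptions made for illustration, and the card number is a made-up test value. Each query name is filled up to (at most) the requested total length, using labels no longer than the requested maximum.

```python
from typing import List, Tuple


def build_query(payload: str, start: int, total_length: int,
                max_label: int, domain: str) -> Tuple[str, int]:
    """Build one query name; return it with the index of the next unread character."""
    budget = total_length - len(domain)  # room for labels, each followed by one dot
    labels = []
    pos = start
    while budget >= 2 and pos < len(payload):
        take = min(max_label, budget - 1, len(payload) - pos)
        labels.append(payload[pos:pos + take])
        pos += take
        budget -= take + 1               # the label plus its trailing dot
    return ".".join(labels + [domain]), pos


def generate_queries(data: bytes, domain: str,
                     total_length: int, max_label: int) -> List[str]:
    """Hex-encode data and spread it over query names of bounded size."""
    if not 1 <= max_label <= 63:         # DNS label limit
        raise ValueError("max_label must be between 1 and 63")
    if not len(domain) + 2 <= total_length <= 253:  # 253: name limit in text form
        raise ValueError("total_length is too small for the domain or above 253")
    payload = data.hex()
    queries, pos = [], 0
    while pos < len(payload):
        query, pos = build_query(payload, pos, total_length, max_label, domain)
        queries.append(query)
    return queries


if __name__ == "__main__":
    fake_csv = b"4111111111111111,12/29,123\n" * 3   # fabricated test rows
    # Sweep the two parameters over the ranges quoted on the slide.
    for total in (50, 130, 210):
        for label in (30, 63):
            names = generate_queries(fake_csv, "exfil.example", total, label)
            print(total, label, len(names), names[0])
```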
Isolation Forest (iForest)
• Isolates an instance by separating it from the rest of the instances
• Anomalies are few and different
• Input parameters that we used and optimized:
– Number of trees
– Height limit of trees
– Contamination rate
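To make the three parameters concrete, here is a small pure-Python isolation forest. It is a teaching sketch under assumptions, not the implementation evaluated in the lecture: all names are hypothetical, the defaults (100 trees, sub-sample of 256, height limit ceil(log2(sample size))) are commonly used values rather than the tuned ones, and the contamination rate is applied as a score quantile over the training data. Scores near 1 indicate points that are isolated after few splits.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, height_limit, rng):
    """Split on a random feature at a random value until isolated or too deep."""
    if depth >= height_limit or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                      # nothing to split on for this feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, height_limit, rng),
            build_tree(right, depth + 1, height_limit, rng))


def path_length(tree, x, depth=0):
    """Edges traversed by x, plus the expected extra depth of an unsplit leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, height_limit=None, seed=0):
    """Train on benign feature vectors (a list of equal-length tuples)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two training points")
    if height_limit is None:
        height_limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, height_limit, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(forest, x):
    """s(x) = 2 ** (-E[h(x)] / c(psi)); shorter average paths give higher scores."""
    trees, psi = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(psi))


def threshold_for(forest, data, contamination):
    """Score above which roughly `contamination` of the training data falls."""
    scores = sorted(anomaly_score(forest, x) for x in data)
    index = min(len(scores) - 1, int((1.0 - contamination) * len(scores)))
    return scores[index]
```

In use, each vector would hold the stateless features of one query name; a query would be flagged when anomaly_score exceeds the value returned by threshold_for.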
Training of iForest
Queries for popular and top-ranked domains
» Free dataset of top 1m
» Every day!
Alexa top 10K
» Alexa no longer publishes its top 1m sites. The latest data available is from late 2016.
» It now exposes paid APIs (e.g., $25 for 10000 URLs)
Accuracy of the Detection Algorithm
Accuracy – Research Institute
Accuracy – University Campus
Real-Time Analysis
Domains classified per second: ~1700
Acknowledgements
• This work was completed in collaboration with the Australian Defence Science and Technology Group (DSTG), Australia’s Academic and Research Network (AARNet), and CSIRO Data61.