Splunk Introduction
COMP90073 Security Analytics
Dr. , CIS Semester 2, 2021
COMP90073 Security Analytics © University of Melbourne 2021
Outline
• What is Splunk & Why Splunk
• Splunk Software
• Search Processing Language (SPL)
What is Splunk & Why Splunk
Software for searching, monitoring, and analysing machine-generated big data through a web-style interface
A typical web server log
Challenging to analyse multiple logs in real-time to detect security events!
What is Splunk & Why Splunk
Gartner 2020 Magic Quadrant for Security Information and Event Management (SIEM)
• Advanced threat detection and response solution
– User and entity behavior analytics (UEBA)
– Endpoint detection and response (EDR)
– Automated threat intelligence
– Real-time dashboards and reports
– And more…
Splunk Software
• Splunk Capabilities
• Splunk Architecture
• What Can be Indexed
• Web Interface Overview
• Search & Reporting
• Events & Fields
• Default Fields
• Data Types & Common Operators
Splunk Capabilities
• Collect, index, and correlate machine data in real time
– Indexing: transforming data into a series of events that contain searchable fields (e.g. IP addresses of source and destination in a network packet)
– Index: a repository for Splunk data
• Generate graphs, reports, alerts, dashboards and visualizations
Splunk Architecture
• Data sources: logs, file systems, NetFlow, etc.
• Splunk forwarders: forward the data from different data input sources to the indexers
• Splunk indexers: create and manage indexes for the incoming data
• Splunk search tier: includes search heads that process users' search queries on the indexed data
[Figure: architecture tiers, from bottom to top: Data Sources, Forwarders Tier (forwarders with load balancing), Indexers Tier, Search Tier]
What Can be Indexed
Web Interface Overview
Splunk bar
Manage and run applications
Add forwarders or import data from file
Add custom dashboards for data visualisation
Search & Reporting
Search bar
Time range picker
Summary of indexed data
Rerun past searches
Event & Fields
Search command
Data Source: https://live.splunk.com/splunk-security-dataset-project
Default Fields
• Shell scripts, Python scripts, Windows batch files, PowerShell, etc., can be used to customise the data indexing and generate useful fields
• Internal fields: contain general information about events
– _raw: original raw data of an event
– _time: an event’s timestamp expressed in Unix time
– _indextime: the time that an event was indexed
– _cd: an address for an event within the index
– _bkt: the bucket that an event is stored in
Default Fields
• Default fields: contain information about where an event originated
– host: hostname/IP address of the device that generated the event (e.g., cisco_router)
– index: the name of the index in which a given event is indexed (e.g., default is “main”)
– linecount: the number of lines an event contains
– punct: the punctuation pattern that is extracted from an event
– source: the file, stream, or other input from which an event originates (e.g., stream:http)
– sourcetype: the format of the data input from which the event originates (e.g. syslog)
– splunk_server: the Splunk server containing the event
– timestamp: an event’s timestamp value
Default Fields
• Default datetime fields: contain additional searchable granularity to event timestamps
– date_hour: the hour in which an event occurred
– date_mday: the day of the month on which an event occurred
– date_minute: the minute in which an event occurred
– date_month: the month in which an event occurred
– date_second: the seconds portion of an event’s timestamp
– date_wday: the day of the week on which an event occurred
– date_year: the year in which an event occurred
– date_zone: the value of time for the local time-zone of an event
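To make the relationship between a timestamp and the datetime fields concrete, the Python sketch below splits one Unix timestamp into parts named after the date_* fields. It is only an illustration: the function name, the sample timestamp, the use of UTC, and the lower-case month and weekday names are assumptions, and date_zone is left out.

```python
from datetime import datetime, timezone

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
WDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
         "saturday", "sunday"]

def date_fields(unix_time):
    """Split a Unix timestamp (interpreted in UTC) into date_* style parts."""
    t = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return {
        "date_hour": t.hour,
        "date_mday": t.day,
        "date_minute": t.minute,
        "date_month": MONTHS[t.month - 1],
        "date_second": t.second,
        "date_wday": WDAYS[t.weekday()],  # weekday() counts Monday as 0
        "date_year": t.year,
    }

fields = date_fields(1633046400)  # hypothetical event time: 2021-10-01 00:00:00 UTC
print(fields)
```

With parts like these available as fields, a search can filter on, say, a particular hour or weekday without parsing the timestamp again.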
Data Types & Common Operators
• Data types: bool, int, float, string
• Comparison operators: = != < <= > >=
• Logical operators: AND, OR, NOT
– The clause “src_port!=80” is different from “NOT src_port=80”
• Records with a missing value for the “src_port” field are returned by the second clause but not by the first one
– If no logical operator is used between clauses, the default operator is AND
• “src_port!=80 host=server01” is equivalent to “src_port!=80 AND host=server01”
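The difference between the two clauses can be mimicked outside Splunk. The Python sketch below is not Splunk code; it only illustrates the semantics described above, and the toy events and helper names are assumptions.

```python
# Toy events; the third one has no src_port field at all.
events = [
    {"host": "server01", "src_port": 80},
    {"host": "server01", "src_port": 443},
    {"host": "server02"},
]

def not_equal(event, field, value):
    """Mimics `field!=value`: the field must exist and differ from value."""
    return field in event and event[field] != value

def not_clause(event, field, value):
    """Mimics `NOT field=value`: everything that does not match field=value."""
    return not (field in event and event[field] == value)

ne_hits = [e for e in events if not_equal(e, "src_port", 80)]
not_hits = [e for e in events if not_clause(e, "src_port", 80)]

print(len(ne_hits))   # only events that have a src_port other than 80
print(len(not_hits))  # also counts the event without any src_port

# Implicit AND: clauses written side by side must all hold.
both = [e for e in events
        if not_equal(e, "src_port", 80) and e.get("host") == "server01"]
print(both)
```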
Search Processing Language (SPL)
• Filtering Results
• Sorting & Grouping Results
• Filtering & Modifying Fields
Common SPL Commands – Pipe
• Common search string in SPL: command1 | command2 | … | commandk
• The results of the command before each pipe character “|” are used as input to the command that follows it
• The pipe character is always followed by an SPL command
[Figure: Command1 → Command2 → … → Commandk]
Common SPL Commands
• The “search” command is implicitly applied at the beginning of the search pipeline, so it does not need to be written explicitly there
– Example: “src_port=80 | top dest_ip” (the “search” command is implicitly applied before src_port=80)
Category: description (commands)
– Filtering Results: taking a set of results and filtering them into a smaller set of results (search, where, dedup, head, tail)
– Sorting Results: ordering (and optionally limiting the number of) results (sort)
– Reporting Results: generating a summary of results for reporting (top/rare, table, stats, chart, timechart)
– Grouping Results: grouping events for identifying patterns (transaction)
– Filtering, Modifying, and Adding Fields: filtering out some fields to focus on the most relevant ones, modifying or adding fields to enrich results (fields, replace, rename, eval, rex, lookup)
Source: https://docs.splunk.com/
Common SPL Commands – Syntax Tips
• Required arguments are shown in angle brackets < >
• Optional arguments are enclosed in square brackets [ ]
• Grouped arguments are shown in parentheses ( )
• Repeating arguments are shown by an ellipsis …
• Example
– Syntax: replace (<wc-string> WITH <wc-string>)… [IN <field-list>]
– Example: replace 200 WITH OK 404 WITH "Not Found" IN status
• Here status is the HTTP status field in the indexed data
Filtering the Results
Search command
• Filters events from Splunk indexes given a set of queried conditions
• Syntax: search <logical-expression>
• logical-expression
– comparison-expression
– index-expression
– time-opts (you can also use the time range picker for time options)
• Precedence of logical operators in the search command: expressions in parentheses, then NOT, then OR, then AND
Search command: comparison-expression
• <field><comparison-operator><value>
– Examples: src_port<100, src_ip=192.168.10.1
• <field> IN (<value-list>)
– Example: dest_port IN (21,80,8080)
– The IN operator checks whether a value is a member of a group of values
• Search command examples for the toy HTTP data:
– search status>=400
• Returns events for HTTP requests that resulted in an error
– search status IN (401,403)
• Returns events for Unauthorized or Forbidden HTTP requests
Search command: index-expression
• "<string>" | <term>
– Keywords or quoted phrases to match. Examples: fail*, login, "http://"
• Wildcard: the asterisk (*) is used to match an unrestricted number of characters in a string
• <search-modifier>
– Example: sourcetype=syslog
• Search example:
– search sourcetype=stream:http fail* password
• This is equivalent to “search sourcetype=stream:http AND fail* AND password”
Search command: time-opts
• [<timeformat>] (<time-modifier>)…
– timeformat=…
– Example: timeformat=%d/%m/%Y:%H:%M:%S
– Default time format is %m/%d/%Y:%H:%M:%S
• time-modifier
– earliest, latest, _index_earliest, _index_latest, now(), time()
• Hint: you can use the web interface for setting the time options
Time unit: valid unit abbreviations
– second: s, sec, secs, second, seconds
– minute: m, min, minute, minutes
– hour: h, hr, hrs, hour, hours
– day: d, day, days
– week: w, week, weeks
– month: mon, month, months
– quarter: q, qtr, qtrs, quarter, quarters
– year: y, yr, yrs, year, years
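The two time formats above differ only in the order of day and month, which is easy to get wrong. The directives look like the strptime directives that Python also understands, so the Python sketch below (an illustration outside Splunk, with a made-up timestamp string) shows how the same text is read under each format.

```python
from datetime import datetime

DEFAULT_FORMAT = "%m/%d/%Y:%H:%M:%S"  # default time format quoted above
CUSTOM_FORMAT = "%d/%m/%Y:%H:%M:%S"   # timeformat example quoted above

stamp = "03/04/2021:10:30:00"  # hypothetical value, e.g. for earliest=...

as_default = datetime.strptime(stamp, DEFAULT_FORMAT)
as_custom = datetime.strptime(stamp, CUSTOM_FORMAT)

print(as_default.isoformat())  # read month first: 4 March 2021
print(as_custom.isoformat())   # read day first: 3 April 2021
```

Stating timeformat explicitly avoids this ambiguity whenever dates are typed into the search string.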
Tips for search command
• Field names are case-sensitive by default
• Literals are not case-sensitive by default
– Example: searching for login, Login, or "Login" all return the same results
– Use CASE(<term>) for case-sensitive matching
• CASE(Login) only returns events that include Login (not login)
• Splunk searches for whole words
– To get search results for both “fail” and “failure”, use the asterisk wildcard (*): fail*
• For phrases or field values containing breaking characters (e.g., whitespace, commas, pipes, square brackets, and equals signs), use quotation marks
– Example: host="server1"
– Use a backslash (\) to escape a quote in the field value, e.g., host="server\"1" → looking for records with host name equal to server"1
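The matching rules in these tips (case-insensitive literals, whole-word matching, and the asterisk wildcard) can be imitated with a few lines of Python. This is a simplified sketch, not how Splunk segments events; the `matches` helper and the sample event text are assumptions.

```python
import re
from fnmatch import fnmatchcase

def matches(term, raw, case_sensitive=False):
    """True if any whole word of raw matches term (* acts as a wildcard)."""
    words = re.findall(r"\w+", raw)
    if not case_sensitive:
        term = term.lower()
        words = [w.lower() for w in words]
    return any(fnmatchcase(w, term) for w in words)

raw = "Login failure for user admin"

print(matches("login", raw))                       # literal, case-insensitive
print(matches("Login", raw, case_sensitive=True))  # like CASE(Login)
print(matches("login", raw, case_sensitive=True))  # like CASE(login)
print(matches("fail", raw))                        # whole word only
print(matches("fail*", raw))                       # wildcard also covers failure
```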
Where command
• Quoted strings are interpreted as literals
• Unquoted strings are treated as field names, so where can compare two different fields
• Can also be used with the IN operator and a value list
– Example: … | where dest_port IN (80,8080)
• Precedence of logical operators in where: expressions in parentheses, then NOT, then AND, then OR
• Examples
– … | where src_port=dst_port
– … | where bytes_in>2*bytes_out
Head and tail commands
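• head returns the first results of a search, which are the most recent events when results are in the default reverse-chronological order
– … | head 25
• tail returns the last results of a search, which are the earliest events in that default order
– … | tail 15
• If the integer argument is not given, both commands return 10 results by default
• Examples: status>400 | tail 20 and status>400 | head 20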
BOTS: https://live.splunk.com/splunk-security-dataset-project
Sorting & Grouping Results
Sort command
• Changes the ordering (and optionally the number) of the results
• Syntax: sort [<count>] <sort-by-clause>… [desc]
• The default value of the optional count is 10,000; pass 0 to return all the results
• sort-by-clause: [±]<sort-field>, [±]<sort-field>, …
– The value of sort-field can be a field (such as “src_port”) or one of:
• auto(<field>), str(<field>)
• ip(<field>), num(<field>)
• The default sorting order is ascending
– Use a minus sign for descending order, e.g., sort -src_port, +ip(src_ip)
• Examples:
– … | sort lastname, -firstname
– … | sort 100 -num(size), +str(source)
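Why the sort-field type matters is easiest to see with IP addresses, where text order and address order disagree. The Python sketch below imitates the effect of sorting a field as a string versus as an IP address; it is an analogy rather than Splunk's implementation, and the list of addresses is made up from values that appear elsewhere in these slides.

```python
import ipaddress

src_ips = ["192.168.250.70", "23.22.63.114", "40.80.148.42",
           "192.168.250.100", "8.8.8.8"]

# Comparable to sorting on str(src_ip): plain character-by-character order.
as_str = sorted(src_ips)

# Comparable to sorting on ip(src_ip): ordered by the numeric address.
as_ip = sorted(src_ips, key=ipaddress.ip_address)

# Comparable to -ip(src_ip): the same ordering, descending.
as_ip_desc = sorted(src_ips, key=ipaddress.ip_address, reverse=True)

print(as_str)
print(as_ip)
print(as_ip_desc)
```

In text order 192.168.250.100 comes before 192.168.250.70 and 8.8.8.8 comes last, which is rarely what an analyst wants.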
Transaction command
• A transaction is a group of conceptually related events that spans time
– Examples
• Different events from the same source and the same host
• Different events from different sources but from the same host
• Similar events from different hosts and different sources
• A set of events related to a firewall intrusion incident
• Syntax: transaction [<field-list>] [<transaction-definition-options>]
• This command adds two fields to the raw events: duration and eventcount
• The field-list argument specifies one or more field names; events are grouped into transactions based on the values of the field(s)
– The relationship among the fields can be conjunction, disjunction, transitive, …
Transaction command: transaction definition options
• transaction-definition-options
– startswith=<filter-string>, endswith=<filter-string>
• To start or end a transaction if the filter-string is satisfied by an event
– maxspan=<int>[s|m|h|d]
• Events in the transaction must span less than the time specified for maxspan. Events that exceed the maxspan limit are treated as part of a separate transaction
– maxpause=<int>[s|m|h|d]
• To specify the maximum length of time for the pause between the events in a transaction
– maxevents=<int>
• To specify the maximum number of events in a transaction. The default value is 1000.
– A negative value for any of these constraints means that there is no limit on its value
Transaction command: example
status>400 | transaction maxpause=1m src_ip,dest_ip | sort -eventcount
Is the source 40.80.148.42 scanning the destination 192.168.250.70?
Acunetix is a vulnerability scanner
BOTS: https://live.splunk.com/splunk-security-dataset-project
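The grouping performed by this example can be approximated in Python to show where duration and eventcount come from. This is a rough sketch under simplifying assumptions: events are grouped only by exact values of the listed fields, maxpause is given in seconds, and the event times are invented (only the IP addresses come from the slides). The real transaction command has more rules than this.

```python
from collections import defaultdict

def summarise(key, group):
    """Build one transaction row with the two fields the command adds."""
    return {
        "key": key,
        "duration": group[-1]["_time"] - group[0]["_time"],
        "eventcount": len(group),
    }

def transactions(events, fields, maxpause):
    """Group events by field values; a pause longer than maxpause seconds
    between consecutive events starts a new transaction."""
    by_key = defaultdict(list)
    for e in sorted(events, key=lambda e: e["_time"]):
        by_key[tuple(e[f] for f in fields)].append(e)
    rows = []
    for key, evs in by_key.items():
        group = [evs[0]]
        for e in evs[1:]:
            if e["_time"] - group[-1]["_time"] > maxpause:
                rows.append(summarise(key, group))
                group = []
            group.append(e)
        rows.append(summarise(key, group))
    # Comparable to "| sort -eventcount"
    return sorted(rows, key=lambda r: r["eventcount"], reverse=True)

events = [
    {"_time": 0, "src_ip": "40.80.148.42", "dest_ip": "192.168.250.70"},
    {"_time": 20, "src_ip": "40.80.148.42", "dest_ip": "192.168.250.70"},
    {"_time": 50, "src_ip": "40.80.148.42", "dest_ip": "192.168.250.70"},
    {"_time": 30, "src_ip": "23.22.63.114", "dest_ip": "192.168.250.70"},
    {"_time": 200, "src_ip": "40.80.148.42", "dest_ip": "192.168.250.70"},
]

txns = transactions(events, ["src_ip", "dest_ip"], maxpause=60)
for row in txns:
    print(row)
```

A source that produces one long transaction with a very high eventcount towards a single destination is the kind of pattern the example search is meant to surface.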
Reporting Results
Commands for statistical calculations
• Calculate aggregate statistics (average, count, sum, …) over a result set
• Commands
– stats: returns a table of results where each row represents a single unique combination of the values of the chosen group-by fields
• See also: eventstats, streamstats, geostats
– chart: similar to stats but creates tabular data output suitable for charting
– timechart: creates a chart for a statistical aggregation applied to a field, with time as the x-axis
Stats command
• Syntax: stats [partitions=<num>] [allnum=<bool>] [delim=<string>] (<stats-agg-term>… or <sparkline-agg-term>…) [<by-clause>]
– Lower-case “or” in these slides is used to show alternative available options
• stats-agg-term: <stats-func>(<evaled-field> or <wc-field>) [AS <wc-field>]
– Choices of stats-func → next slide
– The input field argument can be an existing field name (e.g., src_port) or an evaled field created using the eval command inside stats
• stats count(eval(src_port=80)) → the evaled field is “eval(src_port=80)”
– Wildcard field names can be used: this option returns separate results, applying stats-func to each field: stats count(eval(*_port=80))
– The optional argument [AS <wc-field>] names the output field, and can be a wildcard field name:
• Example 1: “stats count(eval(*_port=80)) AS *_port80”
Options for stats-func
Type of function: supported functions and syntax
– Aggregate functions: avg(), count(), distinct_count(), estdc(), estdc_error(), exactperc<int>(), median(), min(), mode(), perc<int>(), sum(), sumsq(), upperperc<int>(), varp()
– Event order functions: first(), last()
– Multi-value stats and chart functions: list(), values()
– Time functions: earliest(), earliest_time(), latest(), latest_time(), rate()
More detail on the functions:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonStatsFunctions
Stats command (example)
… | stats sum(eval(if(status>=400,1,0))) AS statusError BY src_ip | sort -statusError
Execution per src_ip:
1. eval(if(status>=400,1,0)) maps each event to 1 (error status) or 0 (no error)
2. The stats command sums the output of eval, splitting by source IP address
3. The sort command sorts the results
Observation: statusError for one source IP is much higher than for the others
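The three execution steps can be traced by hand on a few events. The Python sketch below reproduces them outside Splunk; the five toy events are invented for illustration and only the field names follow the search above.

```python
from collections import Counter

events = [
    {"src_ip": "40.80.148.42", "status": 404},
    {"src_ip": "40.80.148.42", "status": 500},
    {"src_ip": "40.80.148.42", "status": 200},
    {"src_ip": "23.22.63.114", "status": 200},
    {"src_ip": "23.22.63.114", "status": 403},
]

status_error = Counter()
for e in events:
    # Step 1: eval(if(status>=400,1,0))
    flag = 1 if e["status"] >= 400 else 0
    # Step 2: sum(...) AS statusError BY src_ip
    status_error[e["src_ip"]] += flag

# Step 3: sort by statusError, descending
rows = sorted(status_error.items(), key=lambda kv: kv[1], reverse=True)
print(rows)
```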
BOTS: https://live.splunk.com/splunk-security-dataset-project
Stats command (example)
Stats command: sparkline-agg-term
• Sparkline: an inline chart that appears within table cells in search results to display time-based trends associated with the primary key of each row
• Syntax: sparkline(<sparkline-func>(<wc-field>), <span-length>)
– sparkline-func options: count(), mean(), avg(), stdev(), min(), max(), etc.
– span-length examples: 1d, 10min, 1mon
• Example: index=* | stats sparkline(avg(bytes_*),1m) AS avg_bytes_* BY src_ip,dest_ip
– The sparklines change as the search proceeds
BOTS: https://live.splunk.com/splunk-security-dataset-project
Stats command: other arguments
• partitions=<num>
• allnum=<bool>: if true, numerical statistics are computed on a field only if all of the values of that field are numerical
• delim=<string>
Chart command
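Syntax: chart (<stats-agg-term> or <sparkline-agg-term> or "("<eval-expression>")")… [(BY <row-split> <column-split>) or (OVER <row-split> [BY <column-split>])]
• row-split
– <field> [<bin-options>…]
– bin-options: bins, span, …
• Examples: bins=5, span=1min, …
• column-split
– <field> [<tc-options>…]
– tc-options: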
Compare stats and chart commands
chart count(eval(src_port=80)) AS port80 OVER dest_port bins=10 BY dest_ip
dest_port | 10.120.137.110 | 10.120.251.250 | … | 10.186.60.244 | 10.85.245.109 | OTHER
0-10000 | 590 | 566 | … | 417 | 453 | 139639
10000-20000 | 25 | 17 | … | 7 | 14 | 3309
… | … | … | … | … | … | …
60000-70000 | 4 | 7 | … | 8 | 4 | 1378

stats count(eval(src_port=80)) AS port80 BY dest_port, dest_ip
dest_port | dest_ip | port80
80 | 10.168.80.39 | 171
80 | 10.122.27.216 | 161
80 | 10.122.68.227 | 161
80 | 10.120.137.110 | 159
… | … | …
Top and rare commands
• top [<N>] [<top-options>…] <field-list> [<by-clause>]
– Most common (optionally N) values for the fields
– Example: “top src_ip dest_ip”
• rare [<top-options>…] <field-list> [<by-clause>]
– Least common (optionally N) values for the fields
• Two fields are added to the results when using top and rare: count and percent
• The optional by-clause is for grouping and ordering the results using other fields

top src_ip dest_ip dest_port
src_ip | dest_ip | dest_port | count | percent
40.80.148.42 | 192.168.250.70 | 80 | 5931 | 0.816
23.22.63.114 | 192.168.250.70 | 80 | 1236 | 0.170
40.80.148.42 | 192.168.250.40 | 8000 | 100 | 0.014

top src_ip dest_ip by dest_port
dest_port | src_ip | dest_ip | count | percent
80 | 40.80.148.42 | 192.168.250.70 | 5931 | 0.828
80 | 23.22.63.114 | 192.168.250.70 | 1236 | 0.172
8000 | 40.80.148.42 | 192.168.250.40 | 100 | 100
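The two tables differ because the by-clause changes what the percentage is taken of. The Python sketch below recomputes both from the counts in the tables; it is an illustration rather than Splunk's implementation, the helper name `top` is an assumption, and percent is expressed here on a 0 to 100 scale (so 81.6 corresponds to the 0.816 listed above).

```python
from collections import Counter, defaultdict

# (src_ip, dest_ip, dest_port) -> event count, copied from the tables above
counts = Counter({
    ("40.80.148.42", "192.168.250.70", 80): 5931,
    ("23.22.63.114", "192.168.250.70", 80): 1236,
    ("40.80.148.42", "192.168.250.40", 8000): 100,
})

def top(counter):
    """Rows of (key, count, percent), most common first."""
    total = sum(counter.values())
    return [(key, n, round(100 * n / total, 1))
            for key, n in counter.most_common()]

# top src_ip dest_ip dest_port: percent is relative to all events
overall = top(counts)

# top src_ip dest_ip by dest_port: percent is relative to each dest_port group
groups = defaultdict(Counter)
for (src, dst, port), n in counts.items():
    groups[port][(src, dst)] = n
per_port = {port: top(c) for port, c in groups.items()}

print(overall)
print(per_port)
```

With the by-clause, the single combination seen on port 8000 accounts for all of that group, even though it is a small share of the events overall.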
Top and rare commands: options
• showcount=<bool>
• countfield=<string>
• showperc=<bool>
• percentfield=<string>
• limit=<int>
• useother=<bool>
• otherstr=<string>
Table command
• Syntax: table <wc-field-list>
– Example: … | table *_ip *_port
dest_ip | src_ip | dest_port | src_port
192.168.250.40 | 192.168.250.100 | 8089 | 49772
192.168.250.40 | 192.168.250.100 | |
8.8.8.8 | 192.168.250.40 | 53 | 53273
8.8.8.8 | 192.168.250.40 | 53 | 53273
8.8.8.8 | 192.168.250.40 | 53 | 42173
8.8.8.8 | 192.168.250.40 | 53 | 42173
Filtering, Modifying & Adding Fields
Eval command
• Calculates the value of a new field based on other fields, whether numerically, by concatenation, or through Boolean logic
• Syntax: eval <field>=<expression>["," <field>=<expression>]…
– The double quotation marks around the comma mean that the comma is mandatory
• <expression>
– If the expression
• refers to field names with non-alphanumeric characters, the name should be in single quotation marks (e.g., 'src_port')
• refers to literal strings, they should be in double quotation marks
• The output is stored in <field>
– If the field already exists, eval overwrites the corresponding field values
– The field values returned by eval cannot be Boolean (the tostring() function can be used to convert results to strings)
Functions for eval expressions
Type of function: supported functions and syntax
– Comparison and conditional functions: case(X,"Y",…), cidrmatch("X",Y), coalesce(X,…), false(), if(X,Y,Z), in(VALUE-LIST), like(TEXT,PATTERN), match(SUBJECT,"REGEX"), null(), nullif(X,Y), searchmatch(X), true(), validate(X,Y,…)
– Conversion functions: printf("format",arguments), tonumber(NUMSTR,BASE), tostring(X,Y)
– Cryptographic functions: md5(X), sha1(X), sha256(X), sha512(X)
– Date and time functions: now(), relative_time(X,Y), strftime(X,Y), strptime(X,Y), time()
More detail on the functions:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
Functions for eval expressions
Type of function: supported functions and syntax
– Informational functions: isbool(X), isint(X), isnotnull(X), isnull(X), isnum(X), isstr(X), typeof(X)
– Mathematical functions: abs(X), ceiling(X), exact(X), exp(X), floor(X), ln(X), log(X,Y), pi(), pow(X,Y), round(X,Y), sigfig(X), sqrt(X)
– Multi-value eval functions: commands(X), mvappend(X,…), mvcount(MVFIELD), mvdedup(X), mvfilter(X), mvfind(MVFIELD,"REGEX"), mvindex(MVFIELD,STARTINDEX,ENDINDEX), mvjoin(MVFIELD,STR), mvrange(X,Y,Z), mvsort(X), mvzip(X,Y,"Z"), split(X,"Y")
More detail on the functions:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
Functions for eval expressions
Type of function: supported functions and syntax
– Statistical eval functions: max(X,…), min(X,…), random()
– Text functions: len(X), lower(X), ltrim(X,Y), replace(X,Y,Z), rtrim(X,Y), spath(X,Y), substr(X,Y,Z), trim(X,Y), upper(X), urldecode(X)
– Trigonometry and hyperbolic functions: acos(X), acosh(X), asin(X), asinh(X), atan(X), atan2(X,Y), atanh(X), cos(X), cosh(X), hypot(X,Y), sin(X), sinh(X), tan(X), tanh(X)
More detail on the functions:
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
Eval command examples
• Create a new field that contains the result of a calculation
– … | eval velocity=distance/time
• Use the if function to analyse field values
– … | eval error=if(status==200,"OK","Problem")
• Convert values to lowercase
– … | eval lowuser=lower(username)
• Calculate the sum of the areas of two circles
– … | eval sum_of_areas=pi()*pow(radius_a,2)+pi()*pow(radius_b,2)
• Concatenate values from two fields
– … | eval full_name=first_name+" "+last_name
• Separate multiple eval operations with a comma
– … | eval full_name=last_name+", "+first_name, low_name=lower(full_name)
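For readers more used to a general-purpose language, the same calculations can be written in Python against a single event represented as a dictionary. This is only an analogy for what each eval expression computes; the sample field values are made up.

```python
import math

event = {"status": 404, "username": "Alice", "first_name": "Ada",
         "last_name": "Lovelace", "radius_a": 1.0, "radius_b": 2.0}

# if(status==200,"OK","Problem")
event["error"] = "OK" if event["status"] == 200 else "Problem"

# lower(username)
event["lowuser"] = event["username"].lower()

# pi()*pow(radius_a,2)+pi()*pow(radius_b,2)
event["sum_of_areas"] = (math.pi * event["radius_a"] ** 2
                         + math.pi * event["radius_b"] ** 2)

# Two assignments in one eval: the second can use the field set by the first.
event["full_name"] = event["last_name"] + ", " + event["first_name"]
event["low_name"] = event["full_name"].lower()

print(event)
```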
Eval command examples
BOTS: https://live.splunk.com/splunk-security-dataset-project
Replace and rename commands
• Syntax: replace (<wc-string> WITH <wc-string>)… [IN <field-list>]
– Example: replace jan* WITH Jan sat* WITH Sat IN date_month, date_wday
• Syntax: rename <wc-field> AS <wc-field>…
– Example: rename src_* AS source_* dest_* AS destination_*
Fields command
• Keeps or removes fields from search results
• Syntax: fields [±] <wc-field-list>
• Examples:
– … | fields - src_port
– “fields - src_port, dst_port” is equivalent to “fields - *_port”
• In combination with eval, the fields command can be used to show internal fields
– … | fields + _bkt | eval bkt=_bkt
Rex command
• The rex command uses regular expressions to create new fields by extracting patterns from other fields
• Syntax: rex [field=<field>] <regex-expression>
• The field argument is _raw by default, and specifies the field from which the new field(s) will be extracted
• regex-expression is a regular expression
• Example: extract IP address
– … | rex field=_raw ".*(?<ip>…)"
• A field named ip is created for events that have this pattern in their raw data
– … | rex field=src_ip "\d+\.\d+\.\d+\.(?<octet>\d+)" | stats min(octet) as minOctet max(octet) as maxOctet | eval octetRange="[".minOctet.",".maxOctet."]"
• A field named octet is created for events that have the src_ip field
• The new minOctet and maxOctet fields calculated using the stats command can be used to find the range of the last octet in the observed IP addresses
• Dot is used to join the results as a string
minOctet | maxOctet | octetRange
1 | 253 | [1,253]
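The octet example chains three ideas: a named capture group creates a field, stats reduces it to a minimum and maximum, and eval joins the pieces into a string. A Python sketch of the same chain is shown below; the list of source addresses is invented so that the result matches the table above, and note that Python spells named groups as (?P<name>…).

```python
import re

src_ips = ["192.168.250.1", "192.168.250.70", "40.80.148.42", "10.0.0.253"]

# Named group "octet" captures the last part of a dotted IPv4 address.
pattern = re.compile(r"\d+\.\d+\.\d+\.(?P<octet>\d+)")

octets = []
for ip in src_ips:
    m = pattern.search(ip)
    if m:  # only values that match the pattern yield the new field
        octets.append(int(m.group("octet")))

# Comparable to: stats min(octet) as minOctet max(octet) as maxOctet
min_octet, max_octet = min(octets), max(octets)

# Comparable to the eval step that joins the values into one string
octet_range = "[" + str(min_octet) + "," + str(max_octet) + "]"
print(octet_range)
```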
Regex command (filtering command)
• The regex command uses regular expressions to filter search results (it does not create new fields)
• Syntax: regex (<field>=<regex-expression> or <field>!=<regex-expression> or <regex-expression>)
– regex "^168\.\d+\.\d+\.\d+"
– regex src_ip!="^168\.\d+\.\d+\.\d+" | stats values(src_ip)
• values returns the list of observed values in the returned src_ip results
• Practice: modify this command to filter private IP addresses!
Summary
• Splunk Software
– Understand Splunk architecture and what can be indexed
– Be familiar with Events & Fields, Default Fields, Data Types & Common Operators
• Search Processing Language (SPL)
– Develop skills to use SPL for
• Filtering Results
• Sorting & Grouping Results
• Filtering & Modifying Fields