Step 0: Where to put my data (CSV) files?
$SPLUNK_HOME\etc\apps\Splunk_ML_Toolkit\lookups
Step 1: Load the file into MLTK
Go to App: Splunk Machine Learning Toolkit -> Search
Then type in the following command to load the file:
| inputlookup app_usage.csv // load the file
Then you can inspect all contents in the
COMP90073, University of Melbourne, S2, 2021
Step 2: Apply preprocessing steps
In this example, we apply StandardScaler to the 4 features (a.k.a. columns or fields) CloudDrive, Recruiting, RemoteAccess and Webmail.
Command (appended to the previous one, separated by a pipe "|"):
| fit StandardScaler "CloudDrive", "Recruiting", "RemoteAccess", "Webmail"
with_mean=true with_std=true // apply preprocessing steps
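After preprocessing, the processed fields have "SS_" as a prefix.
Here, with_mean=true with_std=true centers each feature and scales it to unit variance, so every scaled feature has μ = 0 and σ² = 1, the same mean and variance as the standard normal distribution N(0, 1). The shape of each feature's distribution is unchanged.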
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/Preprocessing
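To make the effect of with_mean=true with_std=true concrete, here is a minimal Python sketch of the same arithmetic on a single column. It is an illustration only, not MLTK's code: the function name standard_scale and the sample values are hypothetical, and it assumes the population standard deviation is used for scaling.

```python
from statistics import fmean, pstdev

def standard_scale(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population (not sample) std
    if sigma == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical values for one column such as "CloudDrive"
# (not taken from app_usage.csv).
cloud_drive = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ss_cloud_drive = standard_scale(cloud_drive)

print(ss_cloud_drive)
print("mean:", fmean(ss_cloud_drive), "std:", pstdev(ss_cloud_drive))
```

The scaled column has mean 0 and standard deviation 1, but its histogram keeps the shape of the original column.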
The top 5 commonly applied preprocessing methods/functions are:
1. FieldSelector
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#FieldSelector
2. KernelPCA
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#KernelPCA
3. PCA
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#PCA
4. StandardScaler
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#StandardScaler
5. TFIDF
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#TFIDF
Step 3: Apply clustering algorithm
In this example, we apply KMeans to cluster those 4 features: “CloudDrive”, “Recruiting”, “RemoteAccess”, “Webmail”. In this case, we choose k=3 as the parameter.
Command (appended to all the previous commands, separated by pipes "|"):
| fit KMeans k=3 "SS_CloudDrive" "SS_Recruiting" "SS_RemoteAccess"
"SS_Webmail" into "my_test_kmeans_model" // train the KMeans model
Once training has finished, the model "my_test_kmeans_model" appears under App: Splunk Machine Learning Toolkit -> Models
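For intuition about what a k-means fit with k=3 computes, the following Python sketch runs the textbook assign-and-update loop (Lloyd's algorithm) on a few scaled points. It is not the MLTK implementation: the function kmeans, the seeding with the first k points, and the 2-D sample points are all assumptions made for illustration. The same code works unchanged on 4-column rows.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: seed with the first k points so the run is deterministic.
    # Real implementations usually use random or k-means++ seeding.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical standardized points forming three obvious groups.
points = [
    (-1.0, -1.0), (0.0, 1.2), (1.5, -0.5),
    (-1.1, -0.9), (0.1, 1.0), (1.4, -0.6),
    (-0.9, -1.2), (-0.1, 1.1), (1.6, -0.4),
]
centroids, labels = kmeans(points, k=3)
print(labels)
```

Each entry of labels plays the role of the cluster field that the trained model assigns to a row.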
Step 4: Evaluate and visualize the result
When the KMeans model training is finished, we can apply the model to obtain the cluster assignments and visualize them. Overall, the full SPL command will be:
| inputlookup app_usage.csv
| fit StandardScaler "CloudDrive", "Recruiting", "RemoteAccess", "Webmail"
with_mean=true with_std=true
| apply "my_test_kmeans_model"
| eval cluster= "Cluster: " + cluster
| table cluster, "SS_CloudDrive", "SS_Recruiting", "SS_RemoteAccess",
"SS_Webmail" // display the selected variables
Optionally, in preprocessing (Step 2), you can use
| fit StandardScaler "CloudDrive", "Recruiting", "RemoteAccess", "Webmail"
with_mean=true with_std=true into "app_usage_SS"
Then, replace the second command by:
| apply "app_usage_SS"
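The idea behind saving the scaler with into and reusing it with apply is that the mean and standard deviation are learned once and then reused on any later rows. A minimal Python sketch of that split follows. The helpers fit_scaler and apply_scaler and the sample rows are hypothetical, and the sketch assumes the saved model holds one mean and one population standard deviation per field.

```python
from statistics import fmean, pstdev

def fit_scaler(rows, fields):
    """Learn (mean, population std) per field and return them as a dict."""
    params = {}
    for field in fields:
        column = [row[field] for row in rows]
        params[field] = (fmean(column), pstdev(column))
    return params

def apply_scaler(rows, params):
    """Add SS_<field> columns using previously learned parameters."""
    scaled = []
    for row in rows:
        out = dict(row)
        for field, (mu, sigma) in params.items():
            # Constant fields (sigma == 0) are mapped to 0.0.
            out["SS_" + field] = (row[field] - mu) / sigma if sigma else 0.0
        scaled.append(out)
    return scaled

# Hypothetical rows; the real values live in app_usage.csv.
training_rows = [{"Webmail": 10.0}, {"Webmail": 30.0},
                 {"Webmail": 10.0}, {"Webmail": 30.0}]
app_usage_ss = fit_scaler(training_rows, ["Webmail"])        # like: fit ... into
new_rows = apply_scaler([{"Webmail": 40.0}], app_usage_ss)   # like: apply
print(new_rows)
```

Because the parameters come from the training rows, the new row is scaled relative to the original data rather than to itself.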
The result will be:
Here, we have 16 plots with field-field information showing all 3 clusters.
Note: In DBSCAN, cluster -1 contains all the outliers (a.k.a. anomalies).