COMP90073, University of Melbourne, S2 2021

Step 0: Where to put my data (CSV) files?
$SPLUNK_HOME\etc\apps\Splunk_ML_Toolkit\lookups
Step 1: Load the file into MLTK
Go to App: Splunk Machine Learning Toolkit -> Search

Then type in the following command to load the file (included in MLTK):
| inputlookup app_usage.csv
Note: the leading bar (|) is required before the command.
Then you can inspect all contents in the .

Step 2: Apply preprocessing steps
In this example, we apply StandardScaler to 4 features (a.k.a. columns, fields, etc.): “CloudDrive”, “Recruiting”, “RemoteAccess”, “Webmail”. We want each scaled feature to have mean μ = 0 and variance σ² = 1, i.e. the same mean and variance as N(0, 1).
Command (appended to the previous one, separated by a bar "|"):
| fit StandardScaler "CloudDrive", "Recruiting", "RemoteAccess", "Webmail"
with_mean=true with_std=true
This applies the preprocessing step. Afterwards, the processed fields carry the prefix "SS_" (for example, SS_CloudDrive).
Here with_mean=true centers each feature at μ = 0, and with_std=true scales it to σ² = 1.
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/Preprocessing
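To make the effect of with_mean and with_std concrete, here is a minimal Python sketch of the z-score computation. It is not the MLTK implementation: the function name standard_scale and the sample values are hypothetical, and it assumes the population standard deviation is used, as in scikit-learn's StandardScaler.

```python
from statistics import fmean, pstdev

def standard_scale(values, with_mean=True, with_std=True):
    """Return z-scores: subtract the mean, then divide by the population std."""
    mu = fmean(values) if with_mean else 0.0
    sigma = pstdev(values) if with_std else 1.0
    if sigma == 0:
        sigma = 1.0  # constant column: avoid division by zero
    return [(v - mu) / sigma for v in values]

# Hypothetical values standing in for one column such as CloudDrive.
cloud_drive = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ss_cloud_drive = standard_scale(cloud_drive)

print(ss_cloud_drive)
print(fmean(ss_cloud_drive), pstdev(ss_cloud_drive))
```

The scaled column has mean 0 and standard deviation 1, but its shape is unchanged: standardization shifts and rescales the data, it does not make a skewed column normally distributed.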
The five most commonly applied preprocessing methods/functions are:
1. FieldSelector
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#FieldSelector
2. KernelPCA
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#KernelPCA
3. PCA
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#PCA
4. StandardScaler
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#StandardScaler
5. TFIDF
https://docs.splunk.com/Documentation/MLApp/latest/User/Algorithms#TFIDF

Step 3: Apply clustering algorithm
In this example, we apply KMeans to cluster the 4 scaled features derived from “CloudDrive”, “Recruiting”, “RemoteAccess”, “Webmail”. In this case, we choose k=3 as the parameter.
Command (appended to all the previous commands, separated by bars "|"):
| fit KMeans k=3 "SS_CloudDrive" "SS_Recruiting" "SS_RemoteAccess"
"SS_Webmail" into "my_test_kmeans_model"
This trains the KMeans model. Once training has finished, you will see the model "my_test_kmeans_model" in: App: Splunk Machine Learning Toolkit -> Models
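For intuition about what a k-means fit with k=3 does, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration under stated assumptions, not how MLTK trains its model: the function kmeans, the naive "first k points" initialization, and the 2-D sample points are all hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Naive deterministic initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-D points forming three well-separated groups.
points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (21, 0)]
labels, centroids = kmeans(points, k=3)
print(labels)
print(centroids)
```

The outcome depends on the initial centroids; with a poor start the same data can converge to a worse grouping, which is why production implementations usually use smarter initialization.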
Step 4: Evaluate and visualize the result
When the KMeans model training is finished, we can apply the model to yield the cluster information and visualize it. Overall, the full SPL command will be:
| inputlookup app_usage.csv
| fit StandardScaler "CloudDrive", "Recruiting", "RemoteAccess", "Webmail"
with_mean=true with_std=true
| apply "my_test_kmeans_model"
| eval cluster= "Cluster: " + cluster
| table cluster, "SS_CloudDrive", "SS_Recruiting", "SS_RemoteAccess",
"SS_Webmail"
The final table command displays the selected variables.
Optionally, in preprocessing (Step 2), you can save the fitted scaler as a model by using:
| fit StandardScaler "CloudDrive", "Recruiting", "RemoteAccess", "Webmail"
with_mean=true with_std=true into "app_usage_SS"
Then, replace the second command of the full SPL command by:
| apply "app_usage_SS"
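The point of saving the scaler with "into" and reusing it with "apply" is that new data is scaled with the parameters learned during fitting, not with its own statistics. A minimal Python sketch of that fit-once, apply-later pattern follows; fit_scaler, apply_scaler, and the sample rows are hypothetical names and data, not MLTK APIs.

```python
from statistics import fmean, pstdev

def fit_scaler(rows, fields):
    """Learn (mean, std) per field; the returned dict plays the role of a saved model."""
    model = {}
    for f in fields:
        column = [row[f] for row in rows]
        model[f] = (fmean(column), pstdev(column) or 1.0)  # guard constant columns
    return model

def apply_scaler(model, rows):
    """Add SS_-prefixed fields using the stored parameters, without refitting."""
    return [
        {**row, **{f"SS_{f}": (row[f] - mu) / sd for f, (mu, sd) in model.items()}}
        for row in rows
    ]

# Hypothetical training rows and one new row.
training = [{"Webmail": 1.0}, {"Webmail": 3.0}]
saved_model = fit_scaler(training, ["Webmail"])
scaled_new = apply_scaler(saved_model, [{"Webmail": 5.0}])
print(saved_model)
print(scaled_new)
```

Because the stored mean is 2 and the stored standard deviation is 1, the new value 5 maps to 3; refitting on the new row alone would instead map it to 0.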

The result will be:
Here, we have 16 plots (a 4 x 4 grid of field-versus-field plots), each showing all 3 clusters.
Note: In DBSCAN, Cluster -1 contains all the outliers (a.k.a. anomalies).