OneClassSVM
The OneClassSVM algorithm uses the scikit-learn OneClassSVM to fit a model from a set of features or fields for detecting anomalies and outliers, where features are expected to contain numerical values. OneClassSVM is an unsupervised outlier detection method.
For further information, see the sci-kit learn documentation:
http://scikit-learn.org/stable/modules/svm.html#kernel-functions
Copyright By PowCoder代写 加微信 powcoder
Parameters
• The kernel parameter specifies the kernel type for using in the algorithm, where the default value is kernel is rbf.
o Kernel types include: linear, rbf, poly, and sigmoid.
• You can specify the upper bound on the fraction of training error as well as the lower
bound of the fraction of support vectors using the nu parameter, where the default
value is 0.5.
• The degree parameter is ignored by all kernels except the polynomial kernel, where the
default value is 3.
• gamma is the kernel co-efficient that specifies how much influence a single data instance
has, where the default value is 1/ number of features.
• The independent term of coef0 in the kernel function is only significant if you have
polynomial or sigmoid function.
• The term tol is the tolerance for stopping criteria.
• The shrinking parameter determines whether to use the shrinking heuristic.
• You can save OneClassSVM models using the into keyword.
• You can apply the saved model later to new data with the apply command.
Syntax constraints
• After running the fit or apply command, a new field named is generated. This field defines whether a particular record (row) is normal ( ) or anomalous (isNormal=-1).
• You cannot inspect the model learned by OneClassSVM with the summary command. Example
The following example uses OneClassSVM on a test set.
fit OneClassSVM
[kernel=
[gamma=
… | fit OneClassSVM * kernel=”poly” nu=0.5 coef0=0.5 gamma=0.5 tol=1
degree=3 shrinking=f into
COMP90073 © University of Melbourne S2, 2020
isNormal=1
TESTMODEL_OneClassSVM
Advanced settings (keep default unless you have special requirements) App: Splunk Machine Learning Toolkit -> Settings -> OneClassSVM
Where to put my data (CSV) files?
>> $SPLUNK_HOME\etc\apps\Splunk_ML_Toolkit\lookups
Example: Bitcoin Transaction
First, let’s inspect the
| inputlookup bitcoin_transactions.csv
COMP90073 © University of Melbourne S2, 2020
In this dataset, we have the following 5 fields (aka. features):
date – Date and time of the transaction, not interested in this anomaly detection task transaction_id – nominal variable, not interested in this anomaly detection task user_id_from – nominal variable, interested in this anomaly detection task user_id_to – nominal variable, interested in this anomaly detection task
value – continuous variable, interested in this anomaly detection task
For all nominal variables, we need to pre-process that so that Splunk will recognize them as
| inputlookup bitcoin_transactions.csv
| eval from_user = “u”.user_id_from | eval to_user = “u”.user_id_to
Now, we have the field from_user and to_user which have “u” as the prefix. Thus, Splunk will treat them as
Next, we can fit the OneClassSVM algorithm using the following commands:
| inputlookup bitcoin_transactions.csv
| eval from_user = “u”.user_id_from | eval to_user = “u”.user_id_to
Where the hyper-parameters are set as: kernel=”rbf” gamma=0.33 nu=0.02
After the model is trained, Splunk will show you 10,000 lines of data which is evaluated by the
trained model, which means the displayed data is not the full dataset.
Therefore, to get the full processed dataset, we need to apply our trained model using:
| fit OneClassSVM “from_user” “to_user” “value” kernel=”rbf” gamma=0.33
nu=0.02 shrinking=True into “OneSVM_Bitcoin”
COMP90073 © University of Melbourne S2, 2020
| inputlookup bitcoin_transactions.csv | eval from_user = “u”.user_id_from |
eval to_user = “u”.user_id_to
| apply “OneSVM_Bitcoin” | outputlookup bitcoin_svm.csv
Optionally, we can save the processed full dataset as
>> $SPLUNK_HOME\etc\apps\Splunk_ML_Toolkit\lookups
Now, we can use the following command to load the processed data, and see some statistics of the anomaly detection:
| inputlookup bitcoin_svm.csv | chart count by isNormal
The result shows 61749 transactions are abnormal (isNormal=-1) and 38251 transactions are
normal (isNormal=1), which seems to be too sensitive.
Now, we retrain our model using different parameters: kernel=”rbf” gamma=0.33 nu=0.001 The full commands will be as below:
| inputlookup bitcoin_transactions.csv
| eval from_user = “u”.user_id_from | eval to_user = “u”.user_id_to
| apply “OneSVM_Bitcoin_2” | outputlookup bitcoin_svm_2.csv
| inputlookup bitcoin_svm_2.csv | chart count by isNormal
The result shows 2488 transactions are abnormal (isNormal=-1) and 97512 transactions are
normal (isNormal=1), which seems to be acceptable.
| fit OneClassSVM “from_user” “to_user” “value” kernel=”rbf” gamma=0.33
nu=0.001 shrinking=True into “OneSVM_Bitcoin_2”
| inputlookup bitcoin_transactions.csv | eval from_user = “u”.user_id_from |
eval to_user = “u”.user_id_to
COMP90073 © University of Melbourne S2, 2020
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com