School of Computing and Information Systems (CIS) The University of Melbourne COMP90073
Security Analytics
Tutorial exercises: Week 6
1. How do the following measures guide us in anomaly detection problems? Give a scenario where each can be used.
a. Precision
b. Recall
c. F-score
d. AUC
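To make the four measures concrete, here is a minimal pure-Python sketch that treats label 1 as "anomaly":

- precision = TP/(TP+FP)
- recall = TP/(TP+FN)
- the F-score is their (weighted) harmonic mean
- AUC is the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal object, with ties counting one half

The labels, scores and the 0.5 threshold below are hypothetical. In practice, scikit-learn's `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` compute the same quantities.

```python
def confusion(y_true, y_pred):
    """Return (TP, FP, FN) counts, treating 1 as 'anomaly'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn

def precision(y_true, y_pred):
    tp, fp, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, fn = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def f_score(y_true, y_pred, beta=1.0):
    """F-beta score; beta=1 gives the usual F1 (harmonic mean of P and R)."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical anomaly scores for 8 objects (1 = true anomaly).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.8, 0.3, 0.35]
y_pred = [int(s >= 0.5) for s in scores]  # hypothetical threshold of 0.5
print(precision(y_true, y_pred), recall(y_true, y_pred),
      f_score(y_true, y_pred), auc(y_true, scores))
```

Precision, recall and F-score depend on the chosen threshold, whereas AUC summarises the ranking over all thresholds.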
2. The following are the results observed when clustering 6000 data points into 3 clusters, A, B and C:
What is the F1-Score with respect to cluster B?
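The results table for this question is not reproduced here, so the counts below are hypothetical. The sketch assumes:

- rows are clusters and columns are true classes;
- precision = n/|cluster| and recall = n/|class|, so F1 = 2n/(|cluster| + |class|);
- the cluster is matched with the class that gives the highest F1, which is usually its majority class.

Exact fractions avoid rounding.

```python
from fractions import Fraction

def cluster_f1(table, cluster):
    """Return (best F1, matching class) for `cluster`.

    table[cluster][cls] = number of points of class `cls` assigned to `cluster`.
    """
    cluster_size = sum(table[cluster].values())
    classes = sorted({cls for row in table.values() for cls in row})
    best = None
    for cls in classes:
        class_size = sum(row.get(cls, 0) for row in table.values())
        n = table[cluster].get(cls, 0)
        f1 = Fraction(2 * n, cluster_size + class_size)  # equals 2PR/(P+R)
        if best is None or f1 > best[0]:
            best = (f1, cls)
    return best

# Hypothetical contingency table (NOT the one from the exercise).
table = {
    "A": {"X": 40, "Y": 5, "Z": 5},
    "B": {"X": 10, "Y": 30, "Z": 10},
    "C": {"X": 0, "Y": 5, "Z": 45},
}
print(cluster_f1(table, "B"))
```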
3. Consider the K-means scheme for outlier detection described in and the figure below.
a. The points at the bottom of the compact cluster shown in the above figure have a somewhat higher outlier score than those points at the top of the compact cluster. Why?
b. Suppose that we choose the number of clusters to be much larger, e.g., 10. Would the proposed technique still be effective in finding the most extreme outlier at the top of the figure? Why or why not?
c. The use of relative distance adjusts for differences in density. Give an example of where such an approach might lead to the wrong conclusion.
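To experiment with parts (a) to (c), here is a minimal sketch of K-means-based outlier scoring. It assumes the usual formulation:

- a point's score is its distance to the closest centroid;
- the relative variant divides that distance by the median distance of the cluster's members to the centroid.

The initial centroids are passed in explicitly so the run is reproducible, and the 2-D data set is hypothetical: a compact cluster, a loose cluster and one isolated point.

```python
import math
from statistics import median

def kmeans(data, init, iters=100):
    """Plain Lloyd's k-means starting from the centroids in `init`."""
    centroids = [list(c) for c in init]
    k = len(centroids)

    def assign(cents):
        return [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in data]

    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new.append([sum(xs) / len(members) for xs in zip(*members)]
                       if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, assign(centroids)

def outlier_scores(data, centroids, labels, relative=True):
    """Distance to the closest centroid, optionally divided by the cluster's median distance."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(data, labels)]
    if not relative:
        return dists
    med = {c: median(d for d, lab in zip(dists, labels) if lab == c) for c in set(labels)}
    return [d / med[lab] if med[lab] > 0 else (0.0 if d == 0 else math.inf)
            for d, lab in zip(dists, labels)]

# Hypothetical data: compact cluster, loose cluster, one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 14), (14, 10), (14, 14),
        (0, 6)]
centroids, labels = kmeans(data, init=[(0, 0), (10, 10)])
print(centroids, labels)
print(outlier_scores(data, centroids, labels, relative=False))
print(outlier_scores(data, centroids, labels))
```

Re-running with a larger number of initial centroids, or moving the isolated point, is a quick way to probe parts (b) and (c).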
4. If the probability that a normal object is classified as an anomaly is 0.01 and the probability that an anomalous object is classified as anomalous is 0.99, what are the false alarm rate and detection rate if 99% of the objects are normal? (Use the definitions given below.)
o Detection rate = number of anomalies detected/total number of anomalies
o False alarm rate = number of false anomalies/number of objects classified as anomalies
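The arithmetic follows directly from the two definitions above once each probability is turned into an expected fraction of all objects. This is a small sketch using exact fractions. Note that the false alarm rate, as defined here, is the fraction of flagged objects that are actually normal.

```python
from fractions import Fraction

def rates(p_normal, p_normal_flagged, p_anomaly_flagged):
    """Detection rate and false alarm rate as defined in Question 4."""
    p_anomaly = 1 - p_normal
    detected = p_anomaly * p_anomaly_flagged    # anomalies correctly flagged
    false_flags = p_normal * p_normal_flagged   # normal objects wrongly flagged
    detection_rate = detected / p_anomaly
    false_alarm_rate = false_flags / (false_flags + detected)
    return detection_rate, false_alarm_rate

dr, far = rates(Fraction(99, 100), Fraction(1, 100), Fraction(99, 100))
print(dr, far)
```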
5. When a comprehensive training set is available, a supervised anomaly detection technique can typically outperform an unsupervised anomaly detection technique when performance is evaluated using measures such as the detection and false alarm rates. However, in some cases, such as fraud detection, new types of anomalies are constantly emerging. Performance can still be evaluated according to the detection and false alarm rates, because it is usually possible to determine, upon investigation, whether an object (transaction) is anomalous. Discuss the relative merits of supervised and unsupervised anomaly detection under such conditions.
6. Distinguish between noise and outliers. Be sure to consider the following questions.
a. Is noise ever interesting or desirable? Anomalies?
b. Can noise objects be outliers?
c. Are noise objects always outliers?
d. Are outliers always noise objects?
e. Can noise make a typical value into an unusual one, or vice versa?
7. Assume you run DBSCAN with MinPoints=6 and epsilon=0.1 on a dataset and obtain 4 clusters, with 5% of the objects in the dataset classified as outliers. Now you run DBSCAN with MinPoints=8 and epsilon=0.1. How do you expect the clustering results to change?
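With epsilon fixed, a point that has at least 8 points in its epsilon-neighbourhood also has at least 6. The core points found with MinPoints=8 are therefore a subset of those found with MinPoints=6. The sketch below checks this on hypothetical uniformly random 2-D data. It counts the point itself in its own neighbourhood, which is one common convention and is an assumption here.

```python
import math
import random

def core_points(data, eps, min_pts):
    """Indices whose eps-neighbourhood (including the point itself) has >= min_pts points."""
    return {i for i, p in enumerate(data)
            if sum(math.dist(p, q) <= eps for q in data) >= min_pts}

rng = random.Random(42)  # hypothetical data, fixed seed for reproducibility
data = [(rng.random(), rng.random()) for _ in range(300)]
core6 = core_points(data, eps=0.1, min_pts=6)
core8 = core_points(data, eps=0.1, min_pts=8)
print(len(core6), len(core8), core8 <= core6)
```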
8. If Epsilon is 2 and MinPoints is 2, what are the clusters that DBSCAN would discover with the following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
The distance matrix based on the Euclidean distance is given below:
Draw the 10 by 10 space and illustrate the discovered clusters. What if Epsilon is increased to 10?
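To check a hand-drawn answer, here is a minimal pure-Python DBSCAN sketch run on the eight examples. It assumes the epsilon-neighbourhood uses distance <= Epsilon and counts the point itself, the same convention as scikit-learn's `min_samples`. Under the other convention (excluding the point itself), the results can differ, so state which one you use.

```python
import math

# Coordinates of the eight examples in Question 8.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def dbscan(points, eps, min_pts):
    """Return (clusters, noise) for a dict mapping name -> (x, y)."""
    names = list(points)

    def neighbours(p):
        # Includes p itself, since its distance to itself is 0 <= eps.
        return [q for q in names if math.dist(points[p], points[q]) <= eps]

    labels = {}  # name -> cluster id, or -1 for noise
    cluster_id = 0
    for p in names:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # provisional noise; may later become a border point
            continue
        labels[p] = cluster_id  # p is a core point: start a new cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster_id  # former noise becomes a border point
                continue
            if q in labels:
                continue
            labels[q] = cluster_id
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:  # only core points expand the cluster
                queue.extend(q_neighbours)
        cluster_id += 1

    clusters = {}
    for name, lab in labels.items():
        if lab != -1:
            clusters.setdefault(lab, []).append(name)
    noise = sorted(name for name, lab in labels.items() if lab == -1)
    return [sorted(members) for members in clusters.values()], noise

for eps in (2, 10):
    print(eps, dbscan(POINTS, eps, min_pts=2))
```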
9. You may use Python or Weka for the following exercises.
Download the Ionosphere data set from the UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/ionosphere
a. Use the LOF method and determine the ranking of the anomalies.
b. Rank the data points based on their k-nearest neighbour scores, for values of k ranging from 1 through 5.
c. Normalize the data so that the variance along each dimension is 1. Rank the data points based on their k-nearest neighbour scores, for values of k ranging from 1 through 5.
d. How many data points are common among the top 5 ranked anomalies using different methods?
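This is a dependency-free Python sketch of the pieces needed for parts (a) to (d). It rests on three assumptions:

- the k-NN score is the distance to the k-th nearest neighbour (other definitions, such as the mean distance to the k neighbours, also exist);
- LOF uses exactly k neighbours, ignoring ties at the k-distance, with a 1e-10 guard against duplicate points;
- the rows have already been loaded as lists of numeric features, with the class column dropped.

scikit-learn's `NearestNeighbors` and `LocalOutlierFactor` are the usual library route. The toy data below is hypothetical; on the real data, loop k over `range(1, 6)` and take `top_n(..., 5)`.

```python
import math

def pairwise(data):
    """Full Euclidean distance matrix (O(n^2); fine for a few hundred rows)."""
    return [[math.dist(p, q) for q in data] for p in data]

def knn(dist, i, k):
    """Indices of the k nearest neighbours of row i (excluding i; ties broken by index)."""
    order = sorted((d, j) for j, d in enumerate(dist[i]) if j != i)
    return [j for _, j in order[:k]]

def knn_scores(data, k):
    """Outlier score = distance to the k-th nearest neighbour (larger = more anomalous)."""
    dist = pairwise(data)
    return [dist[i][knn(dist, i, k)[-1]] for i in range(len(data))]

def standardize(data):
    """Rescale each column to zero mean and unit (population) variance."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    # Constant columns (std 0) are centred but not scaled, to avoid dividing by zero.
    return [[(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]
            for row in data]

def lof_scores(data, k):
    """Local Outlier Factor; values well above 1 indicate outliers."""
    dist = pairwise(data)
    n = len(data)
    neigh = [knn(dist, i, k) for i in range(n)]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    lrd = []  # local reachability density
    for i in range(n):
        mean_reach = sum(max(k_dist[o], dist[i][o]) for o in neigh[i]) / k
        lrd.append(1.0 / (mean_reach + 1e-10))
    return [sum(lrd[o] for o in neigh[i]) / (k * lrd[i]) for i in range(n)]

def top_n(scores, n=5):
    """Indices of the n highest scores, most anomalous first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:n]

if __name__ == "__main__":
    # Hypothetical toy data: a unit square plus one distant point (index 4).
    toy = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
    for k in range(1, 4):
        print(k, top_n(knn_scores(toy, k), 2), top_n(knn_scores(standardize(toy), k), 2))
    common = set(top_n(lof_scores(toy, 2), 2)) & set(top_n(knn_scores(toy, 2), 2))
    print("common to both top-2 lists:", sorted(common))
```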
10. Repeat the above exercise with the network intrusion data set from the UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data
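Unlike Ionosphere, the KDD Cup 1999 records mix numeric attributes with symbolic ones such as the protocol type. Distance-based scores like k-NN and LOF therefore need the symbolic columns encoded first. This minimal sketch assumes one-hot encoding, which is one common choice among several. The helper name and the column indices passed to it are hypothetical; check them against the data set's attribute list.

```python
def one_hot(rows, categorical_cols):
    """Replace each symbolic column with 0/1 indicator columns.

    Numeric columns are converted with float() and keep their original order.
    """
    vocab = {c: sorted({row[c] for row in rows}) for c in categorical_cols}
    encoded = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            if j in vocab:
                out.extend(1.0 if value == v else 0.0 for v in vocab[j])
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded

# Hypothetical rows: numeric, symbolic, numeric.
print(one_hot([[0.5, "tcp", 3], [1.0, "udp", 4]], categorical_cols=[1]))
```

Because indicator columns and raw counts live on very different scales, it is worth standardizing the encoded rows before computing distances.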