School of Computing and Information Systems The University of Melbourne
COMP90049 Introduction to Machine Learning (Semester 1, 2022) Week 12
1. Given the following univariate dataset, fit a statistical model under the assumption that the data come from a normal distribution. Then determine whether the instance x = 1.2 is anomalous according to the boxplot test.
X = {2, 2.5, 2.6, 3, 3.1, 3.2, 3.4, 3.7, 4, 4.1, 4.8}
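The two steps in Question 1 can be sketched in Python as follows. This is a minimal illustration, not the official solution: the helper names `boxplot_fences` and `is_anomalous` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and the 1.5 x IQR fence rule is the usual boxplot convention. Quartile conventions differ between textbooks and libraries, so the sketch exposes the `method` argument of `statistics.quantiles`; for a point close to the fence, such as x = 1.2, the verdict can change with the convention, so check which one your course uses.

```python
import statistics

X = [2, 2.5, 2.6, 3, 3.1, 3.2, 3.4, 3.7, 4, 4.1, 4.8]

# Normal model: estimate the two parameters from the data.
mu = statistics.mean(X)
sigma = statistics.stdev(X)  # assumption: sample standard deviation (n - 1)


def boxplot_fences(data, method="exclusive"):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_anomalous(x, data, method="exclusive"):
    """Boxplot test: x is flagged if it lies outside the fences."""
    lower, upper = boxplot_fences(data, method)
    return x < lower or x > upper


if __name__ == "__main__":
    print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
    for method in ("exclusive", "inclusive"):
        lower, upper = boxplot_fences(X, method)
        print(method, round(lower, 3), round(upper, 3), is_anomalous(1.2, X, method))
```

Running the script prints the fences under both quartile conventions, which makes it easy to compare the result with a calculation done by hand.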
2. Given the following univariate dataset, determine the outlier score for the instances (x=0.5) and (x=4) using the inverse relative density strategy with 2-NN (Manhattan distance).
Dataset = {1, 1.05, 1.1, 1.15, 1.2, 1.21, 1.3, 1.4, 1.45, 1.5, 4.55, 5.6, 6.8, 7.58, 8.6, 9.7, 10.3, 11.4, 12.3, 13.5}
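One way to set up the computation in Question 2 is sketched below. It assumes the common textbook definitions: the density of a point is the inverse of its average distance to its k nearest neighbours, and the outlier score is the average density of those neighbours divided by the point's own density. It also assumes that the neighbours' densities are computed inside the original dataset, without the query point. The function names are hypothetical, and your lecture notes may define the details differently.

```python
DATA = [1, 1.05, 1.1, 1.15, 1.2, 1.21, 1.3, 1.4, 1.45, 1.5,
        4.55, 5.6, 6.8, 7.58, 8.6, 9.7, 10.3, 11.4, 12.3, 13.5]


def knn(point, data, k):
    """k nearest neighbours of `point` in `data`, excluding the point itself."""
    others = list(data)
    if point in others:
        others.remove(point)  # drop a single copy of the point
    # In one dimension the Manhattan distance is the absolute difference.
    return sorted(others, key=lambda y: abs(y - point))[:k]


def density(point, data, k):
    """Inverse of the average distance from `point` to its k nearest neighbours."""
    neighbours = knn(point, data, k)
    return 1.0 / (sum(abs(y - point) for y in neighbours) / k)


def inverse_relative_density(x, data, k=2):
    """Average density of x's neighbours divided by the density of x."""
    neighbours = knn(x, data, k)
    avg_neighbour_density = sum(density(y, data, k) for y in neighbours) / k
    return avg_neighbour_density / density(x, data, k)


if __name__ == "__main__":
    for x in (0.5, 4):
        print(x, round(inverse_relative_density(x, DATA), 4))
```

A score well above 1 means the instance lies in a much sparser region than its neighbours; a score near or below 1 means its neighbourhood is about as dense as theirs.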
3. We have a dataset containing demographic and income data from the United States in 1994. We want to judge the fairness of a classifier we have trained on it. The dataset consists of about 48,000 individuals, where each instance X is characterized by 14 demographic attributes (gender, origin, education, race, occupation, etc.). The target variable Y is the income of the person (>50K or <=50K). Assume we selected gender as our protected attribute A. We trained our classifier and observed the following outcomes. The label y=1 means “income >50K”, and y=0 means “income <=50K”.
P(ŷ=1|A=f)   P(ŷ=1|A=m)   P(ŷ=1|Y=1, A=f)   P(ŷ=1|Y=1, A=m)   P(Y=1|ŷ=1, A=f)   P(Y=1|ŷ=1, A=m)   P(Y=1|ŷ=1)   P(ŷ=1|Y=1)
0.81         0.75         0.80              0.86              0.73              0.74              0.74         0.85
(i) Name each of the statistics and provide a formula for its measurement. Be sure you understand the intuition behind each statistical notion and its connection to the metric.
(ii) For each of the following criteria, decide whether the classifier meets this criterion.
a) Group Fairness (Demographic parity)
b) Equal opportunity
c) Predictive parity
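The check in part (ii) can be sketched as a comparison of per-group rates. The values below are copied from the table; the tolerance `TOLERANCE = 0.02` is a hypothetical threshold chosen only for illustration, because the worksheet does not say how close two rates must be to count as equal. The names `rates`, `group_gap` and `meets` are likewise assumptions.

```python
# Per-group rates copied from the table in Question 3.
rates = {
    "demographic parity": {"f": 0.81, "m": 0.75},  # P(y_hat=1 | A)
    "equal opportunity": {"f": 0.80, "m": 0.86},   # P(y_hat=1 | Y=1, A), the TPR
    "predictive parity": {"f": 0.73, "m": 0.74},   # P(Y=1 | y_hat=1, A), the PPV
}

TOLERANCE = 0.02  # hypothetical threshold, not given in the worksheet


def group_gap(per_group):
    """Difference between the largest and smallest per-group rate."""
    return max(per_group.values()) - min(per_group.values())


def meets(criterion, tol=TOLERANCE):
    """A criterion counts as met if the group rates differ by at most `tol`."""
    return group_gap(rates[criterion]) <= tol


if __name__ == "__main__":
    for name, per_group in rates.items():
        print(f"{name}: gap = {group_gap(per_group):.2f}, met = {meets(name)}")
```

Changing `TOLERANCE` shows how sensitive the verdict is to the chosen threshold, which is worth discussing alongside the numerical answer.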
4. A common metric for assessing classifier fairness is the GAP in scores achieved across groups. If we choose the true positive rate (TPR) as our score of interest, we check the classifier for “equal opportunity”. If we choose the positive predictive value (PPV) as our score of interest, we test the classifier for “predictive parity”. Verify your observations in Question 3(ii) using (a) max-GAP and (b) avg-GAP. When would avg-GAP be preferred, and when max-GAP?
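A minimal sketch of the two GAP measures follows. It assumes that each group's gap is the absolute difference between the group's score and the overall score, that avg-GAP is the mean of these differences, and that max-GAP is the largest one. Check this against the definition in your lecture notes, since some sources compare groups with each other rather than with the overall score. The function name `gaps` is hypothetical; the numbers are taken from the table in Question 3.

```python
def gaps(group_scores, overall):
    """Return (max_gap, avg_gap) of the per-group deviations from `overall`."""
    deltas = [abs(score - overall) for score in group_scores.values()]
    return max(deltas), sum(deltas) / len(deltas)


# Equal opportunity: score of interest is the TPR, P(y_hat=1 | Y=1).
tpr_max, tpr_avg = gaps({"f": 0.80, "m": 0.86}, overall=0.85)

# Predictive parity: score of interest is the PPV, P(Y=1 | y_hat=1).
ppv_max, ppv_avg = gaps({"f": 0.73, "m": 0.74}, overall=0.74)

if __name__ == "__main__":
    print(f"TPR: max-GAP = {tpr_max:.3f}, avg-GAP = {tpr_avg:.3f}")
    print(f"PPV: max-GAP = {ppv_max:.3f}, avg-GAP = {ppv_avg:.3f}")
```

With only two groups the two measures stay close together; they diverge more when there are many groups and a single group is served much worse than the rest.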
5. For our classifier above, we reported that TPR_f = 0.8, TPR_m = 0.86 and TPR = 0.85 (cf. Columns 3, 4 and 8 in the table). How do you think TPR was computed, and what does it tell us about the data?
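One possible reading, sketched below, is that the overall TPR is pooled over all truly positive instances. In that case it equals the average of the group TPRs weighted by each group's share of the positive instances, and the share can be recovered from the three reported values. The pooling assumption and the variable names are not stated in the worksheet.

```python
tpr_f, tpr_m, tpr_all = 0.80, 0.86, 0.85

# Assumption: tpr_all = share_f * tpr_f + (1 - share_f) * tpr_m,
# where share_f is the fraction of truly positive instances (Y=1) with A=f.
share_f = (tpr_m - tpr_all) / (tpr_m - tpr_f)
share_m = 1.0 - share_f

if __name__ == "__main__":
    print(f"share of positives with A=f: {share_f:.3f}")
    print(f"share of positives with A=m: {share_m:.3f}")
```

Under this reading the overall TPR sits much closer to TPR_m than to TPR_f because the group with A=m contributes most of the positive instances.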