
COMP9318 (18S1) ASSIGNMENT 1

DUE ON 23:59 23 MAY, 2018 (WED)

Q1. (40 marks)
Consider the following base cuboid Sales with four tuples and the aggregate function SUM:

    Location    Time   Item       Quantity
    ---------   ----   --------   --------
    Sydney      2005   PS2            1400
    Sydney      2006   PS2            1500
    Sydney      2006   Wii             500
    Melbourne   2005   XBox 360       1700

Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.

  (1) List the tuples in the complete data cube of R in a tabular form with 4 attributes, i.e., Location, Time, Item, SUM(Quantity).
  (2) Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY clause. (A sketch of the general rewriting idea appears after this question.)
  (3) Consider the following iceberg cube query:

          SELECT   Location, Time, Item, SUM(Quantity)
          FROM     Sales
          CUBE BY  Location, Time, Item
          HAVING   COUNT(*) > 1

      Draw the result of the query in a tabular form.
  (4) Assume that we adopt a MOLAP architecture to store the full data cube of R, with the following mapping functions:

          fLocation(x) = 1 if x = 'Sydney',
                         2 if x = 'Melbourne',
                         0 if x = ALL.

          fTime(x) = 1 if x = 2005,
                     2 if x = 2006,
                     0 if x = ALL.

          fItem(x) = 1 if x = 'PS2',
                     2 if x = 'XBox 360',
                     3 if x = 'Wii',
                     0 if x = ALL.

      Draw the MOLAP cube (i.e., sparse multi-dimensional array) in a tabular form of (ArrayIndex, Value). You also need to write down the function you chose to map a multi-dimensional point to a one-dimensional point. (An example of such a mapping appears in the second sketch below.)
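For parts (1)-(3), it may help to recall the semantics being queried: the complete data cube over k dimensions is the union of the 2^k GROUP BY results, one per subset of the dimensions, with the grouped-away dimensions replaced by ALL; this is also the standard-SQL rewriting idea behind part (2). The following Python sketch (an illustration of the semantics only, not the required tabular or SQL answer) materialises the cube of the Sales tuples above:

    # Compute the full data cube with SUM(Quantity) by aggregating the base
    # tuples once per subset of the dimensions {Location, Time, Item}.
    from itertools import combinations
    from collections import defaultdict

    ALL = 'ALL'
    sales = [  # (Location, Time, Item, Quantity) from the base cuboid above
        ('Sydney',    2005, 'PS2',      1400),
        ('Sydney',    2006, 'PS2',      1500),
        ('Sydney',    2006, 'Wii',       500),
        ('Melbourne', 2005, 'XBox 360', 1700),
    ]

    cube = defaultdict(int)
    dims = range(3)                         # indices of Location, Time, Item
    for r in range(4):                      # how many dimensions to keep
        for keep in combinations(dims, r):  # one group-by per subset
            for *point, qty in sales:
                key = tuple(point[d] if d in keep else ALL for d in dims)
                cube[key] += qty            # SUM(Quantity)

    for key, total in sorted(cube.items(), key=lambda kv: str(kv[0])):
        print(*key, total)

Running it prints 22 cube tuples; for instance, the apex cell is (ALL, ALL, ALL, 5100).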
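For part (4), the mapping from a 3-dimensional point to a 1-dimensional index can be chosen in many ways; a common choice (an assumption for illustration, not the only valid answer) is row-major linearization over the 3 × 3 × 4 index space induced by fLocation, fTime, and fItem:

    # One possible index mapping: row-major linearization of the triple
    # (fLocation(x), fTime(x), fItem(x)).  An assumed choice, not the only one.
    N_TIME = 3   # |{ALL, 2005, 2006}|
    N_ITEM = 4   # |{ALL, PS2, XBox 360, Wii}|

    def array_index(loc: int, time: int, item: int) -> int:
        """Map a 3-dimensional cell index to a 1-dimensional array index."""
        return (loc * N_TIME + time) * N_ITEM + item

    # Example: the base tuple (Sydney, 2005, PS2) has indices (1, 1, 1), so
    # it is stored at ArrayIndex (1 * 3 + 1) * 4 + 1 = 17.
    assert array_index(1, 1, 1) == 17

Row-major order is injective over the 36 cells, and its inverse is simple integer division and modulo, which is why it is a typical choice for MOLAP arrays.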

Q2. (30 marks)


Consider binary classification where the class attribute y takes two values: 0 or 1. Let the feature vector for a test instance be a d-dimensional column vector x. A linear classifier with the model parameter w (which is a d-dimensional column vector) is the following function:

    y = 1 if w⊤x > 0,
        0 otherwise.

We make an additional simplifying assumption: x is a binary vector (i.e., each dimension of x takes only two values: 0 or 1).

  • Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is a linear classifier in a (d + 1)-dimensional space. You need to explicitly write out the vector w that the Naïve Bayes classifier learns. (A numerical illustration follows this question.)
  • It is obvious that the Logistic Regression classifier learned on the same training dataset as the Naïve Bayes is also a linear classifier in the same (d + 1)-dimensional space. Let the parameters w learned by the two classifiers be wLR and wNB, respectively. Briefly explain why learning wNB is much easier than learning wLR.
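The following toy computation illustrates the first bullet numerically (with made-up Bernoulli Naive Bayes parameters; the question asks for a general proof, which this sketch does not replace): the NB log-odds equals w⊤x plus a bias term, i.e., a linear function of the augmented vector (x, 1) in d + 1 dimensions.

    # Toy Bernoulli Naive Bayes with MADE-UP parameters (d = 3), showing its
    # decision function is linear in the augmented input (x, 1).
    import math
    from itertools import product

    d = 3
    prior = {0: 0.4, 1: 0.6}              # assumed P(y = c)
    theta = {0: [0.2, 0.5, 0.7],          # theta[c][j] = P(x_j = 1 | y = c)
             1: [0.6, 0.3, 0.9]}

    def nb_log_odds(x):
        """log P(y=1|x) - log P(y=0|x), computed directly from the NB model."""
        s = math.log(prior[1] / prior[0])
        for j in range(d):
            s += math.log(theta[1][j] if x[j] else 1 - theta[1][j])
            s -= math.log(theta[0][j] if x[j] else 1 - theta[0][j])
        return s

    # The same quantity as a linear function: d feature weights plus a bias
    # (the bias is the extra (d+1)-th weight, paired with a constant 1).
    w = [math.log(theta[1][j] * (1 - theta[0][j]) /
                  (theta[0][j] * (1 - theta[1][j]))) for j in range(d)]
    b = math.log(prior[1] / prior[0]) + sum(
        math.log((1 - theta[1][j]) / (1 - theta[0][j])) for j in range(d))

    for x in product([0, 1], repeat=d):   # agrees on every binary input
        linear = sum(wj * xj for wj, xj in zip(w, x)) + b
        assert math.isclose(nb_log_odds(x), linear)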
Q3. (30 marks)

Consider a dataset consisting of n training data xi and the corresponding class labels yi ∈ {0, 1}.

(1) Consider the standard logistic regression model:

        P[y = 1 | x] = σ(w⊤x)

    where σ is the sigmoid function. Learning the model parameter amounts to finding the w∗ that minimizes some function of w, commonly known as the loss function.



    Prove that the loss function for logistic regression is:

        l(w) = ∑_{i=1}^{n} [ −yi w⊤xi + ln(1 + exp(w⊤xi)) ]

    Hint 1. log ∏i xi = ∑i log xi.
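A quick numerical sanity check of this identity (on made-up numbers; it illustrates, but does not replace, the requested proof): the negative log-likelihood −∑i [ yi log pi + (1 − yi) log(1 − pi) ] with pi = σ(w⊤xi) agrees with the closed form above.

    # Check the two forms of the logistic loss on a tiny made-up dataset.
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    X = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]   # made-up binary feature vectors
    y = [1, 0, 1]                           # made-up labels
    w = (0.5, -1.0, 2.0)                    # made-up weights

    # Negative log-likelihood, written the usual way...
    nll = -sum(yi * math.log(sigmoid(dot(w, xi))) +
               (1 - yi) * math.log(1 - sigmoid(dot(w, xi)))
               for xi, yi in zip(X, y))

    # ...and via the closed form from the question.
    closed = sum(-yi * dot(w, xi) + math.log(1 + math.exp(dot(w, xi)))
                 for xi, yi in zip(X, y))

    assert math.isclose(nll, closed)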
(2) Consider a variant of the logistic regression model:

        P[y = 1 | x] = f(w⊤x)

    where f : R → [0, 1] is a squashing function that maps a real value to a value between 0 and 1. Write out its loss function.

Submission

Please write your answers in a file named ass1.pdf. You must include your name and student ID on the first page.

You can submit your file by

  give cs9318 ass1 ass1.pdf

Late Penalty. -10% per day for the first two days, and -20% for each of the following days.