COMP9318 (18S1) ASSIGNMENT 1
DUE ON 23:59 23 MAY, 2018 (WED)
Q1. (40 marks)
Consider the following base cuboid Sales (referred to as R below) with four tuples and the aggregate function SUM:

  Location  | Time | Item     | Quantity
  ----------+------+----------+---------
  Sydney    | 2005 | PS2      |     1400
  Sydney    | 2006 | PS2      |     1500
  Sydney    | 2006 | Wii      |      500
  Melbourne | 2005 | XBox 360 |     1700

Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.
- (1) List the tuples in the complete data cube of R in a tabular form with 4 attributes, i.e., Location, Time, Item, SUM(Quantity). (See the illustrative sketch after part (4).)
- (2) Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
- (3) Consider the following iceberg cube query:

  SELECT Location, Time, Item, SUM(Quantity)
  FROM Sales
  CUBE BY Location, Time, Item
  HAVING COUNT(*) > 1

  Draw the result of the query in a tabular form.
- (4) Assume that we adopt a MOLAP architecture to store the full data cube of R, with the following mapping functions:

  fLocation(x) = 1 if x = 'Sydney';  2 if x = 'Melbourne';  0 if x = ALL.
  fTime(x)     = 1 if x = 2005;      2 if x = 2006;         0 if x = ALL.
  fItem(x)     = 1 if x = 'PS2';     2 if x = 'XBox 360';   3 if x = 'Wii';  0 if x = ALL.

  Draw the MOLAP cube (i.e., the sparse multi-dimensional array) in a tabular form of (ArrayIndex, Value). You also need to write down the function you chose to map a multi-dimensional point to a one-dimensional point. (An illustrative sketch of one possible mapping follows this question.)
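For concreteness, here is a minimal Python sketch (not part of the required answer format) that enumerates the complete data cube of the four base tuples, as in part (1), and linearises cells for part (4). The row-major array_index function is only one valid choice; the dictionaries f_location, f_time, f_item simply restate the mapping functions given above.

from itertools import product

ALL = "ALL"

# The base cuboid Sales (the four tuples above).
sales = [
    ("Sydney",    2005, "PS2",      1400),
    ("Sydney",    2006, "PS2",      1500),
    ("Sydney",    2006, "Wii",       500),
    ("Melbourne", 2005, "XBox 360", 1700),
]

# Each base tuple contributes to the 2^3 = 8 cube cells obtained by
# keeping or generalising each of its three dimension values to ALL.
cube = {}
for loc, time, item, qty in sales:
    for cell in product((loc, ALL), (time, ALL), (item, ALL)):
        cube[cell] = cube.get(cell, 0) + qty

# The mapping functions from part (4), restated as dictionaries.
f_location = {ALL: 0, "Sydney": 1, "Melbourne": 2}
f_time     = {ALL: 0, 2005: 1, 2006: 2}
f_item     = {ALL: 0, "PS2": 1, "XBox 360": 2, "Wii": 3}

def array_index(loc, time, item):
    # One possible choice: row-major order over index domains of
    # sizes 3 (Location), 3 (Time) and 4 (Item).
    return (f_location[loc] * 3 + f_time[time]) * 4 + f_item[item]

# Print the sparse array as (ArrayIndex, Value) pairs.
for cell in sorted(cube, key=lambda c: array_index(*c)):
    print(array_index(*cell), cube[cell])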
Q2. (30 marks)
Consider binary classification where the class attribute y takes two values: 0 or 1. Let the feature vector for a test instance be a d-dimensional column vector x. A linear classifier with the model parameter w (which is a d-dimensional column vector) is the following function:

  ŷ = 1 if w⊤x > 0;  0 otherwise.
- Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is a linear classifier in a (d+1)-dimensional space. You need to explicitly write out the vector w that the Naïve Bayes classifier learns. (See the numerical sketch below.)
- It is obvious that the Logistic Regression classifier learned on the same training dataset as the Naïve Bayes classifier is also a linear classifier in the same (d+1)-dimensional space. Let the parameters learned by the two classifiers be w_LR and w_NB, respectively. Briefly explain why learning w_NB is much easier than learning w_LR.
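As an illustration of the claim in the first bullet (not a proof, and not a substitute for your derivation), the following Python sketch checks numerically that the log-odds of a Bernoulli Naïve Bayes classifier, with hypothetical parameters theta and prior, is linear in the augmented vector [x; 1]:

import numpy as np

# Hypothetical Bernoulli Naive Bayes parameters for d = 3 binary features:
# theta[c, j] = P(x_j = 1 | y = c) and prior[c] = P(y = c).
theta = np.array([[0.2, 0.7, 0.5],    # class y = 0
                  [0.6, 0.1, 0.9]])   # class y = 1
prior = np.array([0.4, 0.6])

def nb_log_odds(x):
    # log P(y=1 | x) - log P(y=0 | x), computed directly from the NB model.
    log_lik = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return (log_lik[1] + np.log(prior[1])) - (log_lik[0] + np.log(prior[0]))

# The same quantity written as w^T x plus a bias b: the per-feature terms
# give w, and everything independent of x folds into b, which occupies the
# extra (d+1)-th dimension of the augmented vector [x; 1].
w = np.log(theta[1] / theta[0]) - np.log((1 - theta[1]) / (1 - theta[0]))
b = np.log((1 - theta[1]) / (1 - theta[0])).sum() + np.log(prior[1] / prior[0])

for bits in np.ndindex(2, 2, 2):        # all 8 binary feature vectors
    x = np.array(bits)
    assert np.isclose(nb_log_odds(x), w @ x + b)
print("NB log-odds is linear in the augmented (d+1)-dimensional vector.")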
Q3. (30 marks)

Consider a dataset consisting of n training data points x_i and the corresponding class labels y_i ∈ {0, 1}.
(1) Consider the standard logistic regression model:

  P[y = 1 | x] = σ(w⊤x)

where σ is the sigmoid function. The learning of the model parameter is to find w∗ that minimizes some function of w, commonly known as the loss function.
We make an additional simplifying assumption: x is a binary vector (i.e., each dimension of x takes only two values: 0 or 1).

Hint 1. log ∏_i x_i = ∑_i log x_i.
Prove that the loss function for logistic regression is:

  l(w) = ∑_{i=1}^{n} [ −y_i w⊤x_i + ln(1 + exp(w⊤x_i)) ]
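As a quick numerical sanity check (not a proof), the sketch below verifies on small hypothetical data that this expression equals the negative log-likelihood of the logistic model; X, y, and w are arbitrary made-up values:

import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5

# Hypothetical data: binary feature vectors (as assumed above), binary
# labels, and an arbitrary weight vector.
X = rng.integers(0, 2, size=(n, d)).astype(float)
y = rng.integers(0, 2, size=n)
w = rng.normal(size=d)

z = X @ w                                # w^T x_i for every i
sigma = 1.0 / (1.0 + np.exp(-z))

# Negative log-likelihood of the logistic model ...
nll = -np.sum(y * np.log(sigma) + (1 - y) * np.log(1 - sigma))
# ... and the closed form stated above; the two agree.
loss = np.sum(-y * z + np.log1p(np.exp(z)))
assert np.isclose(nll, loss)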
(2) Consider a variant of the logistic regression model:
P[y = 1 | x] = f(w⊤x)
where f : R → [0,1] is a squashing function that maps a real value to a value between 0 and 1.
Write out its loss function.
Submission
Please write down your answers in a file named ass1.pdf. You must write down your name and student ID on the first page.
You can submit your file by
give cs9318 ass1 ass1.pdf
Late Penalty. -10% per day for the first two days, and -20% for each of the following days.