COMP9318 (18S1) ASSIGNMENT 1
DUE ON 23:59 23 MAY, 2018 (WED)
Q1. (40 marks)
Consider the following base cuboid Sales with four tuples and the aggregate function
SUM:
Location    Time   Item       Quantity
Sydney      2005   PS2        1400
Sydney      2006   PS2        1500
Sydney      2006   Wii        500
Melbourne   2005   XBox 360   1700
Location, Time, and Item are dimensions and Quantity is the measure. Suppose the
system has built-in support for the value ALL.
(1) List the tuples in the complete data cube of R in a tabular form with 4 attributes,
i.e., Location, Time, Item, SUM(Quantity).
(2) Write down an equivalent SQL statement that computes the same result (i.e., the
cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
(3) Consider the following ice-berg cube query:
SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
CUBE BY Location, Time, Item
HAVING COUNT(*) > 1
Draw the result of the query in a tabular form.
(4) Assume that we adopt a MOLAP architecture to store the full data cube of R, with
the following mapping functions:
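$$f_{Location}(x) = \begin{cases} 1 & \text{if } x = \text{'Sydney'}, \\ 2 & \text{if } x = \text{'Melbourne'}, \\ 0 & \text{if } x = \text{ALL}. \end{cases}$$
$$f_{Time}(x) = \begin{cases} 1 & \text{if } x = 2005, \\ 2 & \text{if } x = 2006, \\ 0 & \text{if } x = \text{ALL}. \end{cases}$$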
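$$f_{Item}(x) = \begin{cases} 1 & \text{if } x = \text{'PS2'}, \\ 2 & \text{if } x = \text{'XBox 360'}, \\ 3 & \text{if } x = \text{'Wii'}, \\ 0 & \text{if } x = \text{ALL}. \end{cases}$$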
Draw the MOLAP cube (i.e., sparse multi-dimensional array) in a tabular form
of (ArrayIndex, Value). You also need to write down the function you chose to
map a multi-dimensional point to a one-dimensional point.
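As an illustration of the mapping functions given in part (4), they can be written as plain lookup tables. This is only a sketch: the names ALL, F_LOCATION, F_TIME, F_ITEM, and encode are hypothetical, and the string "ALL" is an assumed stand-in for the built-in ALL value. The function that maps the three coordinates to a one-dimensional array index is left out, since choosing it is part of the question.

```python
# Assumed stand-in for the system's built-in ALL value.
ALL = "ALL"

# The three mapping functions from part (4), written as lookup tables.
F_LOCATION = {"Sydney": 1, "Melbourne": 2, ALL: 0}
F_TIME = {2005: 1, 2006: 2, ALL: 0}
F_ITEM = {"PS2": 1, "XBox 360": 2, "Wii": 3, ALL: 0}


def encode(location, time, item):
    """Map a cube cell to its integer coordinates, one per dimension."""
    return (F_LOCATION[location], F_TIME[time], F_ITEM[item])


print(encode("Sydney", 2006, "Wii"))  # (1, 2, 3)
print(encode(ALL, ALL, ALL))          # (0, 0, 0)
```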
Q2. (30 marks)
Consider binary classification where the class attribute y takes two values: 0 or 1. Let the
feature vector for a test instance be a d-dimensional column vector x. A linear classifier with
the model parameter w (which is a d-dimensional column vector) is the following function:
$$y = \begin{cases} 1, & \text{if } w^\top x > 0 \\ 0, & \text{otherwise.} \end{cases}$$
We make an additional simplifying assumption: x is a binary vector (i.e., each dimension
of x takes only two values: 0 or 1).
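The decision rule above can be sketched in a few lines of Python. The function name linear_classify and the example values of w and x are made up for illustration; they are not part of the assignment.

```python
def linear_classify(w, x):
    """Return 1 if the inner product w^T x is strictly positive, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0


# Hypothetical parameter vector and binary feature vector (d = 3).
w = [0.5, -1.0, 0.25]
x = [1, 0, 1]

# w^T x = 0.5 + 0.25 = 0.75 > 0, so the predicted class is 1.
print(linear_classify(w, x))
```

Note that a score of exactly 0 falls into the "otherwise" branch and is classified as 0.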
• Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is
a linear classifier in a (d + 1)-dimensional space. You need to explicitly write out the
vector w that the Naïve Bayes classifier learns.
• It is obvious that the Logistic Regression classifier learned on the same training
dataset as the Naïve Bayes classifier is also a linear classifier in the same (d + 1)-dimensional
space. Let the parameters w learned by the two classifiers be $w_{LR}$ and $w_{NB}$,
respectively. Briefly explain why learning $w_{NB}$ is much easier than learning $w_{LR}$.
Hint 1. $\log \prod_i x_i = \sum_i \log x_i$
Q3. (30 marks)
Consider a dataset consisting of n training instances $x_i$ and the corresponding class labels
$y_i \in \{0, 1\}$.
(1) Consider the standard logistic regression model:
$$P[y = 1 \mid x] = \sigma(w^\top x)$$
where σ is the sigmoid function.
The learning of the model parameter is to find w∗ that minimizes some function
of w, commonly known as the loss function.
Prove that the loss function for logistic regression is:
$$\ell(w) = \sum_{i=1}^{n} \left( -y_i w^\top x_i + \ln\left(1 + \exp(w^\top x_i)\right) \right)$$
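To make the notation in part (1) concrete, the sketch below evaluates the model probability and the stated loss numerically. It only computes the formula as given and is not a proof. The function names and the tiny dataset are made up for illustration.

```python
import math


def sigmoid(z):
    """The sigmoid function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    """Inner product w^T x of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_proba(w, x):
    """P[y = 1 | x] under the standard logistic regression model."""
    return sigmoid(dot(w, x))


def logistic_loss(w, xs, ys):
    """Sum over i of -y_i * w^T x_i + ln(1 + exp(w^T x_i))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = dot(w, x)
        total += -y * z + math.log1p(math.exp(z))
    return total


# Made-up dataset with n = 3 instances in 2 dimensions.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1, 0, 1]

# At w = 0 every term reduces to ln 2, so the loss is n * ln 2.
print(logistic_loss([0.0, 0.0], xs, ys))
print(predict_proba([0.0, 0.0], xs[0]))  # sigmoid(0) = 0.5
```

For inputs with a large positive inner product, math.exp can overflow; a production implementation would use a numerically stable form of ln(1 + exp(z)).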
(2) Consider a variant of the logistic regression model:
$$P[y = 1 \mid x] = f(w^\top x)$$
where $f : \mathbb{R} \to [0, 1]$ is a squashing function that maps a real value to a value
between 0 and 1.
Write out its loss function.
Submission
Please write down your answers in a file named ass1.pdf. You must write down
your name and student ID on the first page.
You can submit your file by
give cs9318 ass1 ass1.pdf
Late Penalty. -10% per day for the first two days, and -20% for each of the following
days.