COMP9318 (21T1) ASSIGNMENT 1
DUE ON 20:59 16 APR, 2021 (FRI)
Q1. (40 marks)
Consider the following base cuboid Sales with four tuples and the aggregate function SUM:

Location    Time   Item       Quantity
Sydney      2005   PS2        1400
Sydney      2006   PS2        1500
Sydney      2006   Wii        500
Melbourne   2005   XBox 360   1700

Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.
(1) List the tuples in the complete data cube of R in a tabular form with 4 attributes, i.e., Location, Time, Item, SUM(Quantity).
(2) Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY clause.
(3) Consider the following iceberg cube query:
SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
CUBE BY Location, Time, Item
HAVING COUNT(*) > 1
Draw the result of the query in a tabular form.
(4) Assume that we adopt a MOLAP architecture to store the full data cube of R, with the following mapping functions:

$$f_{\text{Location}}(x) = \begin{cases} 1 & \text{if } x = \text{'Sydney'}, \\ 2 & \text{if } x = \text{'Melbourne'}, \\ 0 & \text{if } x = \text{ALL}. \end{cases}$$

$$f_{\text{Time}}(x) = \begin{cases} 1 & \text{if } x = 2005, \\ 2 & \text{if } x = 2006, \\ 0 & \text{if } x = \text{ALL}. \end{cases}$$

$$f_{\text{Item}}(x) = \begin{cases} 1 & \text{if } x = \text{'PS2'}, \\ 2 & \text{if } x = \text{'XBox 360'}, \\ 3 & \text{if } x = \text{'Wii'}, \\ 0 & \text{if } x = \text{ALL}. \end{cases}$$

If we want to draw the MOLAP cube (i.e., a sparse multi-dimensional array) in a tabular form of (ArrayIndex, Value), then which of the following functions is feasible? Why? You also need to draw the MOLAP cube.
• $f(x) = 9 \cdot f_{\text{Location}}(x) + 3 \cdot f_{\text{Time}}(x) + f_{\text{Item}}(x)$
• $f(x) = 16 \cdot f_{\text{Location}}(x) + 4 \cdot f_{\text{Time}}(x) + f_{\text{Item}}(x)$
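As a rough illustration of what "feasible" means for an array-index function, the sketch below checks whether a weighted sum of per-dimension codes assigns a distinct index to every cell. The helper names `linear_index` and `is_collision_free`, and the 2-D cube with domain sizes 3 and 4, are hypothetical and chosen only for illustration; they are not the cube of this question.

```python
from itertools import product


def linear_index(codes, weights):
    """Map a tuple of per-dimension codes to one array index: sum(w_i * c_i)."""
    return sum(w * c for w, c in zip(weights, codes))


def is_collision_free(domain_sizes, weights):
    """Return True if every combination of codes maps to a distinct index."""
    all_codes = product(*(range(n) for n in domain_sizes))
    indices = [linear_index(codes, weights) for codes in all_codes]
    return len(indices) == len(set(indices))


# Hypothetical 2-D cube: dimension A has codes 0..2 and dimension B has
# codes 0..3, with code 0 standing for ALL in both dimensions.
sizes = (3, 4)
print(is_collision_free(sizes, (4, 1)))  # True: weight of A equals |B| = 4
print(is_collision_free(sizes, (3, 1)))  # False: (0, 3) and (1, 0) both map to 3
```

The same check can be run against any candidate weights once the number of codes in each dimension (including the code for ALL) has been counted.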
Q2. (30 marks)
Consider the following training examples, which are used to construct a decision tree to help predict whether a patient is likely to have lung cancer.

Patient ID   Gender   Smokes?   Chest pain?   Cough?   Lung Cancer
1            Female   Yes       Yes           Yes      Yes
2            Male     Yes       No            Yes      Yes
3            Male     No        No            No       Yes
4            Female   No        Yes           Yes      No
5            Male     Yes       Yes           No       Yes
6            Male     No        Yes           Yes      No

(1) Use the Gini index to construct a decision tree that predicts whether a patient is likely to have lung cancer. You need to show every step of the construction.
(2) Translate your decision tree into decision rules.
Q3. (30 marks)
Consider binary classification where the class attribute y takes two values: 0 or 1. Let the feature vector for a test instance be a d-dimensional column vector x. A linear classifier with the model parameter w (which is a d-dimensional column vector) is the following function:

$$y = \begin{cases} 1 & \text{if } w^T x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

We make an additional simplifying assumption: x is a binary vector (i.e., each dimension of x takes only two values: 0 or 1).
(1) Prove that if the feature vectors are d-dimensional, then a Naïve Bayes classifier is a linear classifier in a (d + 1)-dimensional space. You need to explicitly write out the vector w that the Naïve Bayes classifier learns.
(2) It is obvious that the Logistic Regression classifier learned on the same training dataset as the Naïve Bayes classifier is also a linear classifier in the same (d + 1)-dimensional space. Let the parameters w learned by the two classifiers be $w_{LR}$ and $w_{NB}$, respectively. Briefly explain why learning $w_{NB}$ is much easier than learning $w_{LR}$.
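For readers unfamiliar with the decision rule used in this question, the following minimal sketch shows a linear classifier applied to binary feature vectors. It assumes one common reading of the "(d + 1)-dimensional space", namely that a constant 1 is prepended to x so that a bias term becomes an ordinary weight; the helper names and the weight values are hypothetical and are not the weights that either classifier would learn.

```python
def augment(x):
    """Prepend a constant 1 so a bias term becomes an ordinary weight."""
    return [1] + list(x)


def linear_classify(w, x):
    """Return 1 if w^T x > 0, otherwise 0."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0


# Hypothetical weights for d = 3 binary features; w[0] plays the role of a bias.
w = [-1.0, 2.0, -0.5, 0.25]
print(linear_classify(w, augment([1, 0, 0])))  # 1, since -1 + 2 = 1.0 > 0
print(linear_classify(w, augment([0, 1, 1])))  # 0, since -1 - 0.5 + 0.25 = -1.25
```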
Submission
Please write down your answers in a file named ass1.pdf. You must write down your name and student ID on the first page.
You can submit your file by
give cs9318 ass1 ass1.pdf
Late Penalty. 0 marks if not submitted on time (i.e., the deadline is firm).
Hint 1. $\log \prod_i x_i = \sum_i \log x_i$
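The identity in Hint 1 (the log of a product equals the sum of the logs) can be checked numerically. The sketch below uses made-up positive values purely for illustration.

```python
import math

# Hypothetical positive values, e.g. a handful of probabilities.
values = [0.5, 0.25, 0.8]

log_of_product = math.log(math.prod(values))
sum_of_logs = sum(math.log(v) for v in values)

# The two sides agree up to floating-point rounding.
print(math.isclose(log_of_product, sum_of_logs))  # True
```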