Name: Yufei Xie
Student ID: z5134233

ASSIGNMENT 1

Q1

1.1
Location Time Item SUM(Quantity)
ALL ALL ALL 5100
ALL ALL PS2 2900
ALL ALL XBox 360 1700
ALL ALL Wii 500
ALL 2005 ALL 3100
ALL 2005 PS2 1400
ALL 2005 XBox 360 1700
ALL 2005 Wii 0
ALL 2006 ALL 2000
ALL 2006 PS2 1500
ALL 2006 XBox 360 0
ALL 2006 Wii 500
Sydney ALL ALL 3400
Sydney ALL PS2 2900
Sydney ALL XBox 360 0
Sydney ALL Wii 500
Sydney 2005 ALL 1400
Sydney 2005 PS2 1400
Sydney 2005 XBox 360 0
Sydney 2005 Wii 0
Sydney 2006 ALL 2000
Sydney 2006 PS2 1500
Sydney 2006 XBox 360 0
Sydney 2006 Wii 500
Melbourne ALL ALL 1700
Melbourne ALL PS2 0
Melbourne ALL XBox 360 1700
Melbourne ALL Wii 0
Melbourne 2005 ALL 1700
Melbourne 2005 PS2 0
Melbourne 2005 XBox 360 1700
Melbourne 2005 Wii 0
Melbourne 2006 ALL 0
Melbourne 2006 PS2 0
Melbourne 2006 XBox 360 0
Melbourne 2006 Wii 0
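As a cross-check, the non-empty cells of this cube can be recomputed programmatically. Below is a minimal pandas sketch, assuming the base Sales table contains exactly the four non-zero base-level rows implied by the cube above; it reproduces the 22 non-empty cells, while the zero rows in the table above are simply the remaining combinations absent from the base data.

from itertools import combinations
import pandas as pd

# Base fact table implied by the cube above (assumption: exactly these
# four non-zero base-level cells; their total 5100 matches the apex).
sales = pd.DataFrame(
    [("Sydney", 2005, "PS2", 1400),
     ("Sydney", 2006, "PS2", 1500),
     ("Sydney", 2006, "Wii", 500),
     ("Melbourne", 2005, "XBox 360", 1700)],
    columns=["Location", "Time", "Item", "Quantity"],
)

dims = ["Location", "Time", "Item"]
rows = []
for k in range(len(dims) + 1):
    for keep in combinations(dims, k):          # 2^3 = 8 group-bys in total
        if not keep:                            # apex cell: every dimension is ALL
            rows.append(["ALL", "ALL", "ALL", int(sales["Quantity"].sum())])
            continue
        grouped = sales.groupby(list(keep))["Quantity"].sum()
        for key, qty in grouped.items():
            key = key if isinstance(key, tuple) else (key,)
            cell = dict(zip(keep, key))
            # Dimensions not grouped on roll up to ALL.
            rows.append([cell.get(d, "ALL") for d in dims] + [int(qty)])

cube = pd.DataFrame(rows, columns=dims + ["SUM(Quantity)"])
print(cube.to_string(index=False))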
1.2
1.3
SELECT Location, Time, Item, SUM(Quantity)
FROM Sales
GROUP BY CUBE (Location, Time, Item);
-- CUBE aggregates over every subset of the three dimensions;
-- the NULLs it produces are written as ALL in the result below.
Location Time Item SUM(Quantity)
ALL ALL ALL 5100
ALL ALL PS2 2900
ALL ALL XBox 360 1700
ALL ALL Wii 500
ALL 2005 ALL 3100
ALL 2005 PS2 1400
ALL 2005 XBox 360 1700
ALL 2006 ALL 2000
ALL 2006 PS2 1500
ALL 2006 Wii 500
Sydney ALL ALL 3400
Sydney ALL PS2 2900
Sydney ALL Wii 500
Sydney 2005 ALL 1400
Sydney 2005 PS2 1400
Sydney 2006 ALL 2000
Sydney 2006 PS2 1500
Sydney 2006 Wii 500
Melbourne ALL ALL 1700
Melbourne ALL XBox 360 1700
Melbourne 2005 ALL 1700
Melbourne 2005 XBox 360 1700
1.4
The function I chose to map a multi-dimensional point to a one-dimensional point encodes each dimension value as an integer offset, with Location ∈ {ALL=0, Sydney=1, Melbourne=2}, Time ∈ {ALL=0, 2005=1, 2006=2}, and Item ∈ {ALL=0, PS2=1, XBox 360=2, Wii=3}, and computes

ArrayIndex = Location × 12 + Time × 4 + Item

where 12 = |Time| × |Item| and 4 = |Item|. Only the non-empty cells are stored (a Python sketch of this mapping follows the table):
ArrayIndex Value
0 5100
1 2900
2 1700
3 500
4 3100
5 1400
6 1700
8 2000
9 1500
11 500
12 3400
13 2900
15 500
16 1400
17 1400
20 2000
21 1500
23 500
24 1700
26 1700
28 1700
30 1700
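The array is sparse: only the 22 non-empty cells are kept. A minimal Python sketch of the index function (the dictionary and function names are illustrative, not part of the assignment):

# Per-dimension integer encodings; ALL occupies slot 0 in every dimension.
LOC = {"ALL": 0, "Sydney": 1, "Melbourne": 2}
TIME = {"ALL": 0, 2005: 1, 2006: 2}
ITEM = {"ALL": 0, "PS2": 1, "XBox 360": 2, "Wii": 3}

def array_index(location, time, item):
    """Row-major index: |Time| * |Item| = 12 cells per location,
    |Item| = 4 cells per (location, time) pair."""
    return LOC[location] * 12 + TIME[time] * 4 + ITEM[item]

assert array_index("ALL", "ALL", "ALL") == 0
assert array_index("Sydney", 2005, "PS2") == 17          # value 1400 above
assert array_index("Melbourne", 2005, "XBox 360") == 30  # value 1700 above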
Q2
2.1
We classify $\mathbf{x} = (x_1, \ldots, x_d)$ with binary features $x_i \in \{0, 1\}$ as 1 when
$$P(y = 1 \mid \mathbf{x}) > P(y = 0 \mid \mathbf{x}),$$
which, by Bayes' rule and the Naive Bayes conditional-independence assumption, is equivalent to
$$\log\frac{P(y=1)}{P(y=0)} + \sum_{i=1}^{d} \log\frac{P(x_i \mid y=1)}{P(x_i \mid y=0)} > 0.$$
Thus, let
$$w_0 = \log\frac{P(y=1)}{P(y=0)} + \sum_{i=1}^{d} \log\frac{P(x_i=0 \mid y=1)}{P(x_i=0 \mid y=0)},$$
and let
$$w_i = \log\frac{P(x_i=1 \mid y=1)\,P(x_i=0 \mid y=0)}{P(x_i=0 \mid y=1)\,P(x_i=1 \mid y=0)} \quad \text{for } i = 1, \ldots, d.$$
The decision rule becomes $w_0 + \sum_{i=1}^{d} w_i x_i > 0$. Thus the Naive Bayes classifier is a linear classifier in a $(d+1)$-dimensional space.
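As a quick numerical check of this equivalence, the following sketch draws hypothetical Naive Bayes parameters at random and verifies that the posterior comparison and the linear rule always agree (all names and constants here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
d = 5
# Hypothetical Naive Bayes parameters for binary features.
prior1, prior0 = 0.4, 0.6
p1 = rng.uniform(0.1, 0.9, d)   # P(x_i = 1 | y = 1)
p0 = rng.uniform(0.1, 0.9, d)   # P(x_i = 1 | y = 0)

# Weights from the derivation above.
w0 = np.log(prior1 / prior0) + np.log((1 - p1) / (1 - p0)).sum()
w = np.log(p1 * (1 - p0) / (p0 * (1 - p1)))

for x in rng.integers(0, 2, size=(20, d)):
    # Naive Bayes posterior comparison (up to the shared P(x) factor).
    nb = prior1 * np.prod(np.where(x, p1, 1 - p1)) > \
         prior0 * np.prod(np.where(x, p0, 1 - p0))
    linear = w0 + w @ x > 0
    assert nb == linear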
2.2
To learn the Naive Bayes classifier, we need to compute the class prior probability $P(y)$ and the probability of each feature conditioned on the class, $P(x_i \mid y)$. Through maximum likelihood estimation, these can be computed simply as ratios of counts over the training data.
To learn the logistic regression classifier, there is no closed-form solution: we need to use a gradient descent algorithm to iteratively optimize the parameters $\mathbf{w}$. Thus learning the Naive Bayes classifier is much easier than learning logistic regression.
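The contrast can be made concrete with a short sketch on toy binary data (the arrays, iteration count, and learning rate are all illustrative, not part of the assignment):

import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 4)).astype(float)   # binary features
y = rng.integers(0, 2, size=200)                      # binary labels

# Naive Bayes MLE: every parameter is a ratio of counts -- no iteration.
prior = y.mean()                # P(y = 1)
p1 = X[y == 1].mean(axis=0)     # P(x_i = 1 | y = 1)
p0 = X[y == 0].mean(axis=0)     # P(x_i = 1 | y = 0)

# Logistic regression MLE: no closed form, so iterate gradient descent
# on the negative log-likelihood derived in Q3.1.
w, b, lr = np.zeros(4), 0.0, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted P(y = 1 | x)
    w -= lr * X.T @ (p - y) / len(y)          # average NLL gradient wrt w
    b -= lr * (p - y).mean()                  # average NLL gradient wrt b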
Q3
3.1
The likelihood of a single training example $(\mathbf{x}_i, y_i)$ under the logistic regression model is
$$P(y_i \mid \mathbf{x}_i; \mathbf{w}) = \sigma(\mathbf{w}^{\top}\mathbf{x}_i)^{y_i}\,\bigl(1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_i)\bigr)^{1-y_i}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}.$$
The likelihood of the training dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ is
$$L(\mathbf{w}) = \prod_{i=1}^{n} \sigma(\mathbf{w}^{\top}\mathbf{x}_i)^{y_i}\,\bigl(1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_i)\bigr)^{1-y_i}.$$
We can use the negative log-likelihood as the loss function. Thus
$$\ell(\mathbf{w}) = -\log L(\mathbf{w}) = -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(\mathbf{w}^{\top}\mathbf{x}_i) + (1 - y_i) \log\bigl(1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_i)\bigr) \Bigr].$$
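A direct NumPy translation of this loss (a sketch; the function name and the clipping guard are my additions):

import numpy as np

def nll(w, X, y, eps=1e-12):
    """Negative log-likelihood of logistic regression.
    X: (n, d) feature matrix, y: (n,) labels in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # sigma(w^T x_i) for each example
    p = np.clip(p, eps, 1.0 - eps)     # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))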
3.2