tut1.dvi
COMP9318 Tutorial 1
Wei WANG
The University of New South Wales
weiw@cse.unsw.edu.au
➀ Data Warehouse and OLAP
Produced with LATEX seminar style & PSTricks 1
Q1
➀ Create a star schema diagram that will enable FIT-WORLD GYM
INC. to analyze their revenue.
➜ The fact table will include — for every instance of revenue taken —
attribute(s) useful for analyzing revenue.
➜ The star schema will include all dimensions that can be useful for
analyzing revenue
➜ The only two data sources are shown below
➁ Appreciate the ETL process involved populating the data warehouse.
➂ Appreciate the difference of formulating queries: “Find the percentage
of revenue generated by members in the last year”.
➃ How many cuboids are there in the complete data cube?
Q1 2
ER DIAGRAM
ONEDAYPASS
PassId PassName
MEMBER
MembId
MembName
MembZip
MEMBERSHIP
MshipID
MshpName
MshpPrice
PASSCATEGORY
PassCatId
CatName
Price
SALETRANSACT
STrid Date
MERCHANDISE
MrchId
MrchName
MrchPrice
Buys Pays
Date
IsOf BuysVia
SoldVia
Quantity
ER DIAGRAM 3
DATA INSTANCES
MEMBER
Membid MembName MembZip MshpID MsDatePayed
111 Joe 60611 M1 1-Jan-04
222 Mary 60640 M3 1-Jan-04
333 Sue 60611 M3 1-Jan-04
MEMBERSHIP
MshpID MshpName MshpPrice
M1 Platinum $1,000
M2 Gold $800
M3 Value $300
ONEDAYPASS
PassID PassDate PassCatID Membid
1-001 1-Jan-04 PSA 111
1-002 1-Jan-04 PSA 333
1-003 2-Jan-04 PSK 333
MERCHANDISE
MrchID MrchName MrchPrice
AP1 T-shirt $11
AP2 Hat $9
EQ1 Jump Rope $12
PASSCATEGORY
PassCatId CatName Price
PSA Adult $20
PSS Senior $10
PSK Kid $3
SOLDVIA
STrid MrchID Quatity
11111 AP1 1
11112 AP2 1
11112 AP2 1
11113 EQ1 3
Note: MEMEBERS can bring in
non-member guests. For each non-
member guest, a member buys
a one-day-guest-pass of a certain
pass category.
SALESTRANSACT
STrid Date Membid
11111 1-Jan-04 333
11112 2-Jan-04 222
11113 3-Jan-04 111
DATA INSTANCES 4
ANOTHER DATA SOURCE
SPECIALEVENT
CorpCustID CorpCustNameLoc EventTypeCode EvetType EventDate AmountCharged
CC1 Sears, Chicago 60640 L-A All Day Rental, January 4, 2004 $3500
CC2 Boeing, Chicago 60611 L-H Half Day Rental, January 5, 2004 $2200
ANOTHER DATA SOURCE 5
SOLUTION: STAR SCHEMA
CalendarKey
Day
Month
Quarter
Year
Calender
ItemKey
ItemId
Category
Type
Item
CalendarKey
CustomerKey
ItemKey
Amount
Quantity
Revenue
CustomerKey
CustomerId
CustName
CustCategory
Zip
Customer
SOLUTION: STAR SCHEMA 6
POPULATED TABLES
CALENDAR
CalendarKey Day Month Quarter Y ear
1 1 Jan 1 2004
2 2 Jan 1 2004
3 3 Jan 1 2004
4 4 Jan 1 2004
5 5 Jan 1 2004
ITEM
ItemKey ItemId Category Type
1 M1 Memship Platinum
2 M2 Memship Gold
3 M3 Memship Value
4 PSA OneDayP. Adult
5 PSS OneDayP. Senior
6 PSK OneDayP. Kid
7 AP1 Mrch. T-Shirt
8 AP2 Mrch. Hat
9 EQ1 Mrch. Jump Rope
10 L-A Spec. Evnt All Day
11 L-H Spec. Evnt Half Day
REVENUE
CalendarKey CustKey ItemKey Amount Quantity
1 1 1 $1000 1
1 2 3 $300 1
1 3 3 $300 1
1 1 4 $20 1
1 3 4 $20 1
2 3 6 $3 1
1 3 7 $11 1
1 3 8 $9 1
2 2 8 $9 1
3 1 9 $36 3
4 4 10 $3500 1
5 5 11 $2200 1
CUSTOMER
CustKey CustId CustName CustCategory Zip
1 111 Joe Ind 60611
2 222 Mary Ind 60640
3 333 Sue Ind 60611
4 CC1 Sears Corp 60640
5 CC2 Boeing Corp 60611
POPULATED TABLES 7
➀ See above. Note that this is not the unique answer.
➁ There are several tasks involved when importing the data into the data
warehouse. E.g., we need to extract zipcode information from
CorpCustNameLoc; we need to perform aggregation
(price ·Quantity) for tuples in the merchandise table; we might also
need to deal with (near) duplicate object detection (e.g., the same
“member” that appear in two data sources).
➂ “Find the percentage of revenue generated by members in the last
year” can be easily answered on the star schema by two aggregate
queries on the fact table. Specifically, if the complete data cube has
been built, the queries can be efficiently answered by the cuboid
(Y ear), and the cuboid (Y ear, Category).
➃ Since CustName is not likely to be a good ”level” for analysis (rather,
it is a descriptive attribute), there are 4 levels on Calendar dimension,
3 on Item, and 3 on Customer. Therefore, there are
(4 + 1) ∗ (3 + 1) ∗ (3 + 1) = 80 in total.
Note that we could have different hierarchies on a dimension. E.g., we
8
could consider the hierarchy on the Customer dimension one of the
following. They have different semantics, but do not affect the number
of cuboids.
ALL
Category
Zipcode
Customer
ALL
Category Zipcode
Customer
9
Q2
Consider the star schema below
➜ Write an MDX query that display total DollasSold for each product
category and each store in the State ’CA’.
➜ Write an MDX query that display total DollasSold for each product
category and each store in the State ’CA’ in 2007.
➜ create a star schema that has Month and Region as the finest
granularity on the corresponding dimensions. Show all tables in the
new data model populated with the data based on the data from the
original model.
Q2 10
CalendarKey
Day
Month
Quarter
Year
Calender
StoreKey
StoreId
State
Region
Store
CalendarKey
StoreKey
ProductKey
CustomerKey
DollarsSold
UnitsSold
Sales
ProductKey
Name
Type
Category
Product
CustomerKey
CustomerId
Gender
Customer
Q2 11
POPULATED TABLES
CALENDAR
CalendarKey Day Month Quarter Y ear
1 1 Jan 1 2003
2 2 Jan 1 2003
3 1 Feb 1 2003
PRODUCT
ProKey ProName ProType Category
1 Luvs 50 Diapers Infant Care
2 Huggies 24 Diapers Infant Care
3 High C Vitamin Dietary Supp
STORE
StoreKey StoreID State Region
1 X1 Maine East
2 X2 New Jersey East
3 Y1 Ohio Midwest
CUSTOMER
CustKey CustID Gender
1 12 Male
2 23 Male
3 34 Female
SALES
CalendarKey ProKey StoreKey CustKey $Sold UnitsSold
1 1 1 1 $15 1
1 2 2 2 $20 1
1 2 1 1 $40 2
1 2 2 1 $20 1
2 2 1 1 $19 1
2 2 2 1 $19 1
3 3 3 1 $9 2
3 3 3 2 $9 1
3 3 3 3 $9 1
POPULATED TABLES 12
SOLUTION: MDX QUERY
SELECT [Product ].[ Category ]. MEMBERS ON COLUMNS
[Store ].[USA ].[CA]. CHILDREN ON ROWS
FROM [Sales]
WHERE ([ Measures ].[ DollasSold ])
SELECT [Product ].[ Category ]. MEMBERS ON COLUMNS
[Store ].[USA ].[CA]. CHILDREN ON ROWS
FROM [Sales]
WHERE ([Time ].[2007] , [Measures ].[ DollasSold ])
SOLUTION: MDX QUERY 13
SOLUTION: STAR SCHEMA
CalendarKey
Month
Quarter
Year
Calender
ItemKey
Region
Store
CalendarKey
StoreKey
ProductKey
CustomerKey
DollarsSold
UnitsSold
Sales
ProductKey
Name
Type
Category
Product
CustomerKey
CustomerId
Gender
Customer
SOLUTION: STAR SCHEMA 14
POPULATED TABLES
CALENDAR
CalendarKey Month Quarter Y ear
1 Jan 1 2003
2 Feb 1 2003
PRODUCT
ProKey ProName ProType Category
1 Luvs 50 Diapers Infant Care
2 Huggies 24 Diapers Infant Care
3 High C Vitamin Dietary Supp
STORE
StoreKey Region
1 East
2 Midwest
CUSTOMER
CustKey CustID Gender
1 12 Male
2 23 Male
3 34 Female
SALES (← Intermediate)
CalendarKey ProKey StoreKey CustKey $Sold UnitsSold
1 1 1 1 $15 1
1 2 1 2 $20 1
1 2 1 1 $40 2
1 2 1 1 $20 1
1 2 1 1 $19 1
1 2 1 1 $19 1
2 3 2 1 $9 2
2 3 2 2 $9 1
2 3 2 3 $9 1
POPULATED TABLES 15
POPULATED TABLES
SALES
CalendarKey ProKey StoreKey CustKey $Sold UnitsSold
1 1 1 1 $15 1
1 2 1 2 $20 1
1 2 1 1 $98 5
2 3 2 1 $9 2
2 3 2 2 $9 1
2 3 2 3 $9 1
POPULATED TABLES 16