程序代写代做代考 ER tut1.dvi

tut1.dvi

COMP9318 Tutorial 1

Wei WANG

The University of New South Wales

weiw@cse.unsw.edu.au

➀ Data Warehouse and OLAP

Produced with LATEX seminar style & PSTricks 1

Q1

➀ Create a star schema diagram that will enable FIT-WORLD GYM

INC. to analyze their revenue.

➜ The fact table will include — for every instance of revenue taken —

attribute(s) useful for analyzing revenue.

➜ The star schema will include all dimensions that can be useful for

analyzing revenue

➜ The only two data sources are shown below

➁ Appreciate the ETL process involved populating the data warehouse.

➂ Appreciate the difference of formulating queries: “Find the percentage

of revenue generated by members in the last year”.

➃ How many cuboids are there in the complete data cube?

Q1 2

ER DIAGRAM

ONEDAYPASS

PassId PassName

MEMBER

MembId

MembName

MembZip

MEMBERSHIP

MshipID

MshpName

MshpPrice

PASSCATEGORY

PassCatId

CatName

Price

SALETRANSACT

STrid Date

MERCHANDISE

MrchId

MrchName

MrchPrice

Buys Pays

Date

IsOf BuysVia

SoldVia

Quantity

ER DIAGRAM 3

DATA INSTANCES
MEMBER

Membid MembName MembZip MshpID MsDatePayed

111 Joe 60611 M1 1-Jan-04
222 Mary 60640 M3 1-Jan-04
333 Sue 60611 M3 1-Jan-04

MEMBERSHIP

MshpID MshpName MshpPrice

M1 Platinum $1,000
M2 Gold $800
M3 Value $300

ONEDAYPASS

PassID PassDate PassCatID Membid

1-001 1-Jan-04 PSA 111
1-002 1-Jan-04 PSA 333
1-003 2-Jan-04 PSK 333

MERCHANDISE

MrchID MrchName MrchPrice

AP1 T-shirt $11
AP2 Hat $9
EQ1 Jump Rope $12

PASSCATEGORY

PassCatId CatName Price

PSA Adult $20
PSS Senior $10
PSK Kid $3

SOLDVIA

STrid MrchID Quatity

11111 AP1 1
11112 AP2 1
11112 AP2 1
11113 EQ1 3

Note: MEMEBERS can bring in
non-member guests. For each non-
member guest, a member buys
a one-day-guest-pass of a certain
pass category.

SALESTRANSACT

STrid Date Membid

11111 1-Jan-04 333
11112 2-Jan-04 222
11113 3-Jan-04 111

DATA INSTANCES 4

ANOTHER DATA SOURCE
SPECIALEVENT

CorpCustID CorpCustNameLoc EventTypeCode EvetType EventDate AmountCharged

CC1 Sears, Chicago 60640 L-A All Day Rental, January 4, 2004 $3500
CC2 Boeing, Chicago 60611 L-H Half Day Rental, January 5, 2004 $2200

ANOTHER DATA SOURCE 5

SOLUTION: STAR SCHEMA

CalendarKey

Day

Month

Quarter

Year

Calender

ItemKey

ItemId

Category

Type

Item

CalendarKey

CustomerKey

ItemKey

Amount

Quantity

Revenue

CustomerKey

CustomerId

CustName

CustCategory

Zip

Customer

SOLUTION: STAR SCHEMA 6

POPULATED TABLES

CALENDAR

CalendarKey Day Month Quarter Y ear

1 1 Jan 1 2004
2 2 Jan 1 2004
3 3 Jan 1 2004
4 4 Jan 1 2004
5 5 Jan 1 2004

ITEM

ItemKey ItemId Category Type

1 M1 Memship Platinum
2 M2 Memship Gold
3 M3 Memship Value
4 PSA OneDayP. Adult
5 PSS OneDayP. Senior
6 PSK OneDayP. Kid
7 AP1 Mrch. T-Shirt
8 AP2 Mrch. Hat
9 EQ1 Mrch. Jump Rope

10 L-A Spec. Evnt All Day
11 L-H Spec. Evnt Half Day

REVENUE

CalendarKey CustKey ItemKey Amount Quantity

1 1 1 $1000 1
1 2 3 $300 1
1 3 3 $300 1
1 1 4 $20 1
1 3 4 $20 1
2 3 6 $3 1
1 3 7 $11 1
1 3 8 $9 1
2 2 8 $9 1
3 1 9 $36 3
4 4 10 $3500 1
5 5 11 $2200 1

CUSTOMER

CustKey CustId CustName CustCategory Zip

1 111 Joe Ind 60611
2 222 Mary Ind 60640
3 333 Sue Ind 60611
4 CC1 Sears Corp 60640
5 CC2 Boeing Corp 60611

POPULATED TABLES 7

➀ See above. Note that this is not the unique answer.

➁ There are several tasks involved when importing the data into the data

warehouse. E.g., we need to extract zipcode information from

CorpCustNameLoc; we need to perform aggregation

(price ·Quantity) for tuples in the merchandise table; we might also

need to deal with (near) duplicate object detection (e.g., the same

“member” that appear in two data sources).

➂ “Find the percentage of revenue generated by members in the last

year” can be easily answered on the star schema by two aggregate

queries on the fact table. Specifically, if the complete data cube has

been built, the queries can be efficiently answered by the cuboid

(Y ear), and the cuboid (Y ear, Category).

➃ Since CustName is not likely to be a good ”level” for analysis (rather,

it is a descriptive attribute), there are 4 levels on Calendar dimension,

3 on Item, and 3 on Customer. Therefore, there are

(4 + 1) ∗ (3 + 1) ∗ (3 + 1) = 80 in total.

Note that we could have different hierarchies on a dimension. E.g., we

8

could consider the hierarchy on the Customer dimension one of the

following. They have different semantics, but do not affect the number

of cuboids.

ALL

Category

Zipcode

Customer

ALL

Category Zipcode

Customer

9

Q2

Consider the star schema below

➜ Write an MDX query that display total DollasSold for each product

category and each store in the State ’CA’.

➜ Write an MDX query that display total DollasSold for each product

category and each store in the State ’CA’ in 2007.

➜ create a star schema that has Month and Region as the finest

granularity on the corresponding dimensions. Show all tables in the

new data model populated with the data based on the data from the

original model.

Q2 10

CalendarKey

Day

Month

Quarter

Year

Calender

StoreKey

StoreId

State

Region

Store

CalendarKey

StoreKey

ProductKey

CustomerKey

DollarsSold

UnitsSold

Sales

ProductKey

Name

Type

Category

Product

CustomerKey

CustomerId

Gender

Customer

Q2 11

POPULATED TABLES
CALENDAR

CalendarKey Day Month Quarter Y ear

1 1 Jan 1 2003
2 2 Jan 1 2003
3 1 Feb 1 2003

PRODUCT

ProKey ProName ProType Category

1 Luvs 50 Diapers Infant Care
2 Huggies 24 Diapers Infant Care
3 High C Vitamin Dietary Supp

STORE

StoreKey StoreID State Region

1 X1 Maine East
2 X2 New Jersey East
3 Y1 Ohio Midwest

CUSTOMER

CustKey CustID Gender

1 12 Male
2 23 Male
3 34 Female

SALES

CalendarKey ProKey StoreKey CustKey $Sold UnitsSold

1 1 1 1 $15 1
1 2 2 2 $20 1
1 2 1 1 $40 2
1 2 2 1 $20 1
2 2 1 1 $19 1
2 2 2 1 $19 1
3 3 3 1 $9 2
3 3 3 2 $9 1
3 3 3 3 $9 1

POPULATED TABLES 12

SOLUTION: MDX QUERY

SELECT [Product ].[ Category ]. MEMBERS ON COLUMNS

[Store ].[USA ].[CA]. CHILDREN ON ROWS

FROM [Sales]

WHERE ([ Measures ].[ DollasSold ])

SELECT [Product ].[ Category ]. MEMBERS ON COLUMNS

[Store ].[USA ].[CA]. CHILDREN ON ROWS

FROM [Sales]

WHERE ([Time ].[2007] , [Measures ].[ DollasSold ])

SOLUTION: MDX QUERY 13

SOLUTION: STAR SCHEMA

CalendarKey

Month

Quarter

Year

Calender

ItemKey

Region

Store

CalendarKey

StoreKey

ProductKey

CustomerKey

DollarsSold

UnitsSold

Sales

ProductKey

Name

Type

Category

Product

CustomerKey

CustomerId

Gender

Customer

SOLUTION: STAR SCHEMA 14

POPULATED TABLES

CALENDAR

CalendarKey Month Quarter Y ear

1 Jan 1 2003
2 Feb 1 2003

PRODUCT

ProKey ProName ProType Category

1 Luvs 50 Diapers Infant Care
2 Huggies 24 Diapers Infant Care
3 High C Vitamin Dietary Supp

STORE

StoreKey Region

1 East
2 Midwest

CUSTOMER

CustKey CustID Gender

1 12 Male
2 23 Male
3 34 Female

SALES (← Intermediate)

CalendarKey ProKey StoreKey CustKey $Sold UnitsSold

1 1 1 1 $15 1
1 2 1 2 $20 1
1 2 1 1 $40 2
1 2 1 1 $20 1
1 2 1 1 $19 1
1 2 1 1 $19 1
2 3 2 1 $9 2
2 3 2 2 $9 1
2 3 2 3 $9 1

POPULATED TABLES 15

POPULATED TABLES
SALES

CalendarKey ProKey StoreKey CustKey $Sold UnitsSold

1 1 1 1 $15 1
1 2 1 2 $20 1
1 2 1 1 $98 5
2 3 2 1 $9 2
2 3 2 2 $9 1
2 3 2 3 $9 1

POPULATED TABLES 16