Seminar 1
Lecture 4
Advanced Dimensional Modelling
Marten Risius
E-mail: m.risius@business.uq.edu.au
Room: 516 Joyce Ackroyd (37) Building
‹#›
Recap: Business Analytics Framework
‹#›
2
Recap: Business Analyst World
How much revenue did the product G generate in the last three months, broken down by month for the south eastern sales region, by individual stores, broken down by promotions, compared to estimates and to the previous version of the product
Analysis starts usually with a single indication of something strange, then goes deep into the data, left to a new dimension, right to another, up to the summary, back down and left and right again, until the problem is identified…
‹#›
3
Recap: Star Schema
Sales Summary
(Fact Table)
Product
Customer
Retail
Outlet
Time
“WHEN” dimension
“WHO” dimension
“WHAT” dimension
“WHERE” dimension
‹#›
Recap: Movie Library ER Model
‹#›
Recap: Movie Library Star Model
‹#›
Learning Objectives
By the end of this session, you should be able to
develop star schemas based on ER models
understand snowflake schemas
explain and implement advance dimensional modelling concepts
‹#›
Dimensional Modelling (Moody and Kortink)
From ER Models to Star Schemas
‹#›
Dimensional model as an ER model
Sale
Time key
Store key
Customer key
Product key
Dollar sales
Unit sales
Customer
Customer key
Name
Customer type
Store
Store key
Address
Region
Time
Time key
Day
Month
Product
Product key
Product type
Product group
Product sub-group
weight
‹#›
Embedded Hierarchies in Dimensional Tables
Customer
Customer-id
Customer name
Market segment
Market sector
Industry class
Industry sector
Industry group
City
State
Country
Dimensional “Product” table/entity with hierarchical relationships
Logical view of customer entity
with hierarchical relationships
‹#›
Embedded Hierarchies in Dimensional Tables
Customer
Market
Segment
Industry
Class
City
Industry
Sector
Industry
Group
Market
Sector
State
Country
Market segment hierarchy
Location hierarchy
Industry hierarchy
Customer-id
Customer name
Market segment
Market sector
Industry class
Industry sector
Industry group
City
State
Country
‹#›
Snowflake Schema
‹#›
Why Dimensional Modelling Works
Chunking
Human short-term memory limits – 7 plus or minus 2 concepts
Large amounts of information are “chunked” by humans
In dimensional modelling, each star schema is a chunk
‹#›
Chunking Example
Equivalent Star Schemas
Normalised ER Model
‹#›
Why Dimensional Modelling Works
Hierarchical Structuring
Hierarchical structures are used to manage complexity
Each dimension in a star schema typically consists of one of more hierarchies
The hierarchical structure provides the means to “roll up” and “drill down” in OLAP tools
Customer
Market
Segment
Industry
Class
City
Industry
Sector
Industry
Group
Market
Sector
State
Country
Market segment hierarchy
Location hierarchy
Industry hierarchy
Customer-id
Customer name
Market segment
Market sector
Industry class
Industry sector
Industry group
City
State
Country
‹#›
Dimensional Modelling and ER Modelling
A star schema is just a restricted form of an ER model
Fact table
An ‘intersection entity’ with complex key and measures
Dimension tables
‘Master data’ with hierarchies
Simple keys (create surrogate keys)
‹#›
Deriving Dimensional Models from
ER Models
Classify entities
High-level star schema design
Detailed fact table design
Detailed dimensional table design
‹#›
An Example ER Model
‹#›
1. Classify Entities
Classify entities into three types
Transaction entities (business events)
Eg. orders, shipments, payments …
Component entities
Who, what, where, how and why data about the events
Classification entities
Define embedded hierarchies in components
‹#›
Transaction Entities
‹#›
Component Entities
‹#›
Classification Entities
‹#›
2. High-level Star Schema Design
Identity Star schemas required
Each transaction entity is a candidate
Define level of summarisation
Level of granularity
Storage space / flexibility trade-off
Identify relevant dimensions
Select relevant dimensions from component entities
‹#›
Identifying Star Schemas
When transaction entities are related in a master-detail structure, merge to form a single star schema – the master entity becomes degenerate dimension
Order
Order
Item
consists of
Order no
Date
Customer no
Order no
Product ID
Quantity
Unit Price
Order
Fact
Order no
Product ID
Quantity
Quantity*Unit Price
‹#›
Star Schema Design
‹#›
time
25
3. Detailed Fact Table Design
Define key
A composite key consisting of the keys of all dimension tables (not minimal in most cases)
Or transaction level key (such as combination of orderID and OrderLineItemID) for lowest level of granulairty
Define facts
The measures that can be analysed using numerical functions
Use additive facts, but may also be semi additive
‹#›
Using Additive Facts
Wherever possible, additive facts should be used to prevent errors in queries
Order Item
Order No
Product ID
Order Quantity
Unit Price
Order Fact
Order No
Product ID
Order Quantity
Extended Price
Order Date
Retail Outlet ID
(Extended Price = Order Quantity * Unit Price)
Customer ID
‹#›
Fact Table Design
No data lost from original normalised model – master entity attributes should be included in detailed entity
Order
Fact
Delivery
Method
Customer
Date
Product
“WHO”
Dimension
“HOW”
Dimension
“WHAT”
Dimension
“WHEN”
Dimension
Employee
“WHO”
Dimension
order
posted
Retail Outlet
“WHERE”
Dimension
Order
“Degenerate”
Dimension
Facts:
・
Order Quantity
・
Extended Price ($)
・
Discount Amount ($)
Retail_Outlet_ID
Employee_Number
Delivery_Method
Order_Number
Customer_ID
Order
Fact
Product_ID
Order_Date
Posted_Date
Order_Quantity
Extended_Price
Discount_Amount
‹#›
4. Detailed Dimensional Table Design
Define dimensional key
Usually a simple numeric key (surrogate key)
May need to be generalised form underlying component key
Collapse hierarchies
De-normalise hierarchies into component entities
Introduces transitive dependencies and therefore redundancy
‹#›
Collapse Hierarchies
‹#›
Codes and Abbreviations
Replace codes and abbreviations by descriptive text
To make the star schema as understandable as possible
‹#›
Replace Codes with Text
Annual Revenue
Industry Class Code
industry Class Name
Customer_ID
Number of Employeees
Customer
Dimension
Customer Name
Industry Sector Code
Industry Sector Name
Industry Group Code
Industry Group Name
City Code
Date of First Order
Market Segment Code
Market Segment Name
Market Sector Code
City Name
State Code
State Name
Country Code
Country Name
Population
Market Sector Name
Annual Revenue
industry Class Name
Customer_ID
Number of Employeees
Customer
Dimension
Customer Name
Industry Sector Name
Industry Group Name
Date of First Order
Market Segment Name
City Name
State Name
Country Name
Population
Market Sector Name
Postal Area Name
Postal Area Name
Flat/Apartment Number
Street Number
Street Name
Street Type
Postcode
Postcode
X
X
X
X
X
X
X
‹#›
Final Design
Annual Revenue
industry Class Name
Customer_ID
Number of Employeees
Customer
Dimension
Customer Name
Industry Sector Name
Industry Group Name
Date of First Order
Market Segment Name
City Name
State Name
Country Name
Population
Market Sector Name
Postal Area Name
Retail_Outlet_ID
Employee_Number
Delivery_Method
Order_Number
Customer_ID
Order
Fact
Product_ID
Order_Date
Posted_Date
Order_Quantity
Total_Price
Discount_Amount
Day number in year
Quarter
Date_Key
Day number
Date
Dimension
Day name
Fiscal
Period
Year
Week number in year
Month name
Weekend
Event
Country Name
Season
Month number
Holiday
Product Subcategory
Unit Type
Product_ID
Brand
Product
Dimension
Product Name
Tax Exempt Status
Import Status
Product Category
Package Type
Package Size
Export Status
Sales Region
Retail_Outlet_ID
Store Type
Retail Outlet
Dimension
Retail Outlet Name
Sales District
City Name
State Name
Country Name
Population
Postal Area Name
Position Type
Last Performance Rating
Employee_ID
Employment Status
Employee
Dimension
Employee Name
Education Level
Age
Commencement Date
Salary Level
Union Status
Sex
Carrier Name
Delivery_Method_ID
Priority
Delivery
Method
Dimension
Delivery Method
Floor Space
Number of Employees
Parking Space
inventory Status
Store Manager
Product Manager
Account Manager
Flat/Apartment Number
Street Number
Street Name
Street Type
Postcode
Flat/Apartment Number
Street Number
Street Name
Street Type
Postcode
Source Entities:
Employee
Position Type
Employment Status
Education Level
Union Status
Departure Date
Source Entities:
Retail Outlet
Sales District
Sales Region
Postal Area
City
State
Country
Store Type
Source Entities:
Delivery Method
Carrier
Priority
Source Entities:
Customer
Postal Area
State
City
Country
Market Segment
Market Sector
Industry Sector
Industry Segment
Industry Class
Source Entities:
Product
Package Type
Tax Status
Brand
Product Category
Product Subcategory
Unit Type
Import Status
Export Status
Inventory Status
Source Entities:
Order
Order Item
Source Entities:
NONE
(corresponds to a
data type at the
operational level)
‹#›
From ER Models to Dimensional Models
Dimensional models are restricted ER models
Subsetting – partition ER model by events
Denormalisation
Collapse hierarchies to form dimension tables
Summarisation
Consider summarisation for performance requirements
‹#›
Advanced Dimensional Modelling
‹#›
Advanced Design Issues
Types of facts
Types of fact tables
Conformed dimensions
Slowly changing dimensions
Mini-dimensions
‹#›
Types of Facts
Additive measures can be summed across any of the dimensions associated with them.
Semi-additive measures can be summed across some dimensions, but not all;
balance amounts are common semi- additive facts because they are additive across all dimensions except time
Some measures are completely non- additive, such as ratios.
‹#›
Types of Fact Tables
Transaction (Quantity, Extended Price)
Most common
Usually additive measures
Snapshot (Balance, Transaction Count)
Periodic or accumulating view of business measures
Usually semi-additive measures
Factless
Event occurrence
No measures, just FKs
‹#›
General guidelines about fact tables typically encountered
Determined by aggregation level of measures: additive, semi-additive (some but not all dimensions), non additive
In reality, may have mix of different measures in a fact table.
Transaction:
Track details of transactions
Typical: sales, web page hits, purchases
Snapshot
Also called inventory level fact table
Examples: inventory, account (account balances), accounts receivable, accounts payable
Factless:
Occurrence of events
Examples: attendance (work, school, …), room reservation
38
Conformed Dimensions- Integrated DW
Sales
Customer
Invoice
Organization
Price Source
Time
Supplier
Product
A/R
Balances
Customer
Organization
Time
Inventory
Product
Movement
Organization
Time
Supplier
Product
‹#›
39
Conformed Dimensions- Integrated DW
Price Source
Sales
Time
Supplier
Product
Customer
Invoice
Organization
Product
Movement
Inventory
A/R
Balances
‹#›
40
Slowly Changing Dimensions
One of the most important issues in dimensional modelling
A slowly changing dimension is a dimension table in which an underlying entity changes some important attribute
For example the address of a customer
‹#›
Slowly Changing Dimensions
According to Kimball (1996) there are three standard approaches:
Overwrite old values
Create a new dimension record
Create a current value field
Creating a new dimension record is usually the best approach
‹#›
Slowly Changing Dimensions: New Record
Create a new dimension table record for each version
Determine which attributes need to be tracked
Generalise the key of the dimension table (eg. Add a version number to the original key)
Advantage: maintains and partitions history
Disadvantage: more complex than overwrite
‹#›
Slowly Changing Dimensions: New Record
‹#›
Overview of Slowly Changing Dimension Approaches
Approach Result Example
Type I Overwrite the old values in the dimension record History is lost Error, history doesn’t matter
Type II Create an additional dimension record with the new values. Add a “current” field. Provides complete history Customer address changes
Type III Create an “old” field to store the immediate previous value Segments history between the old and the new description Change in sales districts
‹#›
45
Mini-Dimensions
When dimension tables get too large (in number of rows) they become unmanageable for browsing
Most heavily used attributes can be separated out into a mini-dimension table
Can improve performance significantly for the most common queries
‹#›
Mini-Dimensions
A mini-dimension table should contain a subset of attributes that can be efficiently browsed:
Less than 100,000 possible combinations of attribute values (< 100,000 rows)
Ways of reducing combinations:
Use discretely valued attributes
Group continuously valued attributes into bands
‹#›
Mini-Dimensions
Customer demographics separated out as mini-dimension:
Number of employees and annual revenue converted to ranges
Date of first order converted to years of service
‹#›
What Makes a Good Dimensional Model?
Simple: seven plus or minus two dimensions
Understandable: user-oriented data names and values (descriptive text rather than codes)
Complete: all relevant dimensions and facts
Appropriate level of granularity
Integrated: conformed dimensions
Error proof: avoid errors in queries by using additive facts; not mixing levels of aggregation
‹#›
What is Examinable:
Snowflake schemas
Different types of fact tables
Different types of facts
Slowly changing dimensions
Conformed dimensions
Surrogate keys
‹#›
Next Seminar
‹#›
Next Seminar
Performance Dashboards and Information Delivery
‹#›
Print
Sales Area
Customer
Person
Company
Customer/
Customer
Relationship
Customer
Contact
Number
Customer
Address
Address
Town/SuburbPostcode
Branch
Region
Customer
Account
Customer
Communication
Account
Account
Manager
Staff Member
Account
Address
Account
Transaction
Account
Payment
Account
Adjustment
Account
Invoice
Account
Accrual
Invoice
Schedule
Schedule
Date
Account
Type
Service Point
Premise
Bank
Account
Account
Credit Status
Account
Arrears Action
Credit Status
Type
Arrears Action
Type
Adjustment
Reason
Type
Payment
Method
Transaction
Type
Account
Charge
Waived
Street
Street TypeState Melways
Reference
Contact
Number Type
Address Type
Action Taken
Communicati
on Method
Action Type
Staff Contact
Number
Staff Address
Communicati
on Type
includes
Account
Relationship
Type
sent by/
sent to
received/
sent by
relates to
contains
Gas Supply
Cost
Component
Service Cost
Type
Gas
Production
Cost
Gas
Transmission
Cost
Gas
Distribution
Cost
Distribution
Network
Transmission
Network
Gas Source
Industry
Division
Industry
Subdivision
Industry
Group
Industry Class
Service Point
Industry Class
Gas Supply
Interruption
Interruption
Reason Type
Invoice
Segment Invoice Line
reported as
Payment
Terms
Meter
Reading
Payment
Allocation
Service Call
Load
Shedding
Classification
Equipment Item
Equipment
Type
Reading
Component
Reading Type
Meter
Reading
Schedule
taken by
referred to
Service Call
Charge
Component
Customer
Relationship
Type
Equipment
Category
Manufacturer
Gas Forecast
performs
consists of
Customer
Complaint
Meter
direct debit
results in
recommended action
for
performs role of
Service Point
Equipment
consists of
includes includes includes
has
includes
Cost Centre
head office
location
has
sourced from
distributed via
transmitted via
Sales Area
Type
Equipment
Reading
Type/Method
Reading
Method
has
Declared
Heating
Value
Heating
Value Area
Pension type
Complaint
Type
Complaint
Resolution
Type
relates to
used to locate
includes
response to
manager ot includesincludes
Reading
Reason Type
transmitted via
has
Regulator
Sales Area
Customer
Person
Company
Customer/
Customer
Relationship
Customer
Contact
Number
Customer
Address
Address
Town/SuburbPostcode
Branch
Region
Customer
Account
Customer
Communication
Account
Account
Manager
Staff Member
Account
Address
Account
Transaction
Account
Payment
Account
Adjustment
Account
Invoice
Account
Accrual
Invoice
Schedule
Schedule
Date
Account
Type
Service Point
Premise
Bank
Account
Account
Credit Status
Account
Arrears Action
Credit Status
Type
Arrears Action
Type
Adjustment
Reason
Type
Payment
Method
Transaction
Type
Account
Charge
Waived
Street
Street Type
State
Melways
Reference
Contact
Number Type
Address Type
Action Taken
Communicati
on Method
Action Type
Staff Contact
Number
Staff Address
Communicati
on Type
includes
Account
Relationship
Type
sent by/
sent to
received/
sent by
relates to
contains
Gas Supply
Cost
Component
Service Cost
Type
Gas
Production
Cost
Gas
Transmission
Cost
Gas
Distribution
Cost
Distribution
Network
Transmission
Network
Gas Source
Industry
Division
Industry
Subdivision
Industry
Group
Industry Class
Service Point
Industry Class
Gas Supply
Interruption
Interruption
Reason Type
Invoice
Segment
Invoice Line
reported as
Payment
Terms
Meter
Reading
Payment
Allocation
Service Call
Load
Shedding
Classification
Equipment Item
Equipment
Type
Reading
Component
Reading Type
Meter
Reading
Schedule
taken by
referred to
Service Call
Charge
Component
Customer
Relationship
Type
Equipment
Category
Manufacturer
Gas Forecast
performs
consists of
Customer
Complaint
Meter
direct debit
results in
recommended action
for
performs role of
Service Point
Equipment
consists of
includesincludes
includes
has
includes
Cost Centre
head office
location
has
sourced from
distributed via
transmitted via
Sales Area
Type
Equipment
Reading
Type/Method
Reading
Method
has
Declared
Heating
Value
Heating
Value Area
Pension type
Complaint
Type
Complaint
Resolution
Type
relates to
used to locate
includes
response to
manager ot
includes
includes
Reading
Reason Type
transmitted via
has
Regulator
Sales
Summary
(Fact Table)
Product
Customer
Date
Retail
Outlet
Product
Order Customer Market Segment
Sales District Store Type City
Retail Outlet
Order Item
Stock Level Warehouse
Customer
Contact
Number
Product
AvailabilitySupplier
Market Sector
Brand
Package Type
Postal Area
Sales Region
Industry Type Industry Class Industry GroupProduct
Substitution
Employee
Reporting
Relationship
Position Type
Contact
Number
Type
Delivery
Method Carrier
State Country
Priority
Credit Terms
Product
Subcategory
Product
Category
Tax Status
Delivery
Charge
Employment
Status
Union Status
Education
Level
Inventory
Status
Import/
Export Status
Product
Order Customer Market Segment
Sales District Store Type City
Retail Outlet
Order Item
Stock Level Warehouse
Customer
Contact
Number
Product
AvailabilitySupplier
Market Sector
Brand
Package Type
Postal Area
Sales Region
Industry Type Industry Class Industry Group
Product
Substitution
Employee
Reporting
Relationship
Position Type
Contact
Number
Type
Delivery
Method Carrier
State Country
Priority
Credit Terms
Product
Subcategory
Product
Category
Tax Status
Delivery
Charge
Employment
Status
Union Status
Education
Level
Inventory
Status
Import/
Export Status
Product
Order Customer Market Segment
Sales District Store Type City
Retail
Outlet
Order Item
Stock Level Warehouse
Customer
Contact
Number
Product
AvailabilitySupplier
Market Sector
Brand
Package Type
Postal Area
Sales Region
Industry Type Industry Class Industry Group
Product
Substitution
Employee
Reporting
Relationship
Position Type
Contact
Number
Type
Delivery
Method Carrier
State Country
Priority
Credit Terms
Product
Subcategory
Product
Category
Tax Status
Delivery
Charge
Employment
Status
Union Status
Education
Level
Inventory
Status
Import/
Export Status
Product
Order Customer Market Segment
Sales District Store Type City
Retail
Outlet
Order Item
Stock Level Warehouse
Customer
Contact
Number
Product
AvailabilitySupplier
Market Sector
Brand
Package Type
Postal Area
Sales Region
Industry Type Industry Class Industry Group
Product
Substitution
Employee
Reporting
Relationship
Position Type
Contact
Number
Type
Delivery
Method Carrier
State Country
Priority
Credit Terms
Product
Subcategory
Product
Category
Tax Status
Delivery
Charge
Employment
Status
Union Status
Education
Level
Inventory
Status
Import/
Export Status
Customer
Industry
Class
Industry
Sector
Industry
Group
CityStateCountry
Market
Segment
Industry
Hierarchy
Location
Hierarchy
Market
Hierarchy Annual Revenue
Industry Class Code
industry Class Name
Customer_ID
Number of Employeees
Customer
Dimension
Customer Name
Industry Sector Code
Industry Sector Name
Industry Group Code
Industry Group Name
City Code
Date of First Order
Market Segment Code
Market Segment Name
Market Sector Code
City Name
State Code
State Name
Country Code
Country Name
Population
Market
Sector
Postal Area
Market Sector Name
Postal Code
Postal Area Name
/docProps/thumbnail.jpeg