程序代写代做代考 ER flex Seminar 1

Seminar 1

Lecture 4
Advanced Dimensional Modelling
Marten Risius
E-mail: m.risius@business.uq.edu.au
Room: 516 Joyce Ackroyd (37) Building

‹#›

Recap: Business Analytics Framework

‹#›

2

Recap: Business Analyst World
How much revenue did the product G generate in the last three months, broken down by month for the south eastern sales region, by individual stores, broken down by promotions, compared to estimates and to the previous version of the product
Analysis starts usually with a single indication of something strange, then goes deep into the data, left to a new dimension, right to another, up to the summary, back down and left and right again, until the problem is identified…

‹#›

3

Recap: Star Schema
Sales Summary
(Fact Table)
Product

Customer
Retail
Outlet
Time
“WHEN” dimension
“WHO” dimension
“WHAT” dimension
“WHERE” dimension

‹#›
Recap: Movie Library ER Model

‹#›
Recap: Movie Library Star Model

‹#›
Learning Objectives
By the end of this session, you should be able to

develop star schemas based on ER models
understand snowflake schemas
explain and implement advance dimensional modelling concepts

‹#›
Dimensional Modelling (Moody and Kortink)
From ER Models to Star Schemas

‹#›
Dimensional model as an ER model
Sale
Time key
Store key
Customer key
Product key
Dollar sales
Unit sales
Customer
Customer key
Name
Customer type

Store
Store key
Address
Region
Time
Time key
Day
Month

Product
Product key
Product type
Product group
Product sub-group
weight

‹#›
Embedded Hierarchies in Dimensional Tables
Customer
Customer-id
Customer name
Market segment
Market sector
Industry class
Industry sector
Industry group
City
State
Country

Dimensional “Product” table/entity with hierarchical relationships
Logical view of customer entity
with hierarchical relationships

‹#›
Embedded Hierarchies in Dimensional Tables
Customer
Market
Segment
Industry
Class
City
Industry
Sector
Industry
Group
Market
Sector
State
Country

Market segment hierarchy
Location hierarchy
Industry hierarchy
Customer-id
Customer name
Market segment
Market sector
Industry class
Industry sector
Industry group
City
State
Country

‹#›
Snowflake Schema

‹#›
Why Dimensional Modelling Works
Chunking
Human short-term memory limits – 7 plus or minus 2 concepts
Large amounts of information are “chunked” by humans
In dimensional modelling, each star schema is a chunk

‹#›
Chunking Example

Equivalent Star Schemas

Normalised ER Model

‹#›
Why Dimensional Modelling Works
Hierarchical Structuring
Hierarchical structures are used to manage complexity
Each dimension in a star schema typically consists of one of more hierarchies
The hierarchical structure provides the means to “roll up” and “drill down” in OLAP tools
Customer
Market
Segment
Industry
Class
City
Industry
Sector
Industry
Group
Market
Sector
State
Country

Market segment hierarchy
Location hierarchy
Industry hierarchy
Customer-id
Customer name
Market segment
Market sector
Industry class
Industry sector
Industry group
City
State
Country

‹#›
Dimensional Modelling and ER Modelling
A star schema is just a restricted form of an ER model
Fact table
An ‘intersection entity’ with complex key and measures
Dimension tables
‘Master data’ with hierarchies
Simple keys (create surrogate keys)

‹#›
Deriving Dimensional Models from
ER Models
Classify entities
High-level star schema design
Detailed fact table design
Detailed dimensional table design

‹#›
An Example ER Model

‹#›
1. Classify Entities
Classify entities into three types
Transaction entities (business events)
Eg. orders, shipments, payments …
Component entities
Who, what, where, how and why data about the events
Classification entities
Define embedded hierarchies in components

‹#›
Transaction Entities

‹#›
Component Entities

‹#›
Classification Entities

‹#›
2. High-level Star Schema Design
Identity Star schemas required
Each transaction entity is a candidate
Define level of summarisation
Level of granularity
Storage space / flexibility trade-off
Identify relevant dimensions
Select relevant dimensions from component entities

‹#›
Identifying Star Schemas
When transaction entities are related in a master-detail structure, merge to form a single star schema – the master entity becomes degenerate dimension

Order

Order
Item
consists of
Order no
Date
Customer no
Order no
Product ID
Quantity
Unit Price

Order
Fact
Order no
Product ID
Quantity
Quantity*Unit Price

‹#›
Star Schema Design

‹#›
time
25

3. Detailed Fact Table Design
Define key
A composite key consisting of the keys of all dimension tables (not minimal in most cases)
Or transaction level key (such as combination of orderID and OrderLineItemID) for lowest level of granulairty
Define facts
The measures that can be analysed using numerical functions
Use additive facts, but may also be semi additive

‹#›
Using Additive Facts
Wherever possible, additive facts should be used to prevent errors in queries

Order Item

Order No

Product ID

Order Quantity

Unit Price

Order Fact

Order No

Product ID

Order Quantity

Extended Price

Order Date

Retail Outlet ID

(Extended Price = Order Quantity * Unit Price)

Customer ID

‹#›
Fact Table Design
No data lost from original normalised model – master entity attributes should be included in detailed entity

Order
Fact

Delivery
Method

Customer

Date

Product
“WHO”
Dimension
“HOW”
Dimension
“WHAT”
Dimension
“WHEN”
Dimension

Employee
“WHO”
Dimension

order

posted

Retail Outlet

“WHERE”
Dimension

Order
“Degenerate”
Dimension

Facts:

Order Quantity

Extended Price ($)

Discount Amount ($)

Retail_Outlet_ID

Employee_Number

Delivery_Method

Order_Number

Customer_ID

Order
Fact

Product_ID

Order_Date

Posted_Date

Order_Quantity

Extended_Price

Discount_Amount

‹#›
4. Detailed Dimensional Table Design
Define dimensional key
Usually a simple numeric key (surrogate key)
May need to be generalised form underlying component key
Collapse hierarchies
De-normalise hierarchies into component entities
Introduces transitive dependencies and therefore redundancy

‹#›
Collapse Hierarchies

‹#›
Codes and Abbreviations
Replace codes and abbreviations by descriptive text
To make the star schema as understandable as possible

‹#›
Replace Codes with Text

Annual Revenue

Industry Class Code

industry Class Name

Customer_ID

Number of Employeees

Customer
Dimension

Customer Name

Industry Sector Code

Industry Sector Name

Industry Group Code

Industry Group Name

City Code

Date of First Order

Market Segment Code

Market Segment Name

Market Sector Code

City Name

State Code

State Name

Country Code

Country Name

Population

Market Sector Name

Annual Revenue

industry Class Name

Customer_ID

Number of Employeees

Customer
Dimension

Customer Name

Industry Sector Name

Industry Group Name

Date of First Order

Market Segment Name

City Name

State Name

Country Name

Population

Market Sector Name

Postal Area Name

Postal Area Name

Flat/Apartment Number

Street Number

Street Name

Street Type

Postcode

Postcode
X
X
X
X
X
X
X

‹#›
Final Design

Annual Revenue

industry Class Name

Customer_ID

Number of Employeees

Customer
Dimension

Customer Name

Industry Sector Name

Industry Group Name

Date of First Order

Market Segment Name

City Name

State Name

Country Name

Population

Market Sector Name

Postal Area Name

Retail_Outlet_ID

Employee_Number

Delivery_Method

Order_Number

Customer_ID

Order
Fact

Product_ID

Order_Date

Posted_Date

Order_Quantity

Total_Price

Discount_Amount

Day number in year

Quarter

Date_Key

Day number

Date
Dimension

Day name

Fiscal
Period

Year

Week number in year

Month name

Weekend

Event

Country Name

Season

Month number

Holiday

Product Subcategory

Unit Type

Product_ID

Brand

Product
Dimension

Product Name

Tax Exempt Status

Import Status

Product Category

Package Type

Package Size

Export Status

Sales Region

Retail_Outlet_ID

Store Type

Retail Outlet
Dimension

Retail Outlet Name

Sales District

City Name

State Name

Country Name

Population

Postal Area Name

Position Type

Last Performance Rating

Employee_ID

Employment Status

Employee
Dimension

Employee Name

Education Level

Age

Commencement Date

Salary Level

Union Status

Sex

Carrier Name

Delivery_Method_ID

Priority

Delivery
Method
Dimension

Delivery Method

Floor Space

Number of Employees

Parking Space

inventory Status

Store Manager

Product Manager

Account Manager

Flat/Apartment Number

Street Number

Street Name

Street Type

Postcode

Flat/Apartment Number

Street Number

Street Name

Street Type

Postcode

Source Entities:
Employee
Position Type
Employment Status
Education Level
Union Status

Departure Date

Source Entities:
Retail Outlet
Sales District
Sales Region
Postal Area
City
State
Country
Store Type

Source Entities:
Delivery Method
Carrier
Priority

Source Entities:
Customer
Postal Area
State
City
Country
Market Segment
Market Sector
Industry Sector
Industry Segment
Industry Class

Source Entities:
Product
Package Type
Tax Status
Brand
Product Category
Product Subcategory
Unit Type
Import Status
Export Status
Inventory Status

Source Entities:
Order
Order Item
Source Entities:
NONE
(corresponds to a
data type at the
operational level)

‹#›
From ER Models to Dimensional Models
Dimensional models are restricted ER models
Subsetting – partition ER model by events
Denormalisation
Collapse hierarchies to form dimension tables
Summarisation
Consider summarisation for performance requirements

‹#›
Advanced Dimensional Modelling

‹#›
Advanced Design Issues
Types of facts
Types of fact tables
Conformed dimensions
Slowly changing dimensions
Mini-dimensions

‹#›
Types of Facts
Additive measures can be summed across any of the dimensions associated with them.

Semi-additive measures can be summed across some dimensions, but not all;
balance amounts are common semi- additive facts because they are additive across all dimensions except time

Some measures are completely non- additive, such as ratios.

‹#›
Types of Fact Tables
Transaction (Quantity, Extended Price)
Most common
Usually additive measures

Snapshot (Balance, Transaction Count)
Periodic or accumulating view of business measures
Usually semi-additive measures

Factless
Event occurrence
No measures, just FKs

‹#›
General guidelines about fact tables typically encountered
Determined by aggregation level of measures: additive, semi-additive (some but not all dimensions), non additive
In reality, may have mix of different measures in a fact table.
Transaction:
Track details of transactions
Typical: sales, web page hits, purchases
Snapshot
Also called inventory level fact table
Examples: inventory, account (account balances), accounts receivable, accounts payable
Factless:
Occurrence of events
Examples: attendance (work, school, …), room reservation
38

Conformed Dimensions- Integrated DW
Sales

Customer
Invoice
Organization
Price Source
Time
Supplier
Product
A/R
Balances

Customer
Organization
Time

Inventory

Product
Movement
Organization
Time
Supplier
Product

‹#›
39

Conformed Dimensions- Integrated DW
Price Source
Sales
Time
Supplier
Product

Customer
Invoice
Organization

Product
Movement
Inventory

A/R
Balances

‹#›
40

Slowly Changing Dimensions
One of the most important issues in dimensional modelling
A slowly changing dimension is a dimension table in which an underlying entity changes some important attribute
For example the address of a customer

‹#›
Slowly Changing Dimensions
According to Kimball (1996) there are three standard approaches:
Overwrite old values
Create a new dimension record
Create a current value field
Creating a new dimension record is usually the best approach

‹#›
Slowly Changing Dimensions: New Record
Create a new dimension table record for each version
Determine which attributes need to be tracked
Generalise the key of the dimension table (eg. Add a version number to the original key)

Advantage: maintains and partitions history
Disadvantage: more complex than overwrite

‹#›

Slowly Changing Dimensions: New Record

‹#›
Overview of Slowly Changing Dimension Approaches
Approach Result Example
Type I Overwrite the old values in the dimension record History is lost Error, history doesn’t matter
Type II Create an additional dimension record with the new values. Add a “current” field. Provides complete history Customer address changes
Type III Create an “old” field to store the immediate previous value Segments history between the old and the new description Change in sales districts

‹#›
45

Mini-Dimensions
When dimension tables get too large (in number of rows) they become unmanageable for browsing
Most heavily used attributes can be separated out into a mini-dimension table
Can improve performance significantly for the most common queries

‹#›
Mini-Dimensions
A mini-dimension table should contain a subset of attributes that can be efficiently browsed:
Less than 100,000 possible combinations of attribute values (< 100,000 rows) Ways of reducing combinations: Use discretely valued attributes Group continuously valued attributes into bands ‹#› Mini-Dimensions Customer demographics separated out as mini-dimension: Number of employees and annual revenue converted to ranges Date of first order converted to years of service ‹#› What Makes a Good Dimensional Model? Simple: seven plus or minus two dimensions Understandable: user-oriented data names and values (descriptive text rather than codes) Complete: all relevant dimensions and facts Appropriate level of granularity Integrated: conformed dimensions Error proof: avoid errors in queries by using additive facts; not mixing levels of aggregation ‹#› What is Examinable: Snowflake schemas Different types of fact tables Different types of facts Slowly changing dimensions Conformed dimensions Surrogate keys ‹#› Next Seminar ‹#› Next Seminar Performance Dashboards and Information Delivery ‹#› Print Sales Area Customer Person Company Customer/ Customer Relationship Customer Contact Number Customer Address Address Town/SuburbPostcode Branch Region Customer Account Customer Communication Account Account Manager Staff Member Account Address Account Transaction Account Payment Account Adjustment Account Invoice Account Accrual Invoice Schedule Schedule Date Account Type Service Point Premise Bank Account Account Credit Status Account Arrears Action Credit Status Type Arrears Action Type Adjustment Reason Type Payment Method Transaction Type Account Charge Waived Street Street TypeState Melways Reference Contact Number Type Address Type Action Taken Communicati on Method Action Type Staff Contact Number Staff Address Communicati on Type includes Account Relationship Type sent by/ sent to received/ sent by relates to contains Gas Supply Cost Component Service Cost Type Gas Production Cost Gas Transmission Cost Gas Distribution Cost Distribution Network Transmission Network Gas Source Industry Division Industry Subdivision Industry Group Industry Class Service Point Industry Class Gas Supply Interruption Interruption Reason Type Invoice Segment Invoice Line reported as Payment Terms Meter Reading Payment Allocation Service Call Load Shedding Classification Equipment Item Equipment Type Reading Component Reading Type Meter Reading Schedule taken by referred to Service Call Charge Component Customer Relationship Type Equipment Category Manufacturer Gas Forecast performs consists of Customer Complaint Meter direct debit results in recommended action for performs role of Service Point Equipment consists of includes includes includes has includes Cost Centre head office location has sourced from distributed via transmitted via Sales Area Type Equipment Reading Type/Method Reading Method has Declared Heating Value Heating Value Area Pension type Complaint Type Complaint Resolution Type relates to used to locate includes response to manager ot includesincludes Reading Reason Type transmitted via has Regulator Sales Area Customer Person Company Customer/ Customer Relationship Customer Contact Number Customer Address Address Town/SuburbPostcode Branch Region Customer Account Customer Communication Account Account Manager Staff Member Account Address Account Transaction Account Payment Account Adjustment Account Invoice Account Accrual Invoice Schedule Schedule Date Account Type Service Point Premise Bank Account Account Credit Status Account Arrears Action Credit Status Type Arrears Action Type Adjustment Reason Type Payment Method Transaction Type Account Charge Waived Street Street Type State Melways Reference Contact Number Type Address Type Action Taken Communicati on Method Action Type Staff Contact Number Staff Address Communicati on Type includes Account Relationship Type sent by/ sent to received/ sent by relates to contains Gas Supply Cost Component Service Cost Type Gas Production Cost Gas Transmission Cost Gas Distribution Cost Distribution Network Transmission Network Gas Source Industry Division Industry Subdivision Industry Group Industry Class Service Point Industry Class Gas Supply Interruption Interruption Reason Type Invoice Segment Invoice Line reported as Payment Terms Meter Reading Payment Allocation Service Call Load Shedding Classification Equipment Item Equipment Type Reading Component Reading Type Meter Reading Schedule taken by referred to Service Call Charge Component Customer Relationship Type Equipment Category Manufacturer Gas Forecast performs consists of Customer Complaint Meter direct debit results in recommended action for performs role of Service Point Equipment consists of includesincludes includes has includes Cost Centre head office location has sourced from distributed via transmitted via Sales Area Type Equipment Reading Type/Method Reading Method has Declared Heating Value Heating Value Area Pension type Complaint Type Complaint Resolution Type relates to used to locate includes response to manager ot includes includes Reading Reason Type transmitted via has Regulator Sales Summary (Fact Table) Product Customer Date Retail Outlet Product Order Customer Market Segment Sales District Store Type City Retail Outlet Order Item Stock Level Warehouse Customer Contact Number Product AvailabilitySupplier Market Sector Brand Package Type Postal Area Sales Region Industry Type Industry Class Industry GroupProduct Substitution Employee Reporting Relationship Position Type Contact Number Type Delivery Method Carrier State Country Priority Credit Terms Product Subcategory Product Category Tax Status Delivery Charge Employment Status Union Status Education Level Inventory Status Import/ Export Status Product Order Customer Market Segment Sales District Store Type City Retail Outlet Order Item Stock Level Warehouse Customer Contact Number Product AvailabilitySupplier Market Sector Brand Package Type Postal Area Sales Region Industry Type Industry Class Industry Group Product Substitution Employee Reporting Relationship Position Type Contact Number Type Delivery Method Carrier State Country Priority Credit Terms Product Subcategory Product Category Tax Status Delivery Charge Employment Status Union Status Education Level Inventory Status Import/ Export Status Product Order Customer Market Segment Sales District Store Type City Retail Outlet Order Item Stock Level Warehouse Customer Contact Number Product AvailabilitySupplier Market Sector Brand Package Type Postal Area Sales Region Industry Type Industry Class Industry Group Product Substitution Employee Reporting Relationship Position Type Contact Number Type Delivery Method Carrier State Country Priority Credit Terms Product Subcategory Product Category Tax Status Delivery Charge Employment Status Union Status Education Level Inventory Status Import/ Export Status Product Order Customer Market Segment Sales District Store Type City Retail Outlet Order Item Stock Level Warehouse Customer Contact Number Product AvailabilitySupplier Market Sector Brand Package Type Postal Area Sales Region Industry Type Industry Class Industry Group Product Substitution Employee Reporting Relationship Position Type Contact Number Type Delivery Method Carrier State Country Priority Credit Terms Product Subcategory Product Category Tax Status Delivery Charge Employment Status Union Status Education Level Inventory Status Import/ Export Status Customer Industry Class Industry Sector Industry Group CityStateCountry Market Segment Industry Hierarchy Location Hierarchy Market Hierarchy Annual Revenue Industry Class Code industry Class Name Customer_ID Number of Employeees Customer Dimension Customer Name Industry Sector Code Industry Sector Name Industry Group Code Industry Group Name City Code Date of First Order Market Segment Code Market Segment Name Market Sector Code City Name State Code State Name Country Code Country Name Population Market Sector Postal Area Market Sector Name Postal Code Postal Area Name /docProps/thumbnail.jpeg