INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 19 Data Warehousing
Copyright By PowCoder代写 加微信 powcoder
By the end of this class you should be able to:
• Articulate the differences between transactional (operational)
and informational (dimensional) databases
• Explain the characteristics of a DW
• Understand and explain the overall architecture of a DW
• Design Star Schemas
INFO20003 Database Systems © University of Melbourne
Part 1: Introduction to Data Warehousing
What is the most effective distribution
What product prom- -otions have the biggest
impact on revenue?
Who are my customers and what products
are they buying?
Which customers are most likely to go to the competition ?
INFO20003 Database Systems
© University of Melbourne
Motivations: A manager wants to know….
Which are our lowest/highest margin customers ?
What impact will new products/services have on revenue
and margins?
Relational Databases for Operational Processing
• Usedtorundaytodaybusinessoperations
• Automationofroutinebusinessprocesses
– Accounting – Inventory
– Purchasing – Sales
• Created huge efficiencies
OLTP Database
OLTP Database
INFO20003 Database Systems © University of Melbourne
Databases are great, BUT for business…
• Toomanyofthem
– Everybody wanted one, or two, or more
– Production, Marketing, Sales, Accounting …
• Everybodygotwhatwasbestforthem
– IBM, Oracle, Access, Microsoft
• Eventuallythisre-createdtheproblemdatabases were meant to solve
– Duplicated data – Inaccessible data – Inconsistent data
But data is useful for analysis and decision making
INFO20003 Database Systems © University of Melbourne
What can we do about it?
• NeedanintegratedwayofgettingtheENTIRE organisational data
• ItsreallyanInformationalDatabase,ratherthana Transactional Database
– A single database that allows all of the organisations data to be stored in a form that can be used to support organisational decision processes
INFO20003 Database Systems © University of Melbourne
Warehouse: An Informational Database
• Data Warehouse:
– A single repository of organisational data
– Integrates data from multiple sources
• Extracts data from source systems, transforms, loads into the warehouse
– Makes data available to managers/users – Supports analysis and decision-making
• Involvealargedatastore(oftenseveralTerabytes, Petabytes of data)
INFO20003 Database Systems © University of Melbourne
Transactional (Operational) Questions
Customer Service: Help! I forgot my membership card!
Select membership_nbr from MEMBER_INDEX where phone_num = ‘555-1212’
INFO20003 Database Systems © University of Melbourne
Analytical Questions
48,204,709
Campaign Management:
How many customers purchased more than $500 worth of alcohol in our Melbourne stores this year?
INFO20003 Database Systems © University of Melbourne
DW Supports Analytical Queries
• Oneisinterestedinnumericalaggregations – How many?
– What is the average?
– What is the total cost?
• Oneisinterestedinunderstandingdimensions – Sales by state by customer type
– Sales by product by store by quarter
DW will help answer these questions
INFO20003 Database Systems © University of Melbourne
Characteristics of a DW
• Subjectoriented
– Data warehouses are organised around particular subjects (sales, customers, products)
• Validated,Integrateddata
– Data from different systems converted to a common format: allows comparison and consolidation of data from different sources
– Data from various sources validated before storing it in a data warehouse
INFO20003 Database Systems © University of Melbourne
Characteristics of a DW
• Time variant
– Historical data
– Trend analysis crucial for decision support: requires historical data
– Data consists of a series of “snapshots” which are time stamped
• Non-volatile
– Users have read access only – all updating done automatically by ETL process and periodically by a DBA
INFO20003 Database Systems © University of Melbourne
A DW Architecture
Source Data Systems
Data Staging Area
Data & Meta-Data Storage Area
Analytics and Reporting
CRM Sales Production HR Finance
Stock Market Currency ABS
Processing
Match Combine De-duplication Standardise Transform Export
Enterprise Data Warehouse
Ad hoc querying
Dashboards
Scorecards
Report writers
End-user applications
Modelling tools Data mining tools Visualisation tools
INFO20003 Database Systems © University of Melbourne
Business Intelligence Dashboard
INFO20003 Database Systems © University of Melbourne
DW Supports Advanced Analytics
http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth http://pypr.sourceforge.net/kmeans.html
INFO20003 Database Systems © University of Melbourne
Part 2- Dimensional Modelling
Business Analyst World
• How much revenue did the product G generate in the last three months, broken down by month for
the south eastern sales region, by individual stores, broken down by promotions, compared to estimates
and to the previous version of the product
– Analysis starts usually with a single indication of something strange, then goes deep into the data, left to a new dimension, right to another, up to the summary, back down and left and right again, until the problem is identified…
– Dimensional Analysis: To support business analysts view
• Revenue per product per customer per location?
Fact Dimension Dimension Dimension
INFO20003 Database Systems © University of Melbourne
Introduction to Dimensional Modelling
• PopularisedbyRalphKimballinthe1990s
• Basedonthemulti-dimensionalmodelofdataand designed for retrieval-only databases
• Verysimple,intuitive,andeasily-understoodstructure
• Alsoknownasstarschemadesign
INFO20003 Database Systems © University of Melbourne
Dimensional Modelling
• A dimensional model consists of:
– Fact table
– Several dimensional tables
– (Sometimes) hierarchies in the dimensions
• Essentially a simple and restricted type of ER model
INFO20003 Database Systems © University of Melbourne
Fact Table
• A fact table contains the actual business measures (additive, aggregates), called facts
• The fact table also contains foreign keys pointing to dimensions
Time key Store key Customer key Product key Dollar sales Unit sales
INFO20003 Database Systems © University of Melbourne
Fact Table – example
• Actual data might look like this
• Granularity, or level of detail, is a key issue
– Finest level of detail for a fact table, determined by the finest level of each dimension
Dollar sales
Unit Sales
INFO20003 Database Systems © University of Melbourne
Star schema – dimensional model
Customer key Name Customer type
Time key Store key Customer key Product key Dollar sales Unit sales
Dimension 1 Fact
Product key Product type weight
Dimension 4
Dimension 3
Dimension 2
Store key Address Region
Time key Day Month
INFO20003 Database Systems © University of Melbourne
Dimension Hierarchies
Product key Product name Product type Product group Product sub-group weight
Time key Store key Customer key Product key Dollar sales Unit sales
Product name e.g. Hammer – Product type e.g. Tool
– Product group e.g. Hardware
INFO20003 Database Systems © University of Melbourne
Dimension Table – example
• Capturesafactorbywhichafactcanbedescribedor classified
• Actual data might look like this • Hierarchy evident in data
Prod-Group
Prod- Subgroup
10cm Nails
Plastic Pipe
INFO20003 Database Systems © University of Melbourne
Dimensional model as an ER model
Star schema
Customer Customerkey PK
Name Customer type
Product Product key PK
Product name Product type Product group Product sub-group weight
Sale Time key
Store key Customer key Product key Dollar sales Unit sales
Store key PK
Address Region
Time key PK
Fact table is an intersection table
INFO20003 Database Systems © University of Melbourne
Designing a Dimensional Model
1. Choose a Business Process
2. Choose the measured facts (usually numeric, additive quantities)
3. Choose the granularity of the fact table
4. Choose the dimensions
5. Complete the dimension tables
(Kimball, 1996)
INFO20003 Database Systems © University of Melbourne
Embedded Hierarchies in Dimensional Tables
Customer-id
Customer name Market segment Market sector Industry class Industry sector Industry group City
State Country
INFO20003 Database Systems © University of Melbourne
Embedded Hierarchies in Dimensional Tables
Market segment hierarchy
Industry hierarchy
Customer-id
Customer name Market segment Market sector Industry class Industry sector Industry group City
State Country
Market Sector
Market Segment
Industry Group
Industry Sector
Industry Class
Location hierarchy
INFO20003 Database Systems © University of Melbourne
Snowflake Schema: hierarchy in dimensions
Product Category
Product Source
Financial Year
Process Type
Market Sector
Market Segment
Store Layout
Store Type
Industry Group
Industry Sector
Industry Class
Sales (Fact Table)
Retail Outlet
State Country
Country State
Marketing Region
Marketing Division
Financial Quarter
Holiday Season
Mnfg Group
INFO20003 Database Systems © University of Melbourne
Design Outcomes: Normalised or Denormalised?
• Normalisation
– Eliminates redundancy – Storage efficiency
– Referential Integrity
• Denormalisation
– Fewer tables (fewer joins)
– Fast querying
– Design is tuned for end-user analysis
INFO20003 Database Systems © University of Melbourne
Let’s try it …
• We are making a data warehouse for a real estate agency. The company wants to track information about the selling of their properties. This warehouse keeps information about the agents (license#, first name, last name, phone #), buyers that come in (buyer id, first name, last name, phone #), and property (property#, property address, price). The information managers want to be able to find is the number of times a property is viewed, sales price. The information needs to be accessible by rental agent, by buyer, by property and for different time (day, week, month, quarter and year).
• Draw a star schema to support the design of this data warehouse.
INFO20003 Database Systems © University of Melbourne
What is Examinable?
• Differencesbetweentransactionalandinformational databases
• Designingastarschema
• Defining facts and dimension tables
INFO20003 Database Systems © University of Melbourne
If you want to know more
More technical details (I won’t ask you these things):
https://www.youtube.com/watch?v=w- S0fj0fmqg&list=PLdQddgMBv5zHcEN9RrhADq3CBColhY2hl&index=17
INFO20003 Database Systems © University of Melbourne
Next Lecture
• Distributed Databases
INFO20003 Database Systems © University of Melbourne
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com