PowerPoint Presentation
Week 3a – Bridge Tables
Semester 2, 2021
FIT3003 – Business Intelligence
and Data Warehousing
Developed by:
Dr. Agnes Haryanto
Agnes.
MONASH
INFORMATION
TECHNOLOGY
Agenda
1. Bridge Tables
1. Product Sales Case Study
2. Truck Delivery Case Study
Using FLUX
1. Visit http://flux.qa/ on your internet enabled device
2. Log in using your Monash account (not required if
you are already logged in to Monash)
3. Click on the “+” to join audience
4. Enter the Audience Code:
• Clayton – Z9J7MT
• Malaysia – PMYBD6
5. Select FIT3003 in the Active Presentation menu
http://flux.qa/1AW6N8
Bridge Tables
Bridge Tables
▪ A bridge table is a table that links between two dimensions; and only one
of these two dimensions are linked to the fact.
➢ As a result, the star schema becomes a snowflake schema.
Bridge Tables
▪ Two reasons on why a dimension cannot be connected directly to the Fact:
▪
a) The Fact table has a fact measure, and the dimension has a key identity. In
order to connect a dimension to the Fact, the dimension’s key identity must
contribute directly to the calculation of the fact measure. Unfortunately, this
cannot happen if the operational database does not have this data.
b) The operational database does not have this data if the relationship between
two entities in the operational database that hold the information about
dimension’s key identity and the intended fact measure is a many-many
relationship.
Bridge Tables
Case Study #1
Case Study #1 – A Product Sales Case Study
▪ A company management team would like to
analyze the statistics of its product sales
history. The analysis is needed to identify
popular products, suppliers supplying those
products, the best time to purchase more
stock, etc.
▪ A small data warehouse is to be built to keep
track of the statistics.
▪ The management is particularly interested in
analyzing the total sales (quantity * price) by
product, customer suburbs, sales time
periods (month and year), and supplier.
Case Study #1 – A Product Sales Case Study
▪ The management is particularly interested in
analyzing the total sales (quantity * price) by
product, customer suburbs, sales time
periods (month and year), and supplier.
Case Study #1 – A Product Sales Case Study
▪ The management is particularly interested in
analyzing the total sales (quantity * price) by
product, customer suburbs, sales time
periods (month and year), and supplier.
▪ Sales Star Schema
➢ Fact:
• Total Sales
➢ Dimensions:
• Product
• Customer locations/suburbs
• Time period
• Supplier
▪ Possible Two-Column Methodology Tables:
Case Study #1 – A Product Sales Case Study
ProductNo TotalSales
A1 $130,000
B2 $15,900
C3 $2,500,000
… …
TimeID TotalSales
201801 $25,000
201802 $4,700
201803 $3,500
… …
Suburb TotalSales
Caulfield $6,500
Chadstone $12,000
Clayton $1,800
… …
(a) Product point of view (b) Time point of view (c) Suburb point of view
Case Study #1 – A Product Sales Case Study
▪ Sales Star Schema
➢ Fact:
• Total Sales
➢ Dimensions:
• Product
• Customer locations/suburbs
• Time period
• Supplier
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
SupplierID TotalSales
S1 $77,000
S2 $5,700
S3 $12,500
… …
Supplier point of view
Case Study #1 – A Product Sales Case Study
SupplierID TotalSales
S1 $77,000
S2 $5,700
S3 $12,500
… …
Supplier point of view
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
SupplierID TotalSales
S1 $77,000
S2 $5,700
S3 $12,500
… …
Case Study #1 – A Product Sales Case Study
Case Study #1 – A Product Sales Case Study
Bridge Table
▪ To create Time Dimension:
– create table TimeDim as
select
distinct to_char(SalesDate, ’YYYYMM’) as TimeID,
to_char(SalesDate, ’YYYY’) as Year,
to_char(SalesDate, ’MM’) as Month
from Sales;
▪ To create Customer Location Dimension:
– create table CustLocDim as
select distinct Suburb, Postcode
from Customer;
Case Study #1 – A Product Sales Case Study
▪ To create Product Dimension:
– create table ProductDim as
select distinct ProductNo, ProductName
from Product;
▪ To create Bridge Table:
– create table ProductSupplierBridge as
select *
from StockSupplier;
▪ To create Supplier Dimension:
– create table SupplierDim as
select SupplierID, Name as SupplierName
from Supplier;
Case Study #1 – A Product Sales Case Study
▪ To create Fact Table:
– create table ProductSalesFact as
Select
to_char(S.SalesDate, ’YYYYMM’) as TimeID,
P.ProductNo,
C.Suburb,
sum(SI.QtySold*P.Price) as TotalSales
from Sales S, Product P, Customer C, SalesItem SI
where S.SalesNo = SI.SalesNo
and SI.ProductNo= P.ProductNo
and C.CustomerID = S.CustomerID
group by
to_char(S.SalesDate, ’YYYYMM’), P.ProductNo, C.Suburb;
Case Study #1 – A Product Sales Case Study
Bridge Tables
Case Study #2
Case Study #2 – A Truck Delivery Case Study
▪ A trucking company is responsible for picking up goods from warehouses of a retail chain
company, and delivering the goods to individual retail stores.
▪ A truck carry goods during a single trip, which is
identified by TripID, and delivers these goods to
multiple stores. Trucks have different capacities
for both the volumes they can hold and the
weights they can carry.
▪ At the moment, a truck makes several trips each
week. An operational database is being used to
keep track the deliveries, including the scheduling
of trucks, which provide timely deliveries to
stores.
Case Study #2 – A Truck Delivery Case Study
▪ A trip may pick up goods from many
warehouses
o i.e. a many-many relationship between
Warehouse and Trip
▪ A trip uses one truck only, and a truck may
have many trips in the history
o i.e. a many-1 relationship between Trip and
Truck
▪ A trip delivers goods (e.g. TVs, fridges, etc)
potentially to several stores
o a many-many relationship between Trip and
Store, which is represented by the Destination
table
▪ Sample data in the operational database:
Case Study #2 – A Truck Delivery Case Study
WarehouseID Location
W1 Warehouse1
W2 Warehouse1
W3 Warehouse1
… …
(d) Truck Table
(b) Trip Table (c) TripFrom Table
TripID Date TotalKm TruckID
Trip1 14-Apr-2018 370 Truck1
Trip2 14-Apr-2018 570 Truck2
Trip3 14-Apr-2018 250 Truck3
Trip4 15-Jul-2018 450 Truck1
… … … …
TripID WarehouseID
Trip1 W1
Trip1 W2
Trip1 W3
Trip2 W1
Trip2 W2
… …
TruckID VolCapacity WeightCategory CostPerKm
Truck1 250 Medium $1.20
Truck2 300 Medium $1.50
Truck3 100 Small $0.80
Truck4 550 Large $2.30
Truck5 650 Large $2.50
… … … …
StoreID StoreName Address
M1 MyStore City Melbourne
M2 MyStore Chaddy Chadstone
M3 MyStore HiPoint High Point
M4 MyStore Donc Doncaster
M5 MyStore North Northland
M6 MyStore South Southland
M7 MyStore East Eastland
M8 MyStore Knox Knox
… …
TripID StoreID
Trip1 M1
Trip1 M2
Trip1 M4
Trip1 M3
Trip1 M8
Trip2 M4
Trip2 M1
Trip2 M2
… …
(a) Warehouse Table
(e) Store Table (f) Destination Table
Case Study #2 – A Truck Delivery Case Study
▪ The management of this trucking company would like to analyze the deliver cost, based on
trucks, time period, and store.
Case Study #2 – A Truck Delivery Case Study
▪ Sales Star Schema
➢ Fact:
• Total Delivery Cost
(distance * cost per kilometre)
➢ Dimensions:
• Truck
• Time period
• Store
Case Study #2 – A Truck Delivery Case Study
▪ From the Truck point of view, Truck1 has two trips (e.g. Trip1 and Trip4), with the total kilometres of
820km (370km + 450km). The cost for Truck1 is $1.20. Hence, calculating the cost for Truck1 is
straightforward. Other trucks can be calculated this way.
▪ From the Period point of view, 14-Apr-2018 has three trips (e.g. Trip1,Trip2, and Trip3). Trip1 (370km) is
delivered by Truck1 which costs $1.20/km. Trip2 and Trip 3, on the same day, can be calculated the same
way. Hence, on 14-Apr-2018, the total cost can be calculated.
▪ From the Store point of view; The cost is calculated
based on Trip, but a trip delivers goods to many
stores. Therefore, the delivery cost for each store
cannot be calculated. The delivery cost is for the trip –
not for the store.
Case Study #2 – A Truck Delivery Case Study
Solution Model 1 – Using a Bridge Table
Case Study #2 – A Truck Delivery Case Study
Solution Model 2 – add a Weight Factor attribute
A weight factor is only needed if we want to estimate
the contribution that a dimension made to the fact
Case Study #2 – A Truck Delivery Case Study
StoreID StoreName Address
M1 MyStore City Melbourne
M2 MyStore Chaddy Chadstone
M3 MyStore HiPoint High Point
M4 MyStore Donc Doncaster
M5 MyStore North Northland
M6 MyStore South Southland
M7 MyStore East Eastland
M8 MyStore Knox Knox
… …
TripID StoreID
Trip1 M1
Trip1 M2
Trip1 M4
Trip1 M3
Trip1 M8
Trip2 M4
Trip2 M1
Trip2 M2
… …
(c) Store Table(b) Bridge Table
(a) Trip Dimension Table
▪ To create Trip Dimension:
– create table TripDim2 as
select T.TripID, T.TripDate, T.TotalKm,
1.0/count(*) as WeightFactor
from Trip T, Destination D
where T.TripID = D.TripID
group by T.tripid, T.tripdate, T.totalkm;
Case Study #2 – A Truck Delivery Case Study
Case Study #2 – A Truck Delivery Case Study
Solution Model 3 – a List Aggregate version
Case Study #2 – A Truck Delivery Case Study
TripID StoreID
Trip1 M1
Trip1 M2
Trip1 M4
Trip1 M3
Trip1 M8
Trip2 M4
Trip2 M1
Trip2 M2
… …
(b) Bridge Table
(a) Trip Dimension Table
listagg (Attr1, ‘_’) within group
(order by Attr1) as ColumnName
▪ To create Trip Dimension:
– create table TripDim3 as
select T.TripID, T.TripDate, T.TotalKm,
1.0/count(D.StoreID) as WeightFactor,
listagg (D.StoreID, ’_’) within group
(order by D.StoreID) as StoreGroupList
from Trip T, Destination D
where T.TripID = D.TripID
group by T.TripID, T.TripDate, T.TotalKm;
Case Study #2 – A Truck Delivery Case Study
▪ Joining based on the StoreGroupList attribute in the Trip dimension table
and the StoreID in the Store dimension table:
– select *
from TripDim3 T, StoreDim3 S
where T.StoreGroupList like ’%’||S.StoreID||’%’;
▪ Without the StoreGroupList attribute in the Trip dimension, we need to join
three tables:
– select *
from TripDim3 T, BridgeTable3 B, StoreDim3 S
where T.TripID = B.TripID
and B.StoreID = S.StoreID;
Case Study #2 – A Truck Delivery Case Study
Bridge Tables
Summary
Bridge Tables – Summary
▪ In principal, a Bridge Table is used:
▪
a) When it is impossible to have a dimension connected directed to the Fact table,
because simply there is no relationship between this dimension and the Fact table
(e.g. in the Product Sales case study, it is impossible to have a direct link from
SupplierDim to ProductSalesFact)
b) When an entity (which will become a dimension) has a many-many relationship with
another entity (dimension) in the E/R schema of the operational database (e.g.
Supplier and Stock has a many-many relationship).
c) When temporality aspect (data history) is maintained in the operational database
and the bridge table can be used to accommodate the dimension that has temporal
attributes (e.g. product supply history is maintained in the second snowflake schema
example).
Bridge Tables – Summary
▪ When a Bridge Table is used in the schema, there are two additional options:
▪
a) A Weight Factor is used to estimate the contribution of a dimension in the calculation
of the fact measure. Because this is only an estimate, a weight factor is option.
b) Every snowflake schema (whether it has Weight Factor or not) can be implemented
in two ways: a List Aggregate version, and a non-List Aggregate version.