ROYAL HOLLOWAY
Page 1 of 12
UNIVERSITY OF LONDON
MSc EXAMINATION 2014
For Internal Students of
Royal Holloway
DO NOT TURN OVER UNTIL TOLD TO BEGIN
BI5502: Business Intelligence
Sample Exam paper
ANSWERS
Time Allowed: TWO hours
Attempt THREE questions
No credit will be given for attempting any further questions
Calculators are NOT permitted
BI5502
© Royal Holloway University of London 2014
BI5502
Page 2 of 12 NEXT PAGE
BLANK PAGE
BI5502
Page 3 of 12 NEXT PAGE
Question 1
1. (a) Indicate whether each of the following statements is true or false.
Marking Scheme: 2 marks per correct answer
(i) Dimension hierarchies relate to drill-down analysis. [TRUE]
(ii) Type 1 changes for slowly changing dimensions relate to correction of errors.
[TRUE]
(iii) The main reason BI solutions are necessary is to support the day to day operational
needs of an organization. [FALSE]
(iv) Conforming dimensions is not absolutely necessary in a data warehouse. [FALSE]
(v) Scorecards display information on organizational Key Performance Indicators
(KPIs). [TRUE]
(vi) In dashboard design a flow based structure emphasizes a sequence of events or
actions across time. [TRUE]
[12 marks]
(b) The following cube shows the sales of different products of a bank, per branch, per
quarter of the year.
If you know that branches with codes 100 and 101 belong to Region A and branches with
codes 102 and 103 belong to Region B, perform the following OLAP operations in the
previous cube and design the results of those operations.
(i) Roll up on location (from branches to regions) [6 marks]
(ii) Slice, for Time = “Q1” [6 marks]
In the results of both (i) and (ii) above you may leave blank the parts whose values can
not be illustrated due to the lack of the necessary data in the above cube
Answer
i. The roll-up operation on location (from branches to regions results in the following
cube:
Marking Scheme: 4 marks for cube design, 2 marks for correct data.
BI5502
Page 4 of 12 NEXT PAGE
ii. The slice for Time =´Q1´ operation results in the following:
Marking Scheme: 4 marks for result, 2 marks for correct data.
Location
(Branches)
100 200 180 190 210
101 250
102 350
103 230
Mortgages Consumer Loan
Product
Credit Card Deposit
(c) Consider Customer Relationship Management (CRM) technology and applications.
(i) Name the functions/business units of an organization that typically use CRM applications.
[4 marks]
Answer: CRM mainly applies to the Management, Sales, Marketing and Customer
Service functions of an organization [1 mark per each function]
(ii) A company is interested in using a CRM application in order to send, via e-mail, a
promotional message to a group of customers. Which module of the CRM application
will be used for that purpose? [5 marks]
Answer: The Campaign Management module of a CRM application will be used for the
functionality described.
BI5502
Page 5 of 12 NEXT PAGE
Question 2
2 (a) Discuss the main differences between BI Solutions/Data Warehouses and On-Line
Transaction Processing Systems (OLTP). Structure the discussion by comparing the two
approaches according to the following criteria: Purpose, Usage, Data Modifications, DB
Schema Design, Typical DB Operations, Historical Data and DB Size. [13 marks]
Answer
Marking Scheme: 2 marks for the first six criteria, 1 mark for DB Size.
BI5502
Page 6 of 12 NEXT PAGE
(b) In the table below, the first column contains a list of business domains in which Data
Mining is applied. Complete the second column of the table with at least two data mining
examples as applied in these domains. The first entry has been already completed for you
as a sample answer. [6 marks]
Domain Example Data Mining Application
Financial Services
-Credit Scoring/Risk Analysis
-Fraud Detection
Customer Relationship
Management
Marketing campaigns
Churn analysis
Cross- or up-selling
Retail/Logistics
Optimize inventory levels
Optimize logistics by predicting seasonal effects
Manufacturing
Predict/prevent machinery failures
Optimize manufacturing capacity
Discover novel patterns to improve product quality
Criteria Transactional Systems BI Systems (Data Warehouse based)
Purpose Perform day-to-day business operations Support decision making
Usage OLTP systems support predefined
operations (i.e. have a fixed, “a-priori”
known set of operations in the data) and
their usage is repetitive in nature.
Designed for ad hoc querying and analysis.
Should be optimized to perform well for a
wide variety of such (not predefined)
operations.
Data
Modifications
The operations that end-users perform
result in direct data modifications.
Data warehouse/BI data is updated on a
regular basis using massive data
modification techniques (ETL). The end-
users of a data warehouse/BI infrastructure
do not directly update data, in other words
data is “read only” (non-volatile) from a
user perspective.
DB
Schema(s)
Design
Normalized schemas in order to optimize
the efficiency/consistency of
“update/insert/delete” data operations.
Typically “function” (transaction)
oriented schemas.
Combinations of relational/normalized and
multi-dimensional DB schemas (e.g. a star
schema) that optimize (ad-hoc) querying
and data analysis. Designed using
“subject” oriented schemas which directly
reflect business domain entities. Aggregate
data from different sources and maintain
them in consistent formats, i.e. are
integrated.
Typical DB
operations
DB queries access only some records DB queries usually scan thousands or
millions of rows
Historical
Data
Usually maintain data for only a few
weeks or months.
Maintain years of historical data (e.g. in
order to support historical analysis), i.e. are
time-variant
Size GBs TBs
BI5502
Page 7 of 12 NEXT PAGE
(c) A bank’s Data Warehouse contains data concerning the number of customers’ loans
applications performed in the bank’s branches in 3 different geographical regions (A, B,
C) in the 4 quarters of the year (Q1, Q2, Q3, Q4). A sample of the relevant business data
is summarized in the following 3 tables:
Region A
Q1 Q2 Q3 Q4
Mortgages Applications 1100 1200 975 1230
Credit Cards Applications 4125 3835 2746 3010
Consumer Loan Applications 3870 4010 3310 3425
Region B
Q1 Q2 Q3 Q4
Mortgages Applications 456 524 410 428
Credit Cards Applications 748 800 735 801
Consumer Loan Applications 945 875 1011 978
Region C
Q1 Q2 Q3 Q4
Mortgages Applications 223 297 289 300
Credit Cards Applications 330 300 333 377
Consumer Loan Applications 452 508 504 589
(i) Design a suitable Star-schema to satisfy the above data modelling requirements,
showing the schema entities, attributes and respective keys.
[7 marks]
(ii) Design a suitable Cube that represents the previous data model. The cube you design
should also illustrate in its visible parts the corresponding data from the above tables.
[8 marks]
Answer
Marking Scheme: 1 mark per correct entity, 1 mark per relationship
BI5502
Page 8 of 12 NEXT PAGE
Marking Scheme: 6 marks for correct cube design, 2 marks for correct data.
BI5502
Page 9 of 12 NEXT PAGE
Question 3
3. (a) Design a STAR schema for a chain of DVD rental stores whose managers would like to
be informed on the number and revenue of individual DVD items, per store, per day.
Devise at least two attributes for each of the non-key dimensions you include in your
schema; such attributes should make sense. Provide also a sample row of all tables of the
proposed schema illustrating the respective data interrelationships.
[13 marks]
Answer
Marking scheme: 9 marks for schema design correctness and completeness, 4 marks
for sample row.
Sample Row of Tables
Date
Date_Key Date Description Day of Week Month Year
1 21/02/2014 February 21st,
2014
Friday February 2014
Store
Store_ Key Zip Code City Country Region
1 322 34 London UK Europe
DVD Item
DVDItem_
Key
ItemCode Name Description Category
1 ABC123 “The Matrix” Film description Science Fiction
DVD Sales Facts
Store_Key Date_Key DVDItem_Key No of Sales Revenue
1 1 1 5 £15
BI5502
Page 10 of 12 NEXT PAGE
(b) In the context of Data Warehousing alternative architectures, provide a diagram of the
Centralized Data Warehouse architecture. What is the main issue of such architecture?
[9 marks]
Answer
Marking scheme: 6 marks for diagram, 2 marks for main issue
Source
Systems
Staging
Area
Normalized relational
warehouse (atomic/some
summarized data)
End user
access and
applications
ETL
(d) Centralized Data Warehouse Architecture
The main issue of the above architecture is the difficulty to achieve/build it.
(c) For each of the transformation types shown in the table below, give an example to
illustrate the case. One entry has already been completed for you as a sample answer.
Marking Scheme: 3 marks per row
Transformation Type Example
Character set conversion . from older character set formats to ASCII
Conversion of Units of
Measurements
Conversion of Miles to Km (or vice versa)
Splitting of Single Fields For example, when the first name, middle initials, and last
name of a customer is stored as a large text in a single field
and ones needs to store individual components of names and
addresses in separate fields in the data warehouse.
Date/Time Conversion e.g. Conversion between the US and the British formats
Decoding of Fields e.g. one source system using 1 and 2 for male and female and
another system using M and F.
[12 marks]
BI5502
Page 11 of 12 NEXT PAGE
Question 4
4. (a) What is the notion of granularity in dimensional modelling? How does it impact the
design of a BI solution/Data Warehouse? [10 marks]
Answer
Granularity or grain is the level of detail that a fact/measurement (a row in the facts table)
represents, i.e. its “scope”. For example, the “total sales per product, per store, per day” is
of different granularity compared to the “total sales per product type, per store, per
week”. [5 marks]
Its choice impacts the scalability of a BI solution due to the associated number of facts –
the lower the grain the larger the number of facts. Moreover, it also impacts the level of
detail that the users of the BI solution can “navigate” to. The lower the granularity, the
more level of detail is available to the users. [5 marks]
(b) Provide a diagram of the typical high-level components of a DB reporting solution and
outline the role of each of such components. [10 marks]
Answer
Marking Scheme: diagram 8 marks (2 per component), 2 marks for descriptions.
(c) Suppose that a dimensional model consists of the three dimensions date, doctor, and
patient, and a measure “charge”, which is the fee that a doctor charges a patient for a
visit.
(i) Draw a STAR schema for the above. [7 marks]
(ii) In a cube that shows data for a given patient, in a given doctor in a given date, what
specific OLAP operations should be performed in order to list the total fee collected by
each doctor in 2013? [7 marks]
Answer
Page 12 of 12
(i) Marking Scheme: 1 mark per entity, 1 mark per relationship
(ii) “Roll-up on date from day to year” [2 marks], then “Slice for time=2013” [2 marks]
and then “Roll-up on patient from individual patient to all”. [3 marks]
END