Office Use Only
EXAM CODES: TITLE OF PAPER: EXAM DURATION:
Semester 1 2021 Examination Period
Faculty of Information Technology
FIT5195
Business Intelligence and Data Warehousing – SAMPLE 1
2 hours 10 minutes or 130 minutes
THIS PAPER IS FOR STUDENTS STUDYING AT: (tick where applicable)
Caulfield Clayton Parkville Peninsula
Monash Extension Off Campus Learning Malaysia Other (specify)
Sth Africa
During an exam, you must not have in your possession any item/material that has not been authorised for your exam. This includes books, notes, paper, electronic device/s, mobile phone, smart watch/device, calculator, pencil case, or writing on any part of your body. Any authorised items are listed below. Items/materials on your desk, chair, in your clothing or otherwise on your person will be deemed to be in your possession.
No examination materials are to be removed from the room. This includes retaining, copying, memorising or noting down content of exam material for personal use or to share with any other person by any means following your exam.
Failure to comply with the above instructions, or attempting to cheat or cheating in an exam is a discipline offence under Part 7 of the Monash University (Council) Regulations, or a breach of instructions under Part 3 of the Monash University (Academic Board) Regulations.
AUTHORISED MATERIALS
OPEN BOOK
CALCULATORS
SPECIFICALLY PERMITTED ITEMS if yes, items permitted are:
YES YES YES
NO NO NO
Candidates must complete this section if required to write answers within this paper
STUDENT ID: __ __ __ __ __ __ __ __ DESK NUMBER: __ __ __ __ __
Question 1:
Monash International would like to analyse their policy in regard to English requirement for admission into a course. Monash International has the following data:
Table: Student_IELTS
Table: Student_Course
Table: Student
A data warehouse based on the above data has been created. A star schema is shown as follows:
Student ID Student Name Listening Reading Writing Speaking Overall
228493 229094 231289 234354 234355 … …
Sooying Tan 6.5
Xuebing Lu 5.5 Amandh Kumar 6.0 Agus Hidayat 5.5 Budi Rahayu 7.0
6.5 6.0 7.0 6.5 5.5 5.5 5.5 5.5 7.0 6.0 7.0 6.5 6.0 6.0 6.5 6.0 7.0 7.0 7.0 7.0
Student ID Student Name Course StarYear
228493 229094 231289 234354 234355 … …
Sooying Tan Xuebing Lu Amandh Kumar Agus Hidayat Budi Rahayu
MBIS 2013 MBIS 2013 MIT 2013 MIT 2013 MIT 2013
Student ID
228493 229094 231289 234354 234355 … …
Student Name
Sooying Tan Xuebing Lu Amandh Kumar Agus Hidayat Budi Rahayu
Address
Suburb
Phone Number
Country
Singapore China India Indonesia Indonesia
Questions:
a. The above star schema will not produce a correct analysis of the fact measures. Explain
why. Explain your answer using more concrete examples or data. (5 marks)
b. How do you correct this problem by changing the fact measures of the above star schema? Explain your solution using more concrete examples or data. (5 marks)
Question 2:
Data cleaning is an important part in building a clean and correct data warehouse. Data cleaning is often needed, because there are mistakes and inconsistencies in the operational database. Before data cleaning is done, we need to do data exploration on the operational database in order to find out if there are any mistakes and inconsistencies in the operational database.
The following are the four tables in the operational database:
SQL> desc dw.uselog;
Name —————————————– LOG_DATE
LOG_TIME
STUDENT_ID
ACT
SQL> desc dw.student;
Name —————————————– SEX
FULL_PART
TYPE
CLASS_ID
MAJOR_CODE
STUDENT_ID
SQL> desc dw.class;
Name —————————————– CLASS_DESCRIPTION
CLASS_ID
SQL> desc dw.major;
Name —————————————– MAJOR_NAME
MAJOR_CODE
Null? Type
——– —————–
NOT NULL DATE
NOT NULL DATE
NOT NULL CHAR(11)
CHAR(1)
Null? Type
——–
—————–
VARCHAR2(2)
VARCHAR2(2)
VARCHAR2(4)
VARCHAR2(6)
VARCHAR2(8)
NOT NULL CHAR(11)
Null? Type
——–
—————–
CHAR(50)
VARCHAR2(6)
Null? Type
——–
—————–
CHAR(35)
VARCHAR2(8)
Questions:
(a) Write the SQL command to find out if there are duplicate student records (5 marks)
(b) Write the SQL command to find out if there are records in dw.uselog whereby the Student_ID exists in
dw.uselog actually do not exist in dw.student (5 marks)
Question 3:
Given the following schema:
The tables (e.g. ClothingCompany fact and the three dimensions) have been created and populated with an adequate number of records.
The table names and attributes are shown in the above star schema. In the Fact table, the total order quantity and total order cost attributes are included.
Write the SQL for the following advanced OLAP queries:
a) Perform a CUBE operation (use all dimensions). Display each TotalOrderCost and the subtotals. (3 marks)
b) Like question (a) above, but now perform a ROLLUP operation. (3 marks)
c) Perform a CUMMULATIVE SUM of the TotalOrderCost of all WEBSITE orders (use all dimensions). (4
marks)
d) Like question (c) above, perform a CUMMULATIVE SUM of the TotalOrderCost but PARTITIONED
based on the OSourceID, that is one partition for Phone orders, one partition for Fax orders, and one partition
for Website orders. (4 marks)
e) Show the total order costs of each source order, and RANK them. (3 marks)
f) Display the source order that generates the highest total order cost. (3 marks)
Question 4:
There is a tool way (or tool road) in a metropolitan city (such as CityLink or EastLink in Melbourne, or any similar tool roads in other major cities in the world). This tool way has a number of gates, where the motorist needs to pay. Every time a motorist passes through this tool gate, the registration number of the vehicle, vehicle type (e.g. car, bus, truck, etc), amount paid, and time, are recorded in the operational database.
A data warehouse needs to be built, for analysing the revenue from the toll payments. The management would like to drill down this revenue based on the tollgate (there is a number of toll gates along the toll way), day of week (e.g. weekdays, weekends), and time period of a day (e.g. peak hours, non-peak hours, late nights).
You are required to draw three levels of star schemas showing three different levels of aggregation for the above data warehouse. You also need to explain each of the three star schemas, by contrasting the level of aggregation. Level-0 star schema contains the most detailed data, whereas level-2 star schema is the highly aggregated (e.g. containing highly aggregated data).
Questions:
(a) Draw a level-2 star schema and explanation why it is a level-2 schema (6 marks)
(b) Draw a level-1 star schema and explain why it is a level-1 schema. You may want to add a new
dimension, called vehicle (e.g. cars, trucks, busses, etc). You need to also explain the difference
between level-1 and level-2 schemas. (6 marks)
(c) Draw a level-0 star schema and explain why it is a level-0 schema. You also need to explain the
difference between level-1 and level 0 schemas. (8 marks)
Question 5:
Monash University employs its students to do various jobs, such as tutoring, programming, etc. These jobs are called sessional jobs. For each sessional job, students need to sign a contract. For example, to do tutoring (one or more units), the student will sign a contract with Monash for one semester.
These sessional workers (e.g. sessional tutors) need to claim their work hours every week. This claim will need to be approved by a designated person in the faculty. Every fortnight, the sessional workers will get their pay.
The operational database, which keeps track of this system, is shown by the following E/R diagram.
You are required to build a data warehouse to analyse the following:
– The total number of contracts made every year.
– The total payment made to each employee in 2020.
– The yearly number of contracts made in each department.
Question:
(a) Draw the star schema for this system. (10 marks)
(b) Write the SQL queries to create the fact and dimension tables. (10 marks)
Question 6:
This question is about Top n% and Top k (such as Top 10% and Top 3) in OLAP. The tables are based on the ROBCOR data warehouse case study, which consists of one fact and three dimension tables: charter_fact, time, pilot, and model.
SQL> desc charter_fact; Name
TIME_ID MOD_CODE EMP_NUM TOT_CHAR_HOURS TOT_FUEL REVENUE
SQL> desc time; Name
TIME_ID TIME_YEAR TIME_MONTH
SQL> desc pilot; Name
EMP_NUM PIL_LICENSE PIL_RATINGS PIL_MED_TYPE PIL_MED_DATE PIL_PT135_DATE
SQL> desc model; Name
MOD_CODE MOD_MANUFACTURER MOD_NAME MOD_SEATS MOD_CHG_MILE MOD_CRUISE MOD_FUEL
Null?
Null?
Null?
Null?
Type
VARCHAR2(6) CHAR(10) NUMBER(10) NUMBER NUMBER NUMBER
Type
CHAR(6) CHAR(4) CHAR(2)
Type
NUMBER(10) CHAR(25) CHAR(25) CHAR(1) DATE
DATE Type
CHAR(10) CHAR(15) CHAR(20) FLOAT(126) NUMBER(19,4) FLOAT(126) FLOAT(126)
Questions:
a. Write the SQL command to display the time periods which had the revenue in the top 10% of the months.
The result should be like this:
TIME_ID TOTAL
199503 51144.16 199408 49775.51 199510 48538.01 199409 47647.75 199703 45872.32
PERCENT_RANK
1 .975609756 .951219512 .926829268 .902439024
b. Write the SQL command to display the mod_code and mod_name of the two airplanes that have the largest total fuel used.
The result should look like this:
MOD_CODE MOD_NAME
PA31-350 Navajo Chieftain C-90A KingAir
TOTAL MYRANK
83790.5 1 61708.4 2
Question 7:
Given the following star schema:
The tables (e.g. Fact and three dimensions) have been created and have also been populated with an adequate number of records. The table names and attributes are shown in the star schema above.
Write the SQL for the following OLAP queries:
a. Display the top 10 average prices by suburb of property
b. Display the average price of properties by property type description and suburb. It is
not required to show the subtotals or group totals or grand total
Question 8:
This question is based on the Robcor case study. The following is the E/R diagram of the operational database in the Robcor case study:
A star schema for the above operational database is shown as follows:
Robcor Star Schema-1
Questions:
a. Is it possible to determine which level Robcor Star Schema-1 is? If it is possible, state the level and also give the reason. If it is not possible to state the level, then give the reason.
b. Let’s have a look at the following star schema (Robcor Star Schema-2). Between the two star schemas (Robcor Star Schema-1 and Robcor Star Schema-2), which one has a higher level of aggregation? State the name of the star schema, whether it is Robcor Star Schema-1 or Robcor Star Schema-2, and explain the reason.
Robcor Star Schema-2
THE END