INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
INFS5710 IT Infra. for BA Database Concept
UNSW Business School
Copyright By PowCoder代写 加微信 powcoder
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
• There are some file-sharing websites that specialise in buying and selling academic work to and from university students.
• If you upload your original work to these websites, and if another student downloads
and presents it as their own either wholly or partially, you might be found guilty of collusion — even years after graduation.
• These file-sharing websites may also accept purchase of course materials, such as copies of lecture slides and tutorial handouts. By law, the copyright on course materials, developed by UNSW staff in the course of their employment, belongs to UNSW. It constitutes copyright infringement, if not academic misconduct, to trade these materials.
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
Acknowledgement of Country
UNSW Business School acknowledges the Bidjigal (Kensington campus) and Gadigal (City campus) the traditional custodians of the lands where each campus is located.
We acknowledge all Aboriginal and Islander Elders, past and present and their communities who have shared and practiced their teachings over thousands of years including business practices.
We recognise Aboriginal and Islander people’s ongoing leadership and contributions, including to business, education and industry.
UNSW Business School. (2021, July 24). Acknowledgement of Country [online video]. Retrieved from https://vimeo.com/369229957/d995d8087f
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
Database Systems Infrastructure
Entity Relationship Model (ERM)
Normalisation
Prepared by , Feb. 2021
External Data (e.g. Excel)
ETL (Data Cleansing)
Star Schema (De-Normalised)
Data Warehouse
External Data (e.g. Excel)
Data (DW to BD, or vice versa or both)
Unstructured Data (Social Media)
Data Streaming
Structured Data (Internet of Things (IOT))
Data Streaming
Flat Files
Machine Learning
Relational Database
The notes in the speaker notes originally are for myself only, but students saw my notes and they asked for them, so I start to share with you all. It is not perfect but hopefully, it makes sense to you!
For the next two weeks, we will be looking at relational database model, particularly Entity Relationship Modelling or ERM. or sometimes we just call it Entity Relationship Diagram or ERD.
This week we will be looking at relational table, and next week, we will be looking at normalisation of database tables. So, that is the red cylinder.
(“Not Normalised”)
Spark and NoSQL (and other tools)
Hadoop Distributed File System (HDFS) and MapReduce
Reporting (Business Intelligence and Visualisation) and Business Analysis (End Users)
Note: In-Memory Database (e.g. SAP Hana) is an alternative data model not shown here
INFS5710 Week 2
The Relational Database Model 3-1 to 3-7
We will not cover indexing in 3-8.
INFS5710 Week 2
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Let’s recap from last week…
Recap, you have three tables instead of one gigantic table. You will learn how to do this in ERD and Normalisation which will cover in the this week and next week. We will show how to create ERD and next week we will do normalisation to ensure tables and columns are well defined.
Just a quick recap if you forgot what these tables are about: Instead of having one gigantic table, we now have three smaller and different tables namely employee, skill, and certified. Now, if you want to find which employee who did the course Basic Database Manipulation, you can find all the employees much easier using these tables.
In the Skill table, basic Database Manipulation is Skill_Id 210. Now let’s go to Certified table, Skill_id is the middle column. For 210, we have two 06234 and 09382. Now, we can go to Employee table; you can see 06234 is and, 09382 is .
If you want an easier way, you will write SQL statements to link these tables together to get the results. This is something you can look forward to do in the lab.
INFS5710 Week 2
Characteristics of a Relational Table
A table is perceived as a two-dimensional structure composed of rows and columns. relation = table
Each table row (tuple) represents a single entity occurrence within the entity set. row = tuple = “entity”
Each table column represents an attribute, and each column has a distinct name. column = attribute
Each intersection of a row and column represents a single data value.
All values in a column must conform to the same data format.
Each column has a specific range of values known as the attribute domain.
The order of the rows and columns is immaterial to the DBMS.
Each table must have an attribute or combination of attributes that
uniquely identifies each row.
called “primary key”
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Okay, firstly, we look at a relational table. Let’s look at Table 3.1 it describes characteristics of a table. A table is like a 2-dimensional spreadsheet containing rows and columns. The sheet tab in Excel is like a table in relational database. The row is the tuple, and column is the attribute, and the cell contains the data.
For each column, like in Excel, it has to be the same type of data. For example, if a column is a date, then the whole column must be all dates. If a column is numeric, then the whole column must be numeric.
For the last point, number 8, each table must have an attribute or combination of attributes that uniquely identifies each row. We also called this Primary key, which we will discuss later.
INFS5710 Week 2
When you see a course and class rosters report, such as below, you know you need a few tables to link together.
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
This is actually the report I see on your class enrolment in My UNSW. This was taken on Sunday, 23rd February, 2020.
It has your course details, the term details, undergraduate or postgraduate, which faculty offers this course, how many students enrolled, which term. and how many units of credit. You can see there is a course id 064860, which is uniquely key identified for this course.
INFS5710 Week 2
Very Important
Consist of one or more attributes that determine other attributes
Used to:
Ensure that each row in a table is uniquely identifiable
Establish relationships among tables and to ensure the integrity of the data
Primary key (PK): Attribute or combination of attributes that uniquely identifies any given row
For example, STUDENTS table, the PK is SID; EMPLOYEES table, PK is employee ID.
A PK may contain more than one attribute.
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
A primary key uniquely identifies one or more columns in a row or record of a table. For example, student id or student number uniquely identifies you at the UNSW, or course id 064860 for INFS5710.
So, whenever you fill in a form at the university or login using your student id, the system knows it is you. Thus, student id is a primary key and uniquely identifies you and you only.
In a S table (say) in the university system, a column ZId will be the primary key of that table. If your student id is 1234567, and if someone tries to enter a new record with ZId 1234567, it will be rejected because it is already existed in the table. Other examples include your driving license, and your tax file number.
In the definition of primary key, it can be uniquely identified by one or more columns. This means you can have one, two, three, four or even more columns combine to form a primary key.
Did you notice that the course description is “IT Infrastructure for Analytics” whereas the course description is officially “Information Technology Infrastructure for Business Analytics”. Thus, nearly all the time you do not use description as a PK.
INFS5710 Week 2
Find the Primary Keys
These are the tables used in your enrolment in this class:
Please Note: Not all attributes are included in the tables, not all tables are included, and I changed a bit such as I used INFS5710 as the course id instead of 064860 for teaching purpose.
Table: Students
29/02/2000
Table: Courses
Course Name
Information Technology Infrastructure for Business Analytics
Table: Class Management
WeChat Account
1Good Student (INFS5710)
Table: Degrees
Table: Class Enrolment
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Please do not look at the answers on the next slide!
These are the tables used in your enrolment in this class!
Please Note that I did not include all attributes in the tables. I changed a bit such as I used INFS5710 as the course id instead of 064860 for teaching purpose.
Let start you off, the primary key for students table is the ZId because the ZId identifies you and you only.
How about Courses? …
Author, 23/02/2020
INFS5710 Week 2
Answer for the Primary Keys
Table: Class Enrolment
Table: Students
29/02/2000
Table: Class Management
WeChat Account
1Good Student (INFS5710)
Table: Courses
Course Name
Information Technology Infrastructure for Business Analytics
Table: Degrees
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
The primary key for students table is the ZId because the ZId identifies you and you only. Course_Id is the primary key for Courses because Course_Id identifies one course only.
To enrol in a course, you need your student number (ZId), course you want to do (course_id), and in which term (term_id). Thus, the primary key in course_enrolment table consists of three columns, namely ZId, term_id, and course_id; sometimes this is called composite key. If you fail this course, then you have to this course again, then Term_Id will be 2020T3, and you will have different lab and lecture times.
As for Degrees table, it is actually ZID and Program because you can change a program Also, you need to take into accounts of undergraduate and postgraduate if you have done undergraduate previously.
The degrees table is simplified in our case.
As for Class Management table, again, your student number (ZId), course you want to do (course_id), and in which term (term_id) are together to form a composite key. As for class_id, it is an unique key or UK, it has to depend on the uniqueness of the composite key. I only use class_id in this class to manage your assessments, attendances, and assignments, and I also allow you to use the class id to find and sign up your group members. Please note
INFS5710 Week 2
I will call this a local table as the class management table only used by me and I delete all records at the end of the term, and one can use this table except me. We will come back to this in a few weeks time.
INFS5710 Week 2
Determination
STU_NUM STU_LNAME (determinant) (dependent)
Dependencies
State in which knowing the value of one attribute makes it possible to determine the value of another
Establishes the role of a key
Based on the relationships among the attributes
Functional dependence: value of one or more attributes determines the value of one or more other attributes
Determinant: attribute whose value determines another e.g., PK
Dependent: attribute whose value is determined by the other
Full functional dependence: entire collection of attributes in the determinant is necessary for the relationship
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Determination is the state in which knowing the value one attribute makes it possible to determine the value of another. The idea of determination is not unique to the database outright. For example, Profit = revenue minus costs. As long as you know the revenue and costs, you can determine the profit. Determination in a database environment, however, is not normally based on a formula but on the relationships among the attributes. For example, if I know the student number, I can find out the student name.
Functional dependence is when the value of one or more attributes determines The value of one or more other attributes. For example, student number and student name are shown here.
INFS5710 Week 2
Types of Keys (1)
(A, B) C
Composite key: Key that is composed of more than
one attribute
For example, the course enrolment table has ZId, course_id, and term_id is a composite key
Key attribute: Attribute that is a part of a key
Superkey: key that can uniquely identify any row in the table
A, B Candidate key: minimal superkey
Entity integrity: Condition in which each row in the
table has its own unique identity
All of the values in the primary key must be unique
No key attribute in the primary key can contain a null
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
We have already discussed a composite key, the example of course enrolment table has ZId, Course_Id, and term_id together to form a composite key.
Key attribute is when an attribute that is a part of a key.
A Superkey is a key that can uniquely identify any row in the table. For example, Course_Id in Courses table, ZID in Students table, and ZId, term_id, and course_id for course_enrolment table.
A candidate key is a minimal superkey. For example, in the Student table, your mobile number might able to identify who you are. Sometimes, when you go a shop, you might forget your rewards card, they might ask you for your mobile number.
Entity integrity is when a condition in which each row in the table has its own unique identity. All of the values in the primary key must be unique and no key attribute in the primary key can be null.
INFS5710 Week 2
Types of Keys (2)
Null: Absence of any data value that could represent: Anunknownattributevalue
Aknown,butmissing,attributevalue
Aninapplicablecondition
Referential integrity: Every reference to an entity instance by another entity instance is valid
Foreign key (FK): primary key of one table that has been placed into another table to create a common attribute
Secondary key: Key used strictly for data retrieval
(e.g., people do not remember their membership no. (PK),
the secondary key can be their name, which may not be unique.)
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
So, what is Null? Null is the absence of any data value. So what does it mean in English and how it is different from zero?
Referential integrity is when you want to ensure, for example, before you add a record with a ZID, you need to ensure the ZID exists in the students table.
A Foreign key (FK) is when you associate one attribute of an entity to an attribute of another entity.
Secondary key is the key used specifically for data retrieval.
INFS5710 Week 2
Figure 3.2 – An Example of a Simple Relational Database
used to link to some other table. A foreign key must be the PK of some other table.
this table is self-sufficient.
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
This is in the textbook, have a read through the example.
INFS5710 Week 2
Table 3.3 – Relational Database Keys
e.g., the set of all attributes
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Read in the textbook
INFS5710 Week 2
Integrity Rules
e.g., STUDENT.supervisor = “null” (not yet assigned)
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com