INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
INFS5710 IT Infra. for BA Database Concept
UNSW Business School
Copyright By PowCoder代写 加微信 powcoder
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
• There are some file-sharing websites that specialise in buying and selling academic work to and from university students.
• If you upload your original work to these websites, and if another student downloads
and presents it as their own either wholly or partially, you might be found guilty of collusion
— even years after graduation.
• These file-sharing websites may also accept purchase of course materials, such as copies of lecture slides and tutorial handouts. By law, the copyright on course materials, developed by UNSW staff in the course of their employment, belongs to UNSW. It constitutes copyright infringement, if not academic misconduct, to trade these materials.
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
Acknowledgement of Country
UNSW Business School acknowledges the Bidjigal (Kensington campus) and Gadigal (City campus) the traditional custodians of the lands where each campus is located.
We acknowledge all Aboriginal and Islander Elders, past and present and their communities who have shared and practiced their teachings over thousands of years including business practices.
We recognise Aboriginal and Islander people’s ongoing leadership and contributions, including to business, education and industry.
UNSW Business School. (2021, July 24). Acknowledgement of Country [online video]. Retrieved from https://vimeo.com/369229957/d995d8087f
INFS5710 Week 1
Copyright © 2012, SAS Institute Inc. All rights reserved.
Database Systems Infrastructure
Week 1 – Week 5 and Week 7
Entity Relationship Model (ERM)
Normalisation
Prepared by , Feb. 2021
External Data (e.g. Excel)
ETL (Data Cleansing)
Star Schema (De-Normalised)
Data Warehouse
External Data (e.g. Excel)
Data (DW to BD, or vice versa or both)
Unstructured Data (Social Media)
Data Streaming
Structured Data (Internet of Things (IOT))
Data Streaming
Flat Files
Machine Learning
Relational Database
The notes in the speaker notes originally are for myself only, but students saw my notes and they asked for them, so I start to share with you all. It is not perfect but hopefully, it makes sense to you!
This diagram covers what we are going to do in the next ten weeks: data infrastructure, data warehouse and big data. We cannot cover everything in this course, and we can only touch the surface of infrastructure as shown.
Data infrastructure is critical because this is how you will control how the data are captured and control flow of the data.
You will learn about normalisation design of the database using Entity-Relationship Diagram or ERD. In the lab, you will learn to draw and use Oracle to create an ERD.
In the lab, you will learn SQL or Sequel. You will use SAS because it can handle a large dataset without any issues.
We will touch on how you can design a Data Warehouse using Stars schema in Week 6.
(“Not Normalised”)
Spark and NoSQL (and other tools)
Note: In-Memory Database (e.g. SAP Hana) is an alternative data model not shown here
Hadoop Distributed File System (HDFS) and MapReduce
Reporting (Business Intelligence and Visualisation) and Business Analysis (End Users)
Week 8 – Week 10
INFS5710 Week 1
In the last few weeks, we will talk about big data system management and its infrastructure. We will briefly discuss Hadoop, MapReduce, and Sparks behind the building of the big data. If time permits, we will look at AWS as well.
INFS5710 Week 1
Chapter 1 Database Systems 1-1 to 1-7 (pp. 3 to 28)
We will follow the textbook very closely.
As you have paid over A$60 for the textbook, you might as well make full use of it. The reason we use a textbook is some students in the past wanted something they can refer to.
The textbook covers the basic concepts in a database system. I will add more materials for big data such as Hadoop and MapReduce as there are not enough materials in the textbook on this topic.
The first two chapters are related to database concepts. We will cover both chapters today but there are plenty to read, and you need to read them at home.
I will give you the references what you need to read (usually most of the chapter). There will be MCQs (Multiple-choice questions choice questions) and probably short questions in based on the textbook for individual assessments.
We cannot cover everything in class, so you have read it yourself at home, or may be when you cannot sleep As we just have a one-hour lecture, I can only cover the basic concepts. I will skip some slides due to time, but the materials and speaker notes are all enclosed.
INFS5710 Week 1
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
So, where do you find database? Today, data is everywhere. It is not limited to numbers; you can also have text, or even taking a photo is data.
If you look at a day in Susan’s life, she spends time using social media such as Facebook, Instagram, , Twitter, WeChat, QQ, WhatsApp, and so on. It will create gigabyte of data over times. It can include text, pictures, videos and so on. After work, when she goes to do some shopping, data will be captured by the supermarket, and credit card she uses. The supermarket and bank will know where, when, and how she spends her money. The marketing people would like to know someone like Susan where and what she spends her money on, so they if she is the right person to target her for a new products and promotions in the future.
All the databases have their purposes.
INFS5710 Week 1
Data versus Information
Produced by processing raw data to reveal its meaning
Requires context
Bedrock of knowledge
Should be accurate, relevant, and timely to enable good decision making
Raw facts
Have not yet been processed to reveal their meaning to the end user
Building blocks of information
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Information
Data consists of raw facts whereas Information is the result from processing raw data to reveal its meaning. Information creates knowledge and it is the foundation of decision making.
For example, data can be your marks for your quiz, marks for your assignment, and so on. For your other course such as e-commerce, you also have marks for quiz, participation, exam and so on. These are all data. We then give you a grade based on your mark for each of the courses you have done.
Information is when you put everything together in a report or a certificate, such as putting all your grades together and average the marks for the courses such as your WAM. The WAM and grades will tell you and your potential employer how you did at UNSW.
As for the point of decision making, using the example of your grades. When you apply for a job, the potential employer will look at your certificate compare with other potential candidates to decide if they will employ you along with your interview and so on.
INFS5710 Week 1
Introducing the Database
Shared, integrated computer structure that stores a collection of:
End-user data – Raw facts of interest to end user
Metadata: “Data about data”, which the end-user data are
integrated and managed
Describe data characteristics and relationships Database management system (DBMS)
Collection of programs
Manages the database structure
Controls access to data stored in the database
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Metadata is “data about data”, that is you describe data characteristics and relationships. This is important when you receive a set of data, you want to know what the numbers mean. If not, you will have to ask what the numbers mean. For instance, you have a few dates on a table or in a Spreadsheet, you want to know what each of dates means. When you do analysis, you have to use the correct dates.
Database Management System or DBMS is the management of data of how and where the data are stored, how the data can be retrieved, and who can retrieve the data.
INFS5710 Week 1
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
This is the first infrastructure you encounter in this course.
Infrastructure is important for data analytics because it is the foundation for storing and managing data. The infrastructure can be viewed as physical as well as logical. For instance, looking at the diagram above, a PC can logically be represented as a normal PC, a laptop or even a tablet or a mobile phone.
This is the infrastructure of a normal database system, i.e., the DBMS system, which manages the interaction between the users on the left-hand side, and the database on the right-hand side.
So how does it work? An end user on the left-hand side wants to find who bought this new product such as a new Coca Cola Vegemite flavour? Vegemite is a popular spread for kids in Australia if you don’t know what it is.
A request is sent to DBMS (Database Management Systems), which, in turn, processes the request and send a request using Structured Query Language or SQL to retrieve all the customers who bought this new Coca Cola Vegemite flavour. You will start learning SQL language in this week’s workshop.
INFS5710 Week 1
Once the data is received, it will send back to DBMS which then processes and send back to the end user. This usually becomes information. The users might use the information in their decision making.
in the textbook, it talks about different types of databases. It covers single user database, and multi-user database date. It also talks about structured data, and unstructured data. For more information, please read chapter 1 in your textbook!
INFS5710 Week 1
Why Database Design is important?
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Why Database Design is important?
If you look at the diagram, what do you see? You see many numbers and text. You have to make sense what they mean. It might look very confusing for the first time.
Let’s go into details – if you look at row #9 and row 10, why are they blank? Next, look at the Name column, do you need to separate into surname and first name or are they mixed? For example, I want to find every employees with surname “Smith”, or surname “Zhang”.
If I want to find all employees with skills for “Basic Database Manipulation”, you have to go to column Skill1 to look for “Basic Database Manipulation”, and then in column Skill2 for “Basic Database Manipulation”, and then in column Skill3. So, is there a better way to do this?
INFS5710 Week 1
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
The slide shows what you will define as a good database design. You have three tables instead of one table. You will learn how to do this in ERD and Normalisation which will cover in the next few weeks.
Instead of having one gigantic table, we now have three smaller and different tables namely employee, skill, and certified. Now, if you want to find which employee who did the course Basic Database Manipulation, you can find all the employees much easier using these tables.
In the Skill table, basic Database Manipulation is Skill_Id 210. Now let’s go to Certified table, Skill_id is the middle column. For 210, we have two 06234 and 09382. Now, we can go to Employee table; you can see 06234 is and, 09382 is .
If you want an easier way, you will write SQL statements to link these tables together to get the results. This is something you can look forward to doing in the lab.
INFS5710 Week 1
Database Design
Focuses on the design of the database structure that will be used to store and manage end-user data
Well-designed database
Facilitates data management
Generates accurate and valuable information
Poorly designed database causes difficult-to-trace errors
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
In summary, to generate an accurate and valuable information, you need to have a well- designed database.
This is all I want to cover for Chapter One. You may want to read other topics I didn’t cover in lecture. You may want to read through the file system. SAS is based on a file system. like a csv file, because it started in the 1970s,, that is when computer hardware was very expensive!
INFS5710 Week 1
Data Models
2-1 to 2-6 (pp. 34 to 61)
Next, we will look at data models.
Please note that you do not have worry about the history and years in the chapter. This chapter mainly introduces new terms and concept that use in different types of databases. You have to know what they are. For example, you have to know what hierarchy is, and what network is.
The importance of the concepts such as Entity Relationship Model will be covered in more depths in the next few weeks.
INFS5710 Week 1
Data Modeling and Data Models
• Model – Abstraction of a real-world object or event
• Data modeling: Iterative and progressive process of creating a specific data model for a determined problem domain
Data models: Simple representations of complex real- world data structures
Useful for supporting a specific problem domain What would be the “data model” for UNSW like?
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Data modelling is to model and translate the business requirements into a data model they can be used to store data which to business can use.
Data model is designed to capture data in the real world. The data can then be used in the business.
So, what would be the data model like for the UNSW?
Firstly, you have to think of what data you need to capture. The university needs to capture your details as a student – your name, your address, your telephone number, your gender, and your past academic records.
The university will give you a unique Student Id which identifies who you are. When you do a course like INFS5710, it will capture your marks, your attendance records, your assignment marks, your exam results and your final mark. The system will store all the courses you have attempted, passed or failed.
The system also needs to schedule a class, i.e., where and when INFS5710 class will take place. Can it fit all the students in one class?
INFS5710 Week 1
Actually, the UNSW system is made up of several systems. Each system has its own data model, and they all linked together. For example, you can have a student system, generate profit and loss statements from the financial system, new student recruitment system, marketing system, student payment system, human resource and payroll systems, university shop rental systems, room booking system, library systems and so on and so forth. All these Systems are joined together; that is all different data models are actually linked together to create one complex data model.
So, a data model can be very complex. The UNSW systems took years to develop, and the systems consistently modified by the IT. Thus, the data models will continually be modified to fit the future needs. For instance, we now have online system as well as face-to-face classes for the same course and are running at the same time.
INFS5710 Week 1
Importance of Data Models
Are a communication tool
Give an overall view of the database
Organize data for various users
Are an abstraction for the creation of good
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
The importance of data models…
However, getting the data model right or near what it supposes to do is not easy. People sometimes deliberately do not want to tell you some facts. Alternatively, an IT personnel did not ask the right question, or the question is misinterpreted. On the other hand, it might be the question is so simple that no one has thought about it.
For example, in one of the projects when I was working in the industry. The project is about transferring data between two different systems. A document circulated and went through by a few consultants and a few people until we had a telephone meeting, I asked a question, ”how do I know if the dollar amount is a credit or a debit (i.e. positive or negative)” There was a silence for about a minute, I then asked, “hello, anyone there?” I said to my boss, ”the connection must have dropped out.” A voice came from the other end, “No, we are still here. We did not look at that, and we need to go and check, and then come back to you…”
INFS5710 Week 1
Data Model Basic Building Blocks
Entity: Unique and distinct object used to collect
and store data
Attribute: Characteristic of an entity
Relationship: Describes an association among
Consider the following entities: professors, students, courses, departments, research centre, etc.
One-to-many (1:M)
Many-to-many (M:N or M:M)
One-to-one (1:1)
Violation examples:
• Enter an SID to STUDENT table, it finds two students (Entity integrity)
• STUDENT table says that student X’s department code is Y. But in the
DEPARTMENT Table, there is no code Y. (Referential integrity)
Constraint: Set of rules to ensure data integrity
people, thing, event, …
©2017 Cengage Learning®. May not be
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com