CS代考计算机代写 asp.net hadoop database case study file system SQL MONASH

MONASH
INFORMATION TECHNOLOGY
Where To?
Exam Preparation
FIT2094-FIT3171

Operational Database – the unit’s focus
2

Application Development
▪ Web based front ends
– Wide range of approaches: PHP, ASP.NET, etc
– Very Rudimentary (requires VPN)
▪ PL/SQL
– backend development
– Triggers, functions, procedures and packages
– Procedure to change employee departments: move_employee
(empno, new dept)
3

FIT2104 – Web database interface
4

Usage of database
▪Example of a supermarket ▪Decision making
–Operational level
•How often do we need to re-stock X-item?
–Strategic and tactical level
•Is there any branch that performs worse than the state average?
•What is the total sales made by each state each year and across a number of years?
5

Operational Data vs. Decision Support Data
▪ Operational data
– Mostly stored in relational database
– Optimized to support transactions representing daily operations
– Example:
– How many students enrolled in FIT2094?
▪ Decision support data differs from operational data in three main areas:
– Time span
– Granularity
– Dimensionality
– Example:
– What is the total number of students in the foundation units in each year (subtotal of the two semesters numbers) and the total across years, across a single unit.
6

7

8

▪ ▪
▪ ▪
Decision Support Database Requirements
Specialized DBMS tailored to provide fast answers to complex queries Three main requirements
– Database schema
– Data extraction and loading
– Database size
Database schema
– Complex data representations
– Aggregated and summarized data
– Queries extract multidimensional time slices
Data extraction and filtering
– Supports different data sources
• Flat files
• Hierarchical, network, and relational databases
• Multiple vendors
– Checking for inconsistent data
9

The Data Warehouse
▪ Database size
– In 2013, eBay had around 90 Petabytes of data in its
data warehouses (90,000 Terabytes)
– DBMS must support very large databases (VLDBs)
▪ Integrated, subject-oriented, time-variant, and nonvolatile collection of data
– Provides support for decision making
▪ Usually a read-only database optimized for data analysis and query processing
▪ Requires time, money, and considerable managerial effort to create
10

11

FIT3003 – Business intelligence and data warehousing
12

IOT – the explosion – Data, Data, Data …..

13

Data Growth
Source: https://www.domo.com/learn/data-never-sleeps-7#/
14

Data Growth
Source: https://www.seagate.com/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
15

Big Data Characteristics
▪ Volume
– The quantity of data to be stored
▪ Velocity
– The speed at which data enters the system and must be processed
▪ Variety
– Variations in the structure of the data to be stored
16

▪ ▪
Big Data Characteristics: Volume
Scaling up: keeping the same number of systems but migrating each one to a larger system
Scaling out: when the workload exceeds server capacity, it is spread out across a number of servers
17

Scaling
▪ How do we scale current relational systems? SQL designed for database as a single physical entity
– Purchase bigger “boxes”: costly and has real limits
– Increase the number of processors, yielding parallel computation/database with complex issues to handle
– Distribute database – challenges to maintain ACID transaction principles and issues of availability/consistency
▪ The rise of OO programming in the 80’s also highlighted a problem known as the “Impedance Mismatch”
– The program treats items as objects, but they need to be mapped to relational tables (“de aggregating” the object)
– Also issues about “private” vs “public” (relational about need, OO absolute characteristic of data)
18

Scaling continued
▪ Big players, notably Google and Amazon chose a different path
– Lots and lots of smaller boxes (“commodity” servers)
– Non relational structure
– Google: Bigtable
• https://research.google/pubs/pub27898/
• https://cloud.google.com/bigtable/docs/overview
• Used for wide range of apps Gmail, Google Earth, YouTube
– Amazon: Dyanmo
• http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
• Based on Dynamo: https://aws.amazon.com/dynamodb/
19

Scaling continued
▪ Term “NoSQL” coined by John Oskarsson in 2009 after calling a …”free meetup about “open source, distributed, non relational databases” or NOSQL for short”…
– http://blog.oskarsson.nu/post/22996139456/nosql-meetup
▪ Characteristics
– Non relational,
– mostly open source,
– distributed (cluster friendly),
– schema-less (no fixed storage schema)
20

Big Data Characteristics: Velocity
▪ Stream processing: focuses on input processing and requires analysis of data stream as it enters the system
– CERN Large Hadron Collider 600TB per second 1 GB per second ▪ Feedback loop processing: analysis of data to produce
actionable results
21

Fast Data Processing
▪ Computer systems
–Parallel computer: A single machine with massive number of CPUs.
–Cluster of computers: Multiple machines connected via network; Commodity computer.
▪ Database structure
–Non-relational database (NoSQL)
•No update, append only. Optimised for a ’main’ operation
•Examples: MongoDB, Cassandra –Distributed File Systems
•HDFS (Hadoop File Systems) / Parquee File Systems ▪ Parallel data processing
–Hadoop / Spark ▪ In Memory database
22


▪ ▪
Big Data Characteristics: Variety
Structured data: fits into a predefined data model
– Relational databases
– Incoming data decomposed under normalisation rules to fit the data model
Unstructured data: does not fit into a predefined model
– Big Data requires that the data is captured in its natural format as generated without imposing a data model on it
Semi structured data: combines elements of both
23

FIT3176 – Advanced database design
24

Data Processing Ecosystem
http://www.clearpeaks.com/blog/big-data/big-data-ecosystem-spark-and-tableau
25

“Horses for Courses”
▪ Conventional RDBMS will continue play an important and significant role in OLTP (Online Transactions Processing)
▪ Increasingly now a range of database products are available, need to select appropriate product/model for task at hand.
26

FIT2094-FIT3171 Exam
27

2021 Semester Summer B Exam Format
▪ Timed: 2 hours 10 minutes (reading time included)
▪ e-exam platform: https://eassessment.monash.edu/
▪ Close book (no cheat sheets), e-invigilated
▪ Learn more here:
https://www.monash.edu/exams/electronic-exams/about
– read:
• Supervised eExams using Monash eVigilation
• eExams requiring handwritten answers
For the modelling question makes use of a hybrid question
– write answer on paper (mark answered on e-exam)
– photograph with phone
– upload via QR code (after exam has been completed) – important that
you practice this process
– and Try out a general knowledge practice exam
▪ Note that the exam is a time-pressured test
– manage your time wisely
28

2021 Summer B Exam Format
▪ 100 marks 50% of your final mark in FIT2094-FIT3171. – Minimum to pass the unit overall:
• 40% in-semester, 40% exam and 50% overall
– Assignment 2 marking will not be finalised before the exam
▪ Questions:
– 6 parts – cover theory and application
– Timing is crucial – 100 marks, 120 mins – 1 mark/minute target
– Part D – SQL Case Study will be released 9am the day before the exam day.
• No tables/data provided on FITUGDB, no access to SQL Developer, LucidChart and other softwares
▪ Exam when?
– your responsibility
29

2021 SSB Mock Exam
▪ Link is provided on Exam tab Moodle, self enrolled.
– timed (2 hrs 10 mins), unlimited attempts
– do not open or attempt this mock exam until such time as you are able to make a full 2 hrs and 10 mins attempt, rather than open and browse the paper.
▪ Available from Thursday 11th February at 12PM, sample solution will be available Monday 15th February from 3 PM
▪ Serves to provide a general overview of the general structure of the exam only.
▪ To protect the integrity of the exam: NO ACTUAL EXAM QUESTIONS are included; and the COMPOSITION OF THE QUESTIONS IN EACH PART are SUBJECT TO CHANGE.
▪ All content specified by the Unit Guide is examinable, including but not limited to
– Pre-reading (weekly Coronel & Morris chapters)
– Workshop Slides and Videos
– Tutorial Notes, and
– all other Moodle Materials (except where explicitly stated NOT EXAMINABLE).
30

Workshop Session 2 and 5 – Data Modelling INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪Conceptual vs Logical Level
▪Entity
–Strong vs weak
–Associative entity
▪Types of attributes
▪Relationship
–Type : one-to-one, one-to-many, many-to-many
–Cardinality and Participation –Identifying vs Non-identifying.
▪Mapping from Conceptual to Logical –E.g. Mapping many-to-many
▪FIT3171 – UML
31

Workshop Session 3 – Relational Model INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪Relational model properties.
▪Keys
–Superkey, Candidate Key, Primary Key –Foreign Key
▪Data Integrity –Entity integrity –Referential Integrity
▪Relational Algebra –Understanding of efficiency
32

Workshop Session 4 – Normalisation INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪UNF to 3 NF –Mapping form to UNF
–UNF to 1 NF – remove repeating group. –1NF to 2 NF – remove partial dependency. –2NF to 3NF – remove transitive dependency.
▪Dependency diagrams
–Use the general definition
–Partial in 1NF, Transitive in 2NF, Full in 3NF
•use this notation: cust_id → cust_name, cust_address
▪Be careful in choosing the PK!
▪Mapping a set of 3NF relations to a logical model
33

Workshop Sessions 6 and 8– DDL and DML
INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪DDL
–CREATE TABLE statements
•Primary key definition •Foreign key definition •Other Constraints
–ALTER
–DROP ▪DML
–INSERT
•Adherence to referential integrity constraints and the order of insertion –Oracle Sequence
–UPDATE (DML) –DELETE (DML) –COMMIT
34

Workshop Session 7, 9 and 10 – SQL INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪Single table retrieval with predicate
▪Join
–Natural join
–Outer join ▪Aggregate functions ▪Set Operators ▪Subquery
▪Oracle functions
TO_CHAR, TO_DATE, NVL, UPPER, LOWER, ROUND
35

Workshop Session 9 PLSQL and Workshop Session 11 DB Connectivity, Web Technology
INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪Web database connectivity
–Understanding of the principles and ALL core concepts:
•Database middleware
•Web to database middleware
•Using PHP to communicate with databases
–must understand php code which relates to database –Database design frameworks
•modern frameworks
•ORM
–Security → SQL Injection
▪FIT3171 – TRIGGER
36

Workshop Session 8 – Transaction Management INCLUDING BUT NOT LIMITED TO THESE TOPICS…
▪Transaction.
▪ACID properties.
▪Transaction problems.
▪Transaction management with locks.
▪Wait For Graphs
▪Restart and Recovery using Transaction Log.
37

Workshop Session 12
▪The content of Session 12’s workshop –Database Trends
–Future directions
Is NOT examinable (questions relate to this session’s new content will not appear on the exam)
38

Consultations for Final Exam
▪ From Tuesday 16th February 2021, one online consultation session per day will be provided.
– Details posted on Moodle
▪ Please don’t come to consultations in a hope to squeeze some useful information about final exam
– Session intended to clear up any issues YOU find as you prepare for the exam
39

40