Semester review; exam preview

DATA1002/1902 Lecture 13B
Prof Alan Fekete

University of Sydney

DATA1002 sem2 2021 – Lecture 13B 1

COMMONWEALTH OF AUSTRALIA

Copyright Regulations 1969

WARNING

This material has been reproduced and communicated to you by or on
behalf of the University of Sydney pursuant to Part VB of the Copyright Act
1968 (the Act). The material in this communication may be subject to
copyright under the Act. Any further copying or communication of this
material by you may be the subject of copyright protection under the Act.

Do not remove this notice.

DATA1002 and DATA1902

• This lecture provides information on
DATA1002

• For DATA1902 students, see the Wednesday
lecture and its slides

Agenda

• Semester review
• Written exam preview
• Administrative issues
• More study of Data Science

Role of the unit

• “This unit covers computation and data handling,
integrating sophisticated use of existing
productivity software, e.g. spreadsheets, with the
development of custom software using the
general-purpose Python language. It will focus on
skills directly applicable to data-driven decision-
making. Students will see examples from many
domains, and be able to write code to automate
the common processes of data science, such as
data ingestion, format conversion, cleaning,
summarization, creation and application of a
predictive model.” [from UoS outline]

DATA1002 Learning outcomes 1

• 1. Ability to automate a computational
process, when given a clear account of the
algorithm to be applied. This will be done by
writing Python programs with core techniques
of procedural programming.

• Grok modules; Tutorials; Lecture 4B; Python
coding test and practice

DATA1002 Learning outcomes 2

• 2. Knowledge of Python syntax and semantics,
to trace and understand idiomatic code typical
of data science activities, including features
such as user-defined functions, exception-
raising and handling.

• Grok modules; lectures 2A, 3A, 3B [a bit], 4A,
5A, 7A, 11B; Tutorials

DATA1002 Learning outcomes 3
• 3. Experience with automation of the computational
process needed for examples of the various activities in
the data science pipeline: data ingestion and cleaning,
data format conversion, data summarization, visual and
tabular presentation of the results from
summarization, creation of a predictive model of a
given form, application of a predictive model to new
data, evaluation of a predictive model (and also,
automation of a pipeline that scripts use of existing
tools for these activities). The examples students have
seen will cover a diversity of application domains.

• Lectures 1B, 4B, 5B [a bit], 7A [a bit], 8AB [a bit], 9B [a
bit]; Labs, Tutorials, Project Stages

DATA1002 Learning outcomes 4

• 4. Experience with both spreadsheets, and
programs in Python, for automatically
performing computational processes of data
science, and awareness of the similarities and
differences between tools.

• Lectures 2B, 8AB [a bit]; Labs 2,3

DATA1002 Learning outcomes 5

• 5. Understanding of main issues for data
management in connection with data science
activities, including value of data, importance of
metadata (that describes the format and meaning
of data, constraints on the data, origins of the
data, restrictions on use of the data, etc), issues
when sharing data across time and across users
(eg value of a manager role, access control,
persistence, recovery)

• Lectures 1B, 5B, 11A; Lab 1; Project Stage 1

DATA1002 Learning outcomes 6

• 6. Understanding of how data sets are
represented in computer files, in particular,
the many-to-many relationship between the
physical representation and the logical
representation; advantages and disadvantages
of different representations.

• Lectures 3B, 6A, 7B

DATA1002 Learning outcomes 7

• 7. Understanding of principles of charting and
information presentation, and ability to
produce good charts using both Python
libraries and spreadsheets; also capability to
evaluate charts for effectiveness in
communication.

• Lectures 8AB; Project stage 2; Labs

DATA1002 Learning outcomes 8

• 8. Understand the principles of machine
learning and its role in data science, in
particular the creation, use, and limitations of
predictive models for regression and
classification tasks, issues of over-fitting and
under-fitting, and evaluation of models.

• Lectures 9A, 9B, 12A, 13A; Project stage 3;
Labs

Big ideas from DATA1002

• Data science work activities
• Variety of predictive models
• Data should be managed (including metadata)
• Data representation choices
• Power of automating computation (and how to do it)
• Notional machine for Python computation
• Libraries and their value
• Communication (techniques and principles)
• Evaluation, recognition of limitations

DATA1002 Assessment
• Weekly Python tasks (worth 10%, marked for participation)
• Weekly quizzes (worth 10%)
• Python coding tests (worth 10%, in lecture slots, week 10)
• Project stage 1 (worth 5%, week 6)
• Project stage 2 (worth 10%, week 9)
• Project stage 3 (worth 5%, week 12)
• Proctored 2hr online exam (worth 50%)
There are also zero-weight practice coding tests

Reminder
• Minimum requirement: It is a policy of the School of
Computer Science that in order to pass this unit, a
student must achieve at least 40% in the written
examination. A student must also achieve an overall
final mark of 50 or more. Any student not meeting
these requirements may be given a maximum final
mark of no more than 45 regardless of their average.
– From official unit outline on Canvas

• Warning: Canvas report of your overall mark so far
may be inaccurate
– Calculate for yourself, using marks from tasks and the
announced weightings
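
One way to do that calculation is sketched below in Python. The weightings are the ones listed on the assessment slide; the function name, the dictionary keys and any marks you feed in are hypothetical, and the way the cap of 45 is applied is only our reading of the rule quoted above, not an official formula.

```python
# Weightings (percent of the unit) as listed on the assessment slide.
WEIGHTS = {
    "weekly_python": 10,
    "weekly_quizzes": 10,
    "coding_tests": 10,
    "project_stage_1": 5,
    "project_stage_2": 10,
    "project_stage_3": 5,
    "exam": 50,
}


def overall_mark(percent_scores):
    """Return the weighted final mark (0-100).

    percent_scores maps every key of WEIGHTS to a score out of 100.
    Assumption: an exam score below 40% caps the final mark at 45.
    """
    total = sum(WEIGHTS[name] * percent_scores[name] / 100 for name in WEIGHTS)
    if percent_scores["exam"] < 40:
        total = min(total, 45)
    return total
```

For instance, 80% on every in-semester component and 60% on the exam gives 0.5 × 80 + 0.5 × 60 = 70 under this sketch.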

Agenda

• Semester review
• Written exam preview
• Administrative issues
• More study of Data Science

DATA1002 Exam
• Worth 50% of the unit
– And also, minimum requirement of 40 points on exam
in order to Pass the unit
• Proctored (Review+) online exam
• 2 hr duration (plus a bit), done as Canvas quiz
• Scheduled by Exams Office during exam period
• Mix of question types (including written text), will cover
the content of lectures, tutes, labs, and assessments
(including Python programming concepts and skills)

ProctorU Review+ (“Type B”) Exam

• Please see
https://canvas.sydney.edu.au/courses/23380

• There are hardware and software
requirements on your environment
– Make sure you set everything up NOW
– Then do the practice Review+ exam to check it all
works properly: enrol at
https://canvas.sydney.edu.au/enroll/8WH4F7

– In case of technical difficulties: get help from Uni
ICT

General principles

• The final exam will allow you to provide evidence that
you have achieved the learning outcomes of the unit

• Questions cover “knowledge” and “doing”
– You should not be surprised or tricked by any questions
– They align with the learning experiences through the
semester
• You will need to be quite fast in reading/understanding
the questions
– So, get familiar with the style and wording, from this
lecture!

Differences from previous years' exams

• 2018 and 2019: 2hr exam invigilated in person
– Answers handwritten on paper exam paper
– Allowed: One A4 sheet of handwritten and/or typed notes
double sided
• 2020: 3 hr exam online
– download pdf of questions, produce pdf of answers and upload
– open book
• This year: 2 hr proctored exam online
– answer by typing within Canvas, eg multichoice, fill in text box
– You are allowed to look at hand-written or printed notes (on
paper), but no online material or communication with other
people

Exam Structure
• Exam is available as Canvas “quiz”, at the
time scheduled by the Exams Office
– Answer each question in the quiz (select
appropriate choice, enter text in textbox etc)
– You must sit exactly when scheduled
• The exam is on a special canvas site “Final
Exam for DATA1002”
– This is not the same as usual Canvas site for this
unit
– Make sure you visit this site as soon as it
becomes available (should be on November 15)

Timing
• Your exam is 2 hours and 10 minutes long (130 minutes). This includes 10
minutes of reading time, but you can start writing whenever you are
ready. You are strongly encouraged to use this time to carefully plan and
structure your response before you start writing.
– If you have an academic adjustment, it is supposed to have been added
automatically.
• Quiz buffer time: You will be allowed a buffer time of 40 minutes in case
you experience any technical issues starting your exam. This means that
you have 40 minutes to begin the exam and still get the full time allowed
to complete the exam. If you are unable to start your exam within the
buffer time, you should apply for special consideration. Buffer time does
NOT mean you have extra time to complete your exam.

• Please keep track of your time. Your quiz timer may not update if you
have an internet connection issue. Use the time on your computer so that
you always know how much time you have left. Only questions completed
within the exam time will be marked.

Warning: pay attention to time zone (and possible changes where you are).
All instructions refer to Sydney daylight-savings time.

Examples
The following assume start time of 13:00 – if you have some
different arrangement, your finish will also vary
• Start exam at 13:00, finish and submit by 15:10
• Start exam at 13:35, finish and submit by 15:45
– Start is within buffer, so you get the full allowed time
• Start exam at 13:50, finish and submit by 15:50
– Start is after buffer, so extra delay is taken from your time to sit
– However, if you are unable to start by 13:40 (end of buffer) you
are advised to apply for special consideration rather than start
late

• We advise you to be ready and try to start at 13:00; the
buffer may be needed to deal with technical issues
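
The arithmetic behind these examples can be checked with a short Python sketch. It assumes the rule is "full 130 minutes if you start within the 40-minute buffer, otherwise a fixed finish at scheduled start + buffer + 130 minutes", which matches the three examples above; the function and variable names are our own.

```python
from datetime import datetime, timedelta

EXAM_LENGTH = timedelta(minutes=130)  # includes the 10 minutes of reading time
BUFFER = timedelta(minutes=40)        # latest start that still gets the full time


def latest_finish(scheduled_start, actual_start):
    """Return the time by which the exam must be submitted."""
    # Starting inside the buffer gives the full 130 minutes; starting
    # later eats into your time, so the finish is fixed at the hard deadline.
    hard_deadline = scheduled_start + BUFFER + EXAM_LENGTH
    return min(actual_start + EXAM_LENGTH, hard_deadline)


def hhmm(text):
    """Parse a clock time such as '13:35' (the date part is irrelevant here)."""
    return datetime.strptime(text, "%H:%M")


scheduled = hhmm("13:00")
for start in ("13:00", "13:35", "13:50"):
    finish = latest_finish(scheduled, hhmm(start))
    print(start, "->", finish.strftime("%H:%M"))
# 13:00 -> 15:10
# 13:35 -> 15:45
# 13:50 -> 15:50
```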

Submit

• Do not push submit button at bottom of the quiz,
until you are ready to finish work (or timer is
about to expire)
– Exam “quiz” is single-attempt
– System should be saving whatever you typed in
textboxes etc, as long as the internet connection is
maintained
– If connection is broken during exam, your machine
crashes, etc, then you will need to get “special
consideration” based on the technical difficulty (apply
as soon as you can)

Resources
• Exam conditions: This is a restricted open
book exam. You are not permitted to use third
party communication or collaboration apps or
websites. Access to any such app or website is
strictly prohibited during your exam and is a
serious academic integrity breach.
– Materials permitted: Handwritten notes, printed
notes and textbooks.
– Materials provided: None

Warning: permitted materials are on paper only;
do not have any other device (phone, tablet, laptop etc) nearby;
do not have other tabs or browsers open on your working machine

Notes

• Evidence shows that (i) choosing what facts to
have in notes, and then (ii) writing it out by
hand, is very beneficial for learning

• Once written out by hand, you may want to
type up again and have a printed-out version

• Remember that time is scarce in the exam, so
you need to be able to quickly find things in
the notes
– Organisation is vital

Exam Structure
• Do 19 questions worth in total 100 points
• Part A (Q1 to 10): 20 points [common to
data1002/1902]
• Part B (Q11 to 16): 60 points [common to
data1002/1902]
• Part C (Q17-19): 20 points [only for
data1002 students]
– PartCAdv will be discussed in Wed lecture for
data1902 only

Dataset used in some exam questions
Several of the questions in this exam refer to a comma-separated-values dataset
called employment_sector_technology.csv
This dataset was produced by some transformations and cleaning, on the data downloaded
from https://databank.worldbank.org/reports.aspx?source=jobs#
The first row is a header line, giving the names of the fields, whose meanings are as follows:
• Jurisdiction (that is, country name)
• Region
• Employment in agriculture (% of total employment)
• Employment in industry (% of total employment)
• Employment in services (% of total employment)
• GDP growth (annual %)
• Individuals using the Internet (% of population)
• Fixed broadband Internet subscribers (per 100 people)
• Mobile cellular subscriptions (per 100 people)
Here are the first few lines of the file
Jurisdiction,Region,Emp_agric,Emp_industry,Emp_services,GDP-growth,Internet_use,Broadband,Mobile
Australia,Oceania,2.6,19.4,77.9,2.8,88.2,30.6,110.1
Austria,Europe,4.3,25.6,70.1,1.5,84.3,29.0,163.8
Belgium,Europe,1.3,21.3,77.5,1.4,86.5,37.6,110.5
Canada,Americas,1.9,19.5,78.5,1.4,91.2,36.9,84.7
Chile,Americas,9.5,23.0,67.5,1.3,83.6,16.2,130.1
Czech Republic,Europe,2.9,38.1,59.0,2.6,76.5,28.9,117.7
Denmark,Europe,2.5,18.6,78.8,2.0,97.0,42.6,122.3
Estonia,Europe,3.9,29.7,66.4,2.1,87.2,30.1,144.6
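
As a rough illustration of how such a file might be held in memory, the sketch below reads the sample lines shown above into a list of dictionaries using Python's standard csv module. The function name load_rows is our own, and the sixth column name is assumed to be GDP-growth (the header is wrapped across two lines on the slide). The sample is embedded as a string so the code runs without the real file.

```python
import csv
import io

# The first few lines of the dataset, as shown on the slide.
SAMPLE = """Jurisdiction,Region,Emp_agric,Emp_industry,Emp_services,GDP-growth,Internet_use,Broadband,Mobile
Australia,Oceania,2.6,19.4,77.9,2.8,88.2,30.6,110.1
Austria,Europe,4.3,25.6,70.1,1.5,84.3,29.0,163.8
Belgium,Europe,1.3,21.3,77.5,1.4,86.5,37.6,110.5
Canada,Americas,1.9,19.5,78.5,1.4,91.2,36.9,84.7
Chile,Americas,9.5,23.0,67.5,1.3,83.6,16.2,130.1
Czech Republic,Europe,2.9,38.1,59.0,2.6,76.5,28.9,117.7
Denmark,Europe,2.5,18.6,78.8,2.0,97.0,42.6,122.3
Estonia,Europe,3.9,29.7,66.4,2.1,87.2,30.1,144.6
"""


def load_rows(csv_file):
    """Read the CSV into a list of dictionaries, one per jurisdiction.

    Keys are the header names. Jurisdiction and Region stay as strings;
    every other field is converted to a float.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        row = {}
        for field, value in record.items():
            if field in ("Jurisdiction", "Region"):
                row[field] = value
            else:
                row[field] = float(value)
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["Jurisdiction"], rows[0]["Internet_use"])
# 8 Australia 88.2
```

With the real file, the same function could be called as `load_rows(open("employment_sector_technology.csv", newline=""))`, ideally inside a `with` block.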

Part A

• 10 factual automatically-graded questions,
each worth 2 points

• Questions can be multiple-choice, multiple-
answers, fill-in-blanks, etc

• Cover the range of content of unit

Part B

• 6 longer/deeper questions (hand-graded by
tutors and lecturer), each worth 10 points

Question 11

• 10 points
• “[Situation described] Write an explanation
for [target reader] about [particular aspects of
the situation]”
– Marking will reflect both content and effective
communication

Question 12

• 10 points
• Trace code given to indicate what would be
printed at indicated points in the execution.
• We encourage you to draw a notional machine
diagram (on paper) to help you to answer this
question.
– Note that you cannot use Grok or any
programming environment during the exam
– Similar to question in tute13
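
The exam code is not reproduced here. For practice, the following is a hypothetical snippet (all names and values are invented for illustration) of the kind of behaviour a notional machine diagram helps with: two names referring to one list, versus a separate copy.

```python
def add_item(items, value):
    # items refers to the same list object that the caller passed in
    items.append(value)
    return len(items)


a = [1, 2]
b = a          # b is a second name for the same list as a
c = list(a)    # c is a separate copy of the list
n = add_item(b, 3)

print(a)  # point 1: [1, 2, 3]
print(c)  # point 2: [1, 2]
print(n)  # point 3: 3
```

Drawing one box for the shared list with arrows from both a and b, and a second box for c, makes it clear why the append is visible through a but not through c.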

Question 13
• 10 points
• “Using the employment_sector_technology.csv dataset described above,
write well-documented and easily readable python code that will print
[some calculation described, example of output format given] You do not
need to deal with mis-formatted files or other errors. You are allowed to
use a library like Pandas, but this is not required. It is important that your
comments should clearly describe the data structure used for storing the
data in your program (eg if you use a dictionary, you must explain what
the keys and values represent; if you use Pandas, you must indicate the
indices of the dataframes your code refers to, etc). Ensure you use the
‘Preformatted’ block option to keep your code readable, accessed by
clicking the dropdown saying ‘Paragraph’ in the answer box font settings.”
– Marking will reflect appropriate logic, knowledge of Python techniques, and
documentation
– Note that you cannot use Grok or any programming environment during the
exam
– Similar to question in tute13
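
To show the style of documentation being asked for, here is a sketch of an answer to a made-up task of this kind: print the mean Internet_use for each Region. The task, the function name and the output format are our own assumptions, not the exam question. A few sample lines are embedded as a string so the code runs without the real file, and the column name GDP-growth is assumed from the wrapped header shown earlier.

```python
import csv
import io

# A few sample lines from the dataset, embedded for illustration only.
SAMPLE = """Jurisdiction,Region,Emp_agric,Emp_industry,Emp_services,GDP-growth,Internet_use,Broadband,Mobile
Australia,Oceania,2.6,19.4,77.9,2.8,88.2,30.6,110.1
Austria,Europe,4.3,25.6,70.1,1.5,84.3,29.0,163.8
Belgium,Europe,1.3,21.3,77.5,1.4,86.5,37.6,110.5
Canada,Americas,1.9,19.5,78.5,1.4,91.2,36.9,84.7
Chile,Americas,9.5,23.0,67.5,1.3,83.6,16.2,130.1
"""


def mean_internet_use_by_region(csv_file):
    """Return a dictionary mapping region name to mean Internet_use."""
    # totals is a dictionary:
    #   key   = region name (string), e.g. "Europe"
    #   value = two-item list [sum of Internet_use so far, rows seen so far]
    totals = {}
    for record in csv.DictReader(csv_file):
        region = record["Region"]
        if region not in totals:
            totals[region] = [0.0, 0]
        totals[region][0] += float(record["Internet_use"])
        totals[region][1] += 1
    # Result dictionary: key = region name, value = mean Internet_use (float)
    return {region: total / count for region, (total, count) in totals.items()}


means = mean_internet_use_by_region(io.StringIO(SAMPLE))
for region in sorted(means):
    print(f"{region}: {means[region]:.1f}")
# Americas: 87.4
# Europe: 85.4
# Oceania: 88.2
```

Note how each dictionary has a comment saying what its keys and values represent, which is the documentation the question explicitly requires.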

Question 14
• 10 points
• “[person] has created the following chart [image shown below]
from the employment_sector_technology.csv dataset which we
described above, showing [some aspect of the situation]. This chart
presents information from 4 of the data attributes, namely: [list of
data attributes shown]

For this chart:
• State what kind of chart this is (eg is it pie chart, scatter plot, etc) [1
point]
• State what type of encoding has been used for each of the 4 data
attributes shown [4 points]
• State 2 issues with this chart that limit its usefulness [2 points]
• Describe an alternative chart and/or encoding scheme you believe
would be a better choice for this data and explain why [3 points]”
– Marking will reflect knowledge of charting principles, their application
in this specific situation, skill in evaluation
– Similar to question in lab13

Question 15

• 10 points
• For a given situation involving the storage and
handling of data in a project, discuss concerns
or consequences, and indicate
recommendations for doing it better.
– Marking will reflect knowledge of data
management principles, linking these to described
situation, and sensible decisions

Question 16
• 10 points
• “You are working on a data analysis project that involves a new
algorithm which uses a predictive model built with machine
learning. [description of the task, and an issue that has been
identified in how the system is operating for different cases].
State 1 possible reason why that issue may be occurring.
State 1 possible solution to mitigate this issue, and explain whether
you believe this solution will completely or partially correct the
problem and why. If there is no solution possible, or if you believe
the issue should be left without being changed, state this, and
explain why this is the case.”
– Marking will reflect understanding of ethical issues, links to this
specific situation, clarity of thought [but we do not require a particular
ethical decision (eg you can get marks whether you propose a change,
or propose to leave things unchanged, as long as you show good
awareness of the issues and justify your approach carefully)]

– Similar to question in lab13

Part C

• [Only for students in data1002]
• 3 questions

Question 17

• 8 points
• [One type of work you did in some stage of
the project] Describe what this activity
involves, explain why it is important in data
science. “In your answer, give an example of
[this activity] which you (or someone in your
group) [did] during the Project work.”
– Marking will reflect knowledge and having
appropriate detailed specific example

Question 18

• 4 points
• Show the output produced from executing
given Python code.
– Similar in style to question 12

Question 19

• 8 points
• Describe [some issue in data management],
and how this can be important/useful.
– Similar in style to Question 15

Exam technique
• Plan how you will allocate time (wisely)
– Use “reading time” to check your understanding
– Also to plan time allocation to questions
• Answer everything (get the “easy marks” in each part)
– Even if you don’t know the full answer, show that you have some
relevant knowledge
– Respond to question details (eg if it asks for “explain” and
“example”, then you should provide both in your answer)
• Write clearly and efficiently
– Start with outline/bullet points, then expand if you have time
– No need for fancy style
– When question describes a target reader, you should
communicate effectively for that target (see lecture 6B)

How much to write?

• A 10 point question would be expected to take
about 10-12 minutes to answer
– Including thinking, typing, checking, revising

• A good answer can often be done in two or three
focused paragraphs

• You need to show the marker that you know and
understand the concepts
– And you ought to answer the specific question that is
asked
• Watch the instructions carefully
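
A simple way to turn this into a time plan is to split the writing time in proportion to points. The sketch below assumes 120 minutes of writing time (130 minutes minus the 10 minutes of reading time) and the part totals from the exam structure; strictly proportional allocation is our assumption, not an official recommendation, though it agrees with the upper end of the 10-12 minute guide above.

```python
WRITING_MINUTES = 120  # assumed: 130 minutes total minus 10 minutes reading time
PARTS = {"Part A": 20, "Part B": 60, "Part C": 20}  # points per part


def minutes_for(points, total_points=100, total_minutes=WRITING_MINUTES):
    """Return the share of the writing time proportional to the points."""
    return points * total_minutes / total_points


for part, points in PARTS.items():
    print(part, minutes_for(points), "minutes")
print("10-point question:", minutes_for(10), "minutes")
# Part A 24.0 minutes
# Part B 72.0 minutes
# Part C 24.0 minutes
# 10-point question: 12.0 minutes
```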

Agenda

• Semester review
• Written exam preview
• Administrative issues
• More study of Data Science

Scheduling

• Written exam scheduled by Exams Office
– Time is described in Sydney timezone
– Special arrangements available for those who are
in timezone where the written exam schedule is
late at night or very early morning
• Apply through official process
• You will instead be scheduled in replacement exam
period

Illness
• If you are unwell, and it seems that you won’t be able to
demonstrate your knowledge/skill properly, then you can
request special consideration

• Follow the same procedure as during semester (get medical
person to fill out special USyd form, scan and attach when
you fill in the online form, within 3 days)
– (or, make “student declaration”)

• Usual outcome: an alternate test in “replacement exam
period”

• If you become sick during the exam itself, submit whatever
you have done
– And apply for special consideration

• The University goal is to get a fair assessment of what you
have achieved

Technical and logistical issues

• Final exam is quite time constrained
• Make sure you will have a good place to work
(quiet for those 2+ hours, comfortable place
for typing, reliable internet, no-one else in
room, etc)
• Make sure you know how to use tools
• If anything goes wrong technically, apply for
special consideration on this basis
– With “student declaration” as evidence

During the exam

• Teaching staff are not allowed to communicate with
students during the exam

• If any MCQ question seems wrong or confusing
(typo etc): pick the best answer you can
(afterwards, report it to us in private post on Ed)

• If any essay question seems wrong or confusing:
note this at start of your answer, say how you are
interpreting the question, then answer based on
that

Academic integrity

• You must not get assistance from other people
or use resources other than what is allowed

• You must not reveal the questions (neither
during the exam, nor afterwards)

Agenda

• Semester review
• Written exam preview
• Administrative issues
• More study of Data Science

What’s next
• DATA2001/2901 Data Science: Big Data and Data Diversity [semester 1]
• “This course focuses on methods and techniques to efficiently explore and analyse
large data collections. Where are hot spots of pedestrian accidents across a city?
What are the most popular travel locations according to user postings on a travel
website? The ability to combine and analyse data from various sources and from
databases is essential for informed decision making in both research and industry.

Students will learn how to ingest, combine and summarise data from a variety of
data models which are typically encountered in data science projects, such as
relational, semi-structured, time series, geospatial, image, text. As well as
reinforcing their programming skills through experience with relevant Python
libraries, this course will also introduce students to the concept of declarative data
processing with SQL, and to analyse data in relational databases. Students will be
given data sets from, eg., social media, transport, health and social sciences, and
be taught basic explorative data analysis and mining techniques in the context of
small use cases. The course will further give students an understanding of the
challenges involved with analysing large data volumes, such as the idea to
partition and distribute data and computation among multiple computers for
processing of ‘Big Data’.” [UoS outline in Handbook]

Final General advice
• Be prepared
– Ready for technical issues and environment
• See https://canvas.sydney.edu.au/courses/23380/pages/record+-help-and-faqs
– Ready pedagogically
• Do examprep-MCQ quiz, lab13 and tute13 quiz (all under exam-like
conditions)
• Work through revision exercises
• Make a set of notes
• Watch Ed discussions
– Ready mentally and physically
• Be well-rested, and reasonably fed (but not over-full)
– Have plenty of water (in a clear bottle or glass)
• Relax

Good luck!
