APS1070
Foundations of Data Analytics and Machine Learning
Fall 2021
Lecture 1:
• Introduction
• Course Overview
• Machine Learning Overview
• K-nearest Neighbour Classifier
Prof.

Instruction Team
Instructor: Prof.
Head-TA: Zadeh
TA: TA: Haoyan (Max) A: TA:
Get to know the instruction team: https://q.utoronto.ca/courses/223861/pages/course-contacts

Communication
➢Preferred contact method for a quick response: Piazza
1. Via Piazza Post to the “Entire Class”
2. Via Piazza Question using Post to “Instructor(s)”: type the specific person’s name from the list, or type “instructors” to include us all
➢Communication via email (APS1070 in subject line) is fine if you have a reason for not using Piazza for that question.

Emails
Instructor: Prof. Head-TA: Zadeh TA:
TA:
TA: Haoyan (Max) A:

➢Please prefix email subject with ‘APS1070’

A little bit about your instructor …
[Map with marked locations and the following captions]
I was born here.
I studied Industrial Engineering here.
And I studied Computer Science here.
I worked here as a data scientist.
Then, I worked here as a research scientist.
I worked for a while here as a visiting researcher.
And, I am a new professor here at the UofT.

Traditional Land Acknowledgement
I wish to acknowledge the Indigenous Peoples of all the lands which we call home including the land on which the University of Toronto operates.
For thousands of years, it has been the traditional land of the Huron-Wendat, the Seneca, and the Mississaugas of the Credit.
Today, this meeting place is still the home to many Indigenous people from across Turtle Island and I am grateful to have the opportunity to work on this land.

Native Land



Networks

Balanced network

In a balanced network:
Enemy of an enemy = friend
Enemy of a friend = enemy
Friend of an enemy = enemy
Friend of a friend = friend
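The four rules can be read as one condition: around any triangle of three people, the product of the edge signs (+1 for friend, -1 for enemy) is positive. The sketch below checks that condition. It assumes a complete signed network in which every pair has a tie, and the names `is_balanced`, `people`, `balanced` and `unbalanced` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def is_balanced(nodes, sign):
    """Return True if every triangle has a positive product of edge signs.

    `sign` maps frozenset({u, v}) to +1 (friend) or -1 (enemy) and is
    assumed to contain every pair of nodes (a complete signed network).
    """
    for a, b, c in combinations(nodes, 3):
        product = (sign[frozenset((a, b))]
                   * sign[frozenset((b, c))]
                   * sign[frozenset((a, c))])
        if product < 0:
            return False
    return True

# Hypothetical three-person example: A and B are friends, both dislike C.
people = ["A", "B", "C"]
balanced = {
    frozenset(("A", "B")): +1,
    frozenset(("A", "C")): -1,
    frozenset(("B", "C")): -1,
}

# Flipping one tie (B now likes C) violates "enemy of a friend = enemy".
unbalanced = dict(balanced)
unbalanced[frozenset(("B", "C"))] = +1

print(is_balanced(people, balanced))    # True
print(is_balanced(people, unbalanced))  # False
```

Checking every triangle is only practical for small networks, since the number of triangles grows with the cube of the number of nodes.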


Meet Mike, a friend of and .

Here is George, another friend of .
does not really know George that well.
knows George and they hate each other!

Unbalanced network
Enemy of an enemy = friend
Enemy of a friend = enemy
Friend of an enemy = enemy
Friend of a friend = friend

Unbalanced / Balanced subgraph
It is 1 edge away from balance.

Community detection in social networks
-positive relationship (green edge)
-negative relationship (red edge)
Nodes: people
Edges: positive or negative ties
Social Networks

Tribes
2^16 = 65536 possible communities
Social Networks

Monks
2^18 = 262144 possible communities
Social Networks

Biological Network of a Protein
2^329 ≈ 1 duotrigintillion (about 10^99)!
Biology
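The counts on these slides are consistent with 2^n, which is the number of ways to place each of n nodes into one of two groups. That reading is an assumption, since the slides do not state how the communities are counted. A short check with Python's arbitrary-precision integers shows how quickly the number grows:

```python
# Assumption: each of the n nodes is placed in one of two groups,
# which gives 2**n possible assignments.
tribes = 2 ** 16
monks = 2 ** 18
protein = 2 ** 329

for name, count in [("Tribes", tribes), ("Monks", monks), ("Protein", protein)]:
    # Report the size as a digit count, since the last value is enormous.
    print(f"{name}: {len(str(count))} digits")
```

The last value has 100 digits, so enumerating every assignment is not an option for networks of that size.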

Baker’s yeast
Biology
Nodes: biological molecules
Edges: activation or inhibition relations

Financial portfolios
Nodes: securities (investments)
Edges: positive or negative correlations (of returns)
Network: portfolio
Finance

US Senate over time
Node colors=party affiliation:
Republican
Democrat
Edge colors:
Collaboration
Avoidance
Political Science

Bipartivity of fullerenes
Chemistry
Nodes: carbon atoms
Edges: atomic bonds (considered negative)
Network: molecule

Atomic magnets
Physical context

International Relations (1946-1996)
International Relations
Nodes: countries
Edges: international relations

About you…
➢What part of the world are you joining us from?
➢What is your area of study?
➢Previous experience?
➢Why did you take this course?
➢What do you want to get out of this course?

Survey: What is your time zone?
26.2% 0.8% 2.4% 61.9% 0.0% 4.0% 4.8%

Survey: Undergraduate Degree?

Survey: Undergraduate Studies in Toronto?
47.2% No 52.8% Yes

Survey: Why did you take this course?
➢To become a Machine Learning Researcher (~20.2%)
➢To become a Data Scientist (~74.2%)
➢To start a Machine Learning Startup (~19.4%)
➢For fun (~8.1%)
➢…

Survey: Programming Languages?

Survey: Rate Programming Abilities?

Why are you taking this course?

What kind of Projects would you like to do?

Part 1
Course Overview

Course Description
APS1070 is a prerequisite to the core courses in the Emphasis in Analytics. This course covers topics fundamental to data analytics and machine learning, including an introduction to Python and common packages, analysis of algorithms, probability and statistics, matrix representations and fundamental linear algebra operations, basic algorithms and data structures and continuous optimization. The course is structured with both weekly lectures and tutorials/Q&A sessions.

Primary Learning Outcomes
By the end of the course, students will be able to:
1. Describe and contrast machine learning models, concepts and performance metrics.
2. Analyze the complexity of algorithms and common abstract data types.
3. Perform fundamental linear algebra operations, and recognize their role in machine learning algorithms.
4. Apply machine learning models and statistical methods to datasets using Python and associated libraries.

Course Components
➢Lectures: Tuesdays (3 hrs)
➢Tutorials/Q&A Sessions: Thursdays (2 hrs)
➢Four projects (submitted via GitHub Classroom)
➢Eight tasks/quizzes for reading assignments (submitted via Quercus)
➢Any material covered in lectures / tutorials / readings / projects / Piazza is fair game for the midterm and final assessments.

APS1070 Course Information
➢Course Website: http://q.utoronto.ca
➢Access course materials and Zoom Sessions
➢Verify using UTORid
➢Textbooks:
➢ “Mathematics for Machine Learning” by . Deisenroth et al., 2020 (free)
➢ “The Elements of Statistical Learning”, 2nd Edition, by et al., 2009 (free)
➢ “Introduction to Algorithms and Data Structures”, 4th Ed, by . Dinneen, ’farb, and . Wilson, 2016 (free)
➢Piazza discussion board for tutorial, project and almost all questions and communications.

Computer Requirements and Online Tools
You will need access to:
➢A computer equipped with microphone and webcam (and ideally 2 screens) to attend and participate in lectures (or view recordings) and practical sessions on Zoom
➢Jupyter Notebook, preferably through Google Colab, to be able to complete the projects
➢GitHub for submitting projects
➢Quercus for submitting reading assignment tasks/quizzes and course announcements

Computer Requirements and Online Tools
You will need access to:
➢Piazza to ask questions, communicate with the teaching team, and participate in course discussions
➢Top 10 endorsed answerers (across the two sections of APS1070 for the Fall 2021) on Piazza with at least 3 endorsed answers get 2 points added to their final course grade.
➢Questions of the general form “is this the correct answer?”, “what is wrong with my code?”, “why does my code not compile?” and the like will not receive a response.

Grade Breakdown
Projects/Quizzes | Weight (%) | Tentative Schedule
Eight tasks/quizzes for reading assignments | 4 | Due on Mondays at 21:00 (for weeks 2, 3, 4, 5 and 7, 8, 9, 10 as per course schedule)
Project 1 | 10 | Due Oct 1 at 21:00
Midterm Assessment | 20 | Oct 18 at 9:00 to Oct 19 at 21:00 (limited 2-hour window to start the exam and submit it)
Project 2 | 13 | Due Oct 22 at 21:00
Project 3 | 10 | Due Nov 5 at 21:00
Project 4 | 13 | Due Nov 26 at 21:00
Final Assessment* | 30 | Dec 9 at 9:00 to Dec 10 at 21:00 (limited 3-hour window to start the exam and submit it)

Penalty for late Submissions
Quercus/GitHub submission time will be used. Late projects and reading assignments will incur a penalty as follows:
➢ -30% (of project maximum mark) if submitted within 72 hours past the deadline.
➢ A mark of zero will be given if the submission is 72 hours late or more.
Late midterm and exam submissions get a grade of 0.
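As an illustration of the arithmetic, the late policy for projects can be written as a small function. This is a sketch, not the official grading code: the function name `late_adjusted_mark` is hypothetical, and it assumes the deduction is subtracted from the mark earned and that the result cannot go below zero, neither of which is stated above.

```python
def late_adjusted_mark(raw_mark, max_mark, hours_late):
    """Apply the stated late policy to a project mark.

    Assumptions (not stated in the policy): the deduction is subtracted
    from the raw mark, and the result is floored at zero.
    """
    if hours_late <= 0:
        return float(raw_mark)                # on time: no penalty
    if hours_late < 72:
        deduction = 30 * max_mark / 100       # 30% of the project maximum mark
        return max(0.0, raw_mark - deduction)
    return 0.0                                # 72 hours late or more

print(late_adjusted_mark(85, 100, 0))    # 85.0
print(late_adjusted_mark(85, 100, 10))   # 55.0
print(late_adjusted_mark(85, 100, 72))   # 0.0
```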

Class Representatives
If you have any complaints or suggestions about this course, please talk to me or email me directly. Alternatively, talk to one of the class reps who will then talk to me and the teaching team.
We need 2 class reps per section. Volunteers can send me an email (with “APS1070 class rep” in subject line) by 14 September.
Class reps are asked to keep in touch with the instruction team about any feedback they receive from students and to attend two staff-student meetings over the course of this semester.
This can be a great opportunity to develop your leadership skills.

Academic Integrity
➢All the work you submit must be your own and no part of your submitted work should be prepared by someone else. Plagiarism or any other form of cheating in examinations, tests, assignments, or projects, is subject to serious academic penalty (e.g., suspension or expulsion from the faculty or university).
➢ A person who supplies an assignment or project to be copied will be penalized in the same way as the one who makes the copy.
➢Several plagiarism detection tools will be used to assist in the evaluation of the originality of the submitted work for both text and code. They are quite sophisticated and difficult to defeat.

Software
➢Python 3
➢NumPy, Matplotlib, Pandas and many more
➢Google Colaboratory
➢Jupyter notebook in the cloud
➢no installation required
➢requires Google Drive
All project handouts will be Jupyter notebooks

Why Python for Data Analysis?
➢Very popular interpreted programming language
➢Write scripts (use an interpreter)
➢Large and active scientific computing and data analysis community
➢Powerful data science libraries – NumPy, pandas, matplotlib, scikit-learn, dask
➢Open-source, active community
➢Encourages logical and clear code
➢Invented by Rossum

Course Philosophy
➢Top-down approach
➢Learn by doing
➢Explain the motivations first
➢Mathematical details second
➢Focus on implementation skills
➢Connect concepts using the theme of End-to-End Machine Learning
My goal is to have everyone leave the course with a strong understanding of the mathematical and programming fundamentals necessary for future courses.

Tentative Schedule (Weeks 1 – 3)
Date | Time (morning section 0101) | Time (evening section 0201) | Location | Topics
Week 1
Tues. 07-Sep | 9:00-12:00 | 17:00-20:00 | Zoom | Introduction: Course Overview, Machine Learning Overview, K-Nearest Neighbours
Thurs. 09-Sep | 9:00-11:00 | 17:00-19:00 | Zoom | Tutorial 0 – Python Basics and GitHub
Week 2 (Reading assignment 1 Due – Sep. 13 at 21:00)
Tues. 14-Sep | 9:00-12:00 | 17:00-20:00 | Zoom | Algorithms and Data Structures: Analysis of Algorithms, Asymptotic Notation, Sorting, Dictionary ADT, Hashing
Thurs. 16-Sep | 9:00-11:00 | 17:00-19:00 | Zoom | Tutorial 1 – Basic Data Science
Week 3 (Reading assignment 2 Due – Sep. 20 at 21:00)
Tues. 21-Sep | 9:00-12:00 | 17:00-20:00 | Zoom | Data Exploration, Making Predictions, Foundations of Learning: End-to-End Machine Learning, Data Wrangling, Plotting and Visualization, Decision Trees
Thurs. 23-Sep | 9:00-11:00 | 17:00-19:00 | Zoom | Q/A Support Session
Computer Science and Programming

Tentative Schedule (Weeks 4-6 and reading week)
Date | Time (morning section 0101) | Time (evening section 0201) | Location | Topics
Week 4 (Reading assignment 3 Due – Sep. 27 at 21:00)
Tues. 28-Sep | 9:00-12:00 | 17:00-20:00 | Zoom | Measuring Uncertainty and Evaluating Performance: K-Means Clustering, Probability Theory, Multivariate Gaussians, Performance
Thurs. 30-Sep | 9:00-11:00 | 17:00-19:00 | Zoom | Q/A Support Session
Project 1 Due – Oct. 1 at 21:00
Week 5 (Reading assignment 4 Due – Oct. 4 at 21:00)
Tues. 05-Oct | 9:00-12:00 | 17:00-20:00 | Zoom | Mathematical Foundation of Data Processing: Linear Algebra, Analytical Geometry and Transformations, Data Augmentation
Thurs. 07-Oct | 9:00-11:00 | 17:00-19:00 | Zoom | Tutorial 2 – Anomaly Detection
Reading Week
Oct 11: Thanksgiving
Extra Q/A Sessions on Thurs. Oct 14, 9:00-11:00 and 17:00-19:00
Week 6
No lecture on 19 Oct. No office hours on 20 Oct.
Midterm Assessment: Oct 18 at 9:00 to Oct 19 at 21:00 (limited 2-hour window to start the exam and submit it)
Thurs. 21-Oct | 9:00-11:00 | 17:00-19:00 | Zoom | Q/A Support Session
Project 2 Due – Oct 22 at 21:00
Mathematical Foundations

Midterm
➢Consists of multiple choice, short answer, math and programming questions as well as analytical and reasoning questions
➢Covers all material taught before the midterm
➢Late midterm submissions will receive a grade of 0
➢Access to all course materials (except for Piazza)
➢More details will be provided 1 – 2 weeks before the midterm

Tentative Schedule (Weeks 7 – 9)
Date | Time (morning section 0101) | Time (evening section 0201) | Location | Topics
Week 7 (Reading assignment 5 Due – Oct. 25 at 21:00)
Tues. 26-Oct | 9:00-12:00 | 17:00-20:00 | Zoom | Dimensionality Reduction Part 1: Projection, Matrix Decomposition, Eigenvectors, Principal Component Analysis
Thurs. 28-Oct | 9:00-11:00 | 17:00-19:00 | Zoom | Tutorial 3 – PCA
Week 8 (Reading assignment 6 Due – Nov. 1 at 21:00)
Tues. 02-Nov | 9:00-12:00 | 17:00-20:00 | Zoom | Dimensionality Reduction Part 2: Singular Value Decomposition, Feature Interpretation, Vector Calculus
Thurs. 04-Nov | 9:00-11:00 | 17:00-19:00 | Zoom | Q/A Support Session
Project 3 Due – Fri. Nov. 5 at 21:00
Week 9 (Reading assignment 7 Due – Nov. 8 at 21:00)
Tues. 09-Nov | 9:00-12:00 | 17:00-20:00 | Zoom | Generalized Linear Model: Linear Regression, Gradient Descent, Polynomial Regression, Regularization
Thurs. 11-Nov | 9:00-11:00 | 17:00-19:00 | Remembrance Day – no lab sessions
Neural Networks
Mathematical Foundations

Tentative Schedule (Weeks 10 – 13)
Date | Time (morning section 0101) | Time (evening section 0201) | Location | Topics
Week 10 (Reading assignment 8 Due – Nov. 15 at 21:00)
Tues. 16-Nov | 9:00-12:00 | 17:00-20:00 | Zoom | Artificial Neural Networks: Continuous Optimization, Convexity, Classification, Perceptron, Neural Networks
Thurs. 18-Nov | 9:00-11:00 | 17:00-19:00 | Zoom | Tutorial 4 – Linear Regression
Week 11
Tues. 23-Nov | 9:00-12:00 | 17:00-20:00 | Zoom | Deep Learning: Backward propagation, Deep Learning, Transfer Learning, Discrete Optimization
Thurs. 25-Nov | 9:00-11:00 | 17:00-19:00 | Zoom | Q/A Support Session
Project 4 Due – Nov. 26 at 21:00
Week 12
Tues. 30-Nov | 9:00-12:00 | 17:00-20:00 | Zoom | Course Review
Thurs. 02-Dec | 9:00-11:00 | 17:00-19:00 | Zoom | Q/A Support Session
Week 13
No lectures, no office hours, no lab sessions
Final Assessment: Dec 9 at 9:00 to Dec 10 at 21:00 (limited 3-hour window to start the exam and submit it)
Review Neural Networks

Slide Attribution
The slides used for APS1070 contain materials from various sources. Special thanks to the following authors:
• Zadeh

• Sinisa Colic
• . Wilson (Lecture 2 in particular)

Part 2
Machine Learning Overview

Why are we here?
➢ Machine Learning skills are in demand
➢ Neural Networks are evolving and leading to new performance limits
➢ Deep Neural Networks are at the cutting-edge of applied computing

Rising Tide of AI Capacity
Source: Hans-Moravecs-rising-tide-of-AI-capacity
➢ Jobs requiring creativity seem to be a safe career choice…
➢ Programming and AI design seem safe

Why Machine Learning?
Q: How can we solve a programming problem?
➢Evaluate all conditions and write a set of rules that efficiently address those conditions
➢ex. robot to navigate maze, turn towards opening
Q: How could we write a set of rules to determine if a goat is in the image?
Requires systems that can learn from examples…

Examples of Applications
➢Finance and banking
➢Production
➢Health
➢Economics
➢Data -> Knowledge -> Insight
➢Marketing
➢Intelligent assistants

Why Machine Learning?
➢Problems for which it is difficult to formulate rules that cover all the conditions that we are expected to see, or that require a lot of fine-tuning.
➢Complex problems for which no good solution exists: state-of-the-art Machine Learning techniques may be able to succeed.
➢Fluctuating environments: a Machine Learning system can adapt to new data.
➢Obtaining insights from large amounts of data or complex problems.

Why Machine Learning?
➢Instead of writing programs by hand, the focus shifts to collecting quality examples that highlight the correct output
➢A machine learning algorithm then takes this “training data” and produces a model to generate the correct output
➢If done correctly, the program will generalize to cases not observed… more on this later
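To make the shift from hand-written rules to learned ones concrete, here is a minimal sketch in which the "model" is a single threshold chosen from labelled examples rather than typed in by a programmer. The data are made up, and `fit_threshold` and `predict` are hypothetical helper names, not part of any library.

```python
def fit_threshold(values, labels):
    """Learn a rule 'predict 1 if value >= t' from labelled examples.

    Tries the midpoint between each pair of neighbouring sorted values and
    keeps the threshold with the fewest training mistakes.
    """
    points = sorted(set(values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]
    best_t, best_errors = None, len(values) + 1
    for t in candidates:
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def predict(t, value):
    """Apply the learned rule to a single value."""
    return int(value >= t)

# Made-up training data: one numeric feature with a 0/1 label.
train_x = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
train_y = [0, 0, 0, 1, 1, 1]

t = fit_threshold(train_x, train_y)
print(t)                # 4.0
print(predict(t, 5.2))  # 1, for a value not seen during training
print(predict(t, 0.3))  # 0
```

The last two calls show generalization in miniature: neither 5.2 nor 0.3 appears in the training data, yet the learned rule still produces an output for them.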

What is Machine Learning?

What is Data Science?
➢Data Science
➢ Multidisciplinary
➢Digital revolution
➢Data-driven discovery
➢ Includes:
-Data Mining
-Machine Learning
-Big Data
-Databases
-…
Source: , 2012
http://www.martinhilbert.net/

ML, AI, and Data Science over the years

Types of Machine Learning Systems
It is useful to classify machine learning systems into broad categories based on the following criteria:
➢supervised, unsupervised, semi-supervised, and reinforcement learning
➢classification versus regression
➢online versus batch learning
➢instance-based versus model-based learning
➢parametric or nonparametric

Supervised/Unsupervised Learning
➢Machine Learning systems can be classified according to the amount and type of supervision they get during training.
➢ Supervised
➢k-Nearest Neighbours, Linear Regression, Logistic Regression, Decision Trees, Neural Networks, and many more
➢ Unsupervised
➢K-Means, Principal Component Analysis
➢ Semi-supervised
➢ Reinforcement Learning

Instance-Based/Model-Based Learning
➢Instance-Based: system learns the examples by heart, then generalizes to new cases by using a similarity/distance measure to compare them to the learned examples.
➢more details in week 3, an example in today’s lecture
➢Model-Based: build a model of these examples and then use that model to make predictions.
➢more details in weeks 9 to 11
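The contrast between the two approaches can be sketched on a toy regression problem. The instance-based predictor keeps every example and answers with the target of the closest one, while the model-based predictor compresses the examples into two numbers (a slope and an intercept fitted by least squares). The data and both function names are made up for illustration.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Instance-based: keep all examples, answer with the closest one's target."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def fit_line(train_x, train_y):
    """Model-based: summarize the examples as a least-squares slope and intercept."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

instance_pred = nearest_neighbour_predict(xs, ys, 2.4)
slope, intercept = fit_line(xs, ys)
model_pred = slope * 2.4 + intercept

print(instance_pred)  # 5.0, copied from the stored example at x = 2.0
print(model_pred)     # close to 5.8, read off the fitted line
```

After fitting, the model-based predictor no longer needs the training data, whereas the instance-based one must keep all of it around at prediction time.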

Challenges of Machine Learning
➢Insufficient Data
➢Quality Data
➢Representative Data
➢Irrelevant Features
➢Overfitting the Training Data
➢Underfitting the Training Data
➢Testing and Validation
➢Hyperparameter Tuning and Model Selection
➢Data Mismatch
➢Fairness, Societal Concerns

Part 3
K-nearest neighbour classifier

Nearest neighbour classifier
• Output is a class (here, blue or yellow)
• Instance-based learning, or lazy learning: computation only happens once called
• Flexible approach – no assumptions on data distribution
What class is this?

K-nearest neighbour classifier
[Figure: neighbourhoods of the query point for K = 1, K = 4, and K = 11]

How to compute distance?
x: (0,0)
y: (1,-5)
Euclidean distance
[Figure: neighbourhoods of the query point for K = 1, K = 4, and K = 11]
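Assuming the slide asks for the Euclidean distance between the two labelled points x and y, the computation is the square root of the summed squared coordinate differences. The sketch below does it by hand and with the standard library's `math.dist` (Python 3.8+).

```python
import math

x = (0, 0)
y = (1, -5)

# Euclidean distance: square root of the summed squared coordinate differences.
by_hand = math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

# The standard library offers the same computation directly.
built_in = math.dist(x, y)

print(by_hand)   # sqrt(1 + 25) = sqrt(26), about 5.099
print(built_in)
```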

K-nearest neighbour classifier
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
[Figure: neighbourhoods of the query point for K = 1, K = 4, and K = 11]
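Before reaching for the library, the idea can be sketched from scratch: sort the training points by distance to the query, take the K closest, and return the majority class. This is a minimal illustration with made-up points in the two classes used earlier (blue and yellow), not the scikit-learn implementation, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D training data: three blue points, four yellow points.
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6), (2, 2)]
labels = ["blue", "blue", "blue", "yellow", "yellow", "yellow", "yellow"]

query = (2.2, 2.2)
print(knn_predict(points, labels, query, k=1))  # yellow
print(knn_predict(points, labels, query, k=5))  # blue
```

The same query receives different labels for K = 1 and K = 5, which is why K is treated as a hyperparameter to tune. An odd K avoids tied votes when there are two classes.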


Important definitions
➢ Task: Flower classification
➢ Target (label, Class)
➢ Features
Iris setosa
Iris versicolor
Iris virginica

Important definitions
➢ Task : Flower classification
➢ Target (label, Class): Setosa, Versicolor, Virginica.
➢ Features: Petal len, Petal wid, Sepal len, Sepal wid.
➢ Model
➢ Prediction
➢ Data point (sample)
➢ Dataset
Features
Prediction:
Versicolor
target
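These terms map directly onto a few lines of scikit-learn. The sketch below uses the Iris dataset bundled with the library and the `KNeighborsClassifier` linked earlier. The choice of K = 5 and of sample index 75 is arbitrary, and predicting on a point the model was fitted on is for illustration only, not a fair evaluation.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target       # dataset: 150 data points, 4 features each
print(iris.feature_names)           # features: sepal and petal length and width
print(iris.target_names)            # targets: setosa, versicolor, virginica

model = KNeighborsClassifier(n_neighbors=5)  # model (K = 5 is an arbitrary choice)
model.fit(X, y)

sample = X[[75]]                    # one data point (sample), kept 2-D for predict
prediction = model.predict(sample)  # prediction: a class index for that sample
print(iris.target_names[prediction[0]], "| true:", iris.target_names[y[75]])
```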

Next Time
➢ Reading assignment 1 Due – Sep. 13 at 21:00
➢ Read Pages 15-25 of Chapter 1 from “Introduction to Algorithms and Data Structures”, 4th Ed, by . Dinneen, ’farb, and . Wilson, 2016
➢ Complete a quiz on Quercus and submit it by the deadline
➢ Week 1 Lab on Thursday Sep 9: Tutorial 0
➢ Python Basics and GitHub
➢ Week 2 Lecture on Tuesday Sep 14 – Analysis of Algorithms
➢ Algorithms and Big O Notation
➢ Sorting
➢ Hashing
➢ Project 1 Due – Oct. 1 at 21:00