
Foundations of Data Analytics and Machine Learning
Lecture 1:
• Introduction
• Course Overview

• Machine Learning Overview
• K-nearest Neighbour Classifier

Instruction Team
Instructor: Prof.
Head-TA: Zadeh
TA: TA: Haoyan (Max) A: TA:
Get to know the instruction team: https://q.utoronto.ca/courses/223861/pages/course-contacts

Communication
➢Preferred contact method for a quick response: Piazza;
1. Via Piazza Post to the “Entire Class”
2. Via Piazza Question using Post to “Instructor(s)” – Type the specific person’s name from the list
or type “instructors” to include us all
➢Communication via email (APS1070 in subject line) is fine if you have a reason for not using Piazza for that question.

➢Please prefix email subject with ‘APS1070’

A little bit about your instructor …
I was born here.
I studied Industrial Engineering and Computer Science.
I worked here as a data scientist.
Then, I worked here as a research scientist.
I worked for a while here as a visiting researcher.
And, I am a new professor here at the UofT.

Traditional Land Acknowledgement
I wish to acknowledge the Indigenous Peoples of all the lands which we call home including the land on which the University of Toronto operates.
For thousands of years, it has been the traditional land of the Huron-Wendat, the Seneca, and the Mississaugas of the Credit.
Today, this meeting place is still the home to many Indigenous people from across Turtle Island and I am grateful to have the opportunity to work on this land.

Native Land

Balanced network

In a balanced network:
Enemy of an enemy = friend
Enemy of a friend = enemy
Friend of an enemy = enemy
Friend of a friend = friend

Meet Mike, a friend of and .

Here is George, another friend of .
does not really know George that well.
knows George and they hate each other!

Unbalanced network
Enemy of an enemy = friend
Enemy of a friend = enemy
Friend of an enemy = enemy
Friend of a friend = friend

Unbalanced / Balanced subgraph
It is 1 edge away from balance.
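The four friend/enemy rules can be checked mechanically. The sketch below is one possible way to do it, not the method used in the lecture: it tries to split the people into two camps so that every positive edge stays inside a camp and every negative edge crosses camps, which is equivalent to the rules holding everywhere. The function name `is_balanced` and the three-person examples are hypothetical.

```python
from collections import deque

def is_balanced(nodes, signed_edges):
    """Return True if the signed network satisfies the four friend/enemy rules.

    signed_edges holds (u, v, sign) triples with sign +1 (friend) or -1 (enemy).
    """
    neighbours = {node: [] for node in nodes}
    for u, v, sign in signed_edges:
        neighbours[u].append((v, sign))
        neighbours[v].append((u, sign))

    camp = {}  # node -> 0 or 1
    for start in nodes:
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in neighbours[u]:
                # Friends share a camp; enemies sit in opposite camps.
                expected = camp[u] if sign > 0 else 1 - camp[u]
                if v not in camp:
                    camp[v] = expected
                    queue.append(v)
                elif camp[v] != expected:
                    return False  # this edge contradicts the rules
    return True

# Hypothetical example: A and B are friends and both dislike C.
people = ["A", "B", "C"]
balanced_triangle = [("A", "B", +1), ("A", "C", -1), ("B", "C", -1)]
# Hypothetical example: all three dislike each other.
unbalanced_triangle = [("A", "B", -1), ("A", "C", -1), ("B", "C", -1)]

print(is_balanced(people, balanced_triangle))
print(is_balanced(people, unbalanced_triangle))
# Dropping one edge from the unbalanced triangle restores balance,
# so that network is 1 edge away from balance.
print(is_balanced(people, unbalanced_triangle[:2]))
```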

Community detection in social networks
-positive relationship (green edge)
-negative relationship (red edge)
Nodes: people
Edges: positive or negative ties
Social Networks

2^16 = 65536 possible communities
Social Networks

2^18 = 262144 possible communities
Social Networks

Biological Network of a Protein
2^329 ≈ 1 duotrigintillion (about 10^99)!
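The counts on these slides grow as powers of two. A short sketch, assuming the slides count every subset of the n nodes as a candidate community (the helper name `num_candidate_communities` is hypothetical), shows why checking them one by one stops being feasible:

```python
def num_candidate_communities(n):
    """Each of the n nodes is either in or out of a group, giving 2**n choices."""
    return 2 ** n

for n in (16, 18, 329):
    count = num_candidate_communities(n)
    # Python integers have arbitrary precision, so 2**329 is computed exactly.
    print(n, "nodes:", len(str(count)), "decimal digits")
```

Adding two nodes multiplies the count by four, and at 329 nodes the count already has 100 decimal digits.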

Baker’s yeast
Nodes: biological molecules
Edges: activation or inhibition relations

Financial portfolios
Nodes: securities (investments)
Edges: positive or negative correlations (of returns)
Network: portfolio

US Senate over time
Node colors=party affiliation:
Republican
Edge colors:
Collaboration
Political Science

Bipartivity of fullerenes
Nodes: carbon atoms
Edges: atomic bonds (considered negative)
Network: molecule

Atomic magnets
Physical context

International Relations (1946-1996)
International Relations
Nodes: countries
Edges: international relations

About you…
➢What part of the world are you joining us from?
➢What is your area of study?
➢Previous experience?
➢Why did you take this course?
➢What do you want to get out of this course?

Survey: What is your time zone?
26.2% 0.8% 2.4% 61.9% 0.0% 4.0% 4.8%

Survey: Undergraduate Degree?

Survey: Undergraduate Studies in Toronto?
47.2% No 52.8% Yes

Survey: Why did you take this course?
➢To become a Machine Learning Researcher (~20.2%)
➢To become a Data Scientist (~74.2%)
➢To start a Machine Learning Startup (~19.4%)
➢For fun (~8.1%)

Survey: Programming Languages?

Survey: Rate Programming Abilities?

Why are you taking this course?

What kind of Projects would you like to do?

Course Overview

Course Description
APS1070 is a prerequisite to the core courses in the Emphasis in Analytics. This course covers topics fundamental to data analytics and machine learning, including an introduction to Python and common packages, analysis of algorithms, probability and statistics, matrix representations and fundamental linear algebra operations, basic algorithms and data structures and continuous optimization. The course is structured with both weekly lectures and tutorials/Q&A sessions.

Primary Learning Outcomes
By the end of the course, students will be able to:
1. Describe and contrast machine learning models, concepts and performance metrics.
2. Analyze the complexity of algorithms and common abstract data types.
3. Perform fundamental linear algebra operations, and recognize their role in machine learning algorithms.
4. Apply machine learning models and statistical methods to datasets using Python and associated libraries.

Course Components
➢Lectures: Tuesdays (3 hrs)
➢Tutorials/Q&A Sessions: Thursdays (2 hrs)
➢Four projects (submitted via GitHub Classroom)
➢Eight tasks/quizzes for reading assignments (submitted via Quercus)
➢Any material covered in lectures / tutorials / readings / projects / Piazza is fair game for the midterm and final assessments.

APS1070 Course Information
➢Course Website: http://q.utoronto.ca
➢Access course materials and Zoom Sessions
➢Verify using UTORid
➢ Textbooks:
➢ “Mathematics for Machine Learning” by . Deisenroth et al., 2020 (free)
➢ “The Elements of Statistical Learning”, 2nd Edition, by et al., 2009 (free)
➢ “Introduction to Algorithms and Data Structures”, 4th Ed, by . Dinneen, ’farb, and . Wilson, 2016 (free)
➢ Piazza discussion board for tutorial, project and almost all questions and communications.

Computer Requirements and Online Tools
You will need access to:
➢A computer equipped with microphone and webcam (and ideally 2 screens) to attend and participate in lectures (or view recordings) and practical sessions on Zoom
➢Jupyter Notebook, preferably through Google Colab, to be able to complete the projects
➢GitHub for submitting projects
➢Quercus for submitting reading assignment tasks/quizzes and course announcements

Computer Requirements and Online Tools
You will need access to:
➢Piazza to ask questions, communicate with the teaching team, and participate in course discussions
➢Top 10 endorsed answerers (across the two sections of APS1070 for the Fall 2021) on Piazza with at least 3 endorsed answers get 2 points added to their final course grade.
➢Questions of the general form “is this the correct answer?”, “what is wrong with my code?”, “why does my code not compile?” and the like will not receive a response.

Grade Breakdown
Projects/Quizzes | Weight (%) | Tentative Schedule
Eight tasks/quizzes for reading assignments | | Due on Mondays at 21:00 (for weeks 2, 3, 4, 5 and 7, 8, 9, 10 as per course schedule)
Project 1 | | Due Oct 1 at 21:00
Midterm Assessment | | Oct 18 at 9:00 to Oct 19 at 21:00 (limited 2-hour window to start the exam and submit it)
Project 2 | | Due Oct 22 at 21:00
Project 3 | | Due Nov 5 at 21:00
Project 4 | | Due Nov 26 at 21:00
Final Assessment* | | Dec 9 at 9:00 to Dec 10 at 21:00 (limited 3-hour window to start the exam and submit it)

Penalty for late Submissions
Quercus/GitHub submission time will be used. Late projects and reading assignments will incur a penalty as follows:
➢ -30% (of project maximum mark) if submitted within 72 hours past the deadline.
➢ A mark of zero will be given if the submission is 72 hours late or more.
Late midterm and exam submissions get a grade of 0.
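The late-penalty rule for projects and reading assignments can be written as a small function. This is an illustrative sketch only: the name `late_adjusted_mark` is hypothetical, and flooring the result at zero is an assumption, since the slide does not say what happens when the penalty exceeds the earned mark.

```python
def late_adjusted_mark(raw_mark, max_mark, hours_late):
    """Apply the late penalty to a project or reading-assignment mark.

    hours_late is measured from the deadline; values <= 0 mean on time.
    """
    if hours_late <= 0:
        return raw_mark                      # on time: no penalty
    if hours_late < 72:
        penalty = 0.30 * max_mark            # 30% of the maximum mark
        return max(0.0, raw_mark - penalty)  # assumption: marks do not go negative
    return 0.0                               # 72 hours late or more

# Hypothetical marks out of 100
print(late_adjusted_mark(85, 100, 0))
print(late_adjusted_mark(85, 100, 10))
print(late_adjusted_mark(85, 100, 72))
```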

Class Representatives
If you have any complaints or suggestions about this course, please talk to me or email me directly. Alternatively, talk to one of the class reps who will then talk to me and the teaching team.
We need 2 class reps per section. Volunteers can send me an email (with “APS1070 class rep” in subject line) by 14 September.
Class reps are asked to keep in touch with the instruction team about any feedback they receive from students and to attend two staff-student meetings over the course of this semester.
This can be a great opportunity to develop your leadership skills.

Academic Integrity
➢All the work you submit must be your own and no part of your submitted work should be prepared by someone else. Plagiarism or any other form of cheating in examinations, tests, assignments, or projects, is subject to serious academic penalty (e.g., suspension or expulsion from the faculty or university).
➢ A person who supplies an assignment or project to be copied will be penalized in the same way as the one who makes the copy.
➢Several plagiarism detection tools will be used to assist in the evaluation of the originality of the submitted work for both text and code. They are quite sophisticated and difficult to defeat.

➢NumPy, Matplotlib, Pandas and many more
➢Google Colaboratory
➢Jupyter notebook in the cloud
➢no installation required
➢requires Google Drive
All project handouts will be Jupyter notebooks

Why Python for Data Analysis?
➢Very popular interpreted programming language
➢Write scripts (use an interpreter)
➢Large and active scientific computing and data analysis community
➢Powerful data science libraries – NumPy, pandas, matplotlib, scikit-learn, dask
➢Open-source, active community
➢Encourages logical and clear code
➢Invented by Guido van Rossum

Course Philosophy
➢Top-down approach ➢Learn by doing
➢Explain the motivations first ➢Mathematical details second
➢Focus on implementation skills
➢Connect concepts using the theme of End-to-End Machine Learning
My goal is to have everyone leave the course with a strong understanding of the mathematical and programming fundamentals necessary for future courses.

Tentative Schedule (Weeks 1 – 3)
Lectures: 9:00-12:00 (morning section 0101), 17:00-20:00 (evening section 0201)
Tutorials and Q/A sessions: 9:00-11:00 (morning section 0101), 17:00-19:00 (evening section 0201)
Theme: Computer Science and Programming
Week 1
➢Lecture: Introduction (Course Overview, Machine Learning Overview, K-Nearest Neighbours)
➢Tutorial 0 – Python Basics and GitHub
Week 2
➢Reading assignment 1 Due – Sep. 13 at 21:00
➢Lecture: Algorithms and Data Structures (Analysis of Algorithms, Asymptotic Notation, Sorting, Dictionary ADT, Hashing)
➢Tutorial 1 – Basic Data Science
Week 3
➢Reading assignment 2 Due – Sep. 20 at 21:00
➢Lecture: Data Exploration, Making Predictions, Foundations of Learning (End-to-End Machine Learning, Data Wrangling, Plotting and Visualization, Decision Trees)
➢Q/A Support Session

Tentative Schedule (Weeks 4-6 and reading week)
Lectures: 9:00-12:00 (morning section 0101), 17:00-20:00 (evening section 0201)
Tutorials and Q/A sessions: 9:00-11:00 (morning section 0101), 17:00-19:00 (evening section 0201)
Theme: Mathematical Foundations
Week 4
➢Reading assignment 3 Due – Sep. 27 at 21:00
➢Lecture: Measuring Uncertainty and Evaluating Performance (K-Means Clustering, Probability Theory, Multivariate Gaussians, Performance)
➢Q/A Support Session
➢Project 1 Due – Oct. 1 at 21:00
Week 5
➢Reading assignment 4 Due – Oct. 4 at 21:00
➢Lecture: Mathematical Foundation of Data Processing (Linear Algebra, Analytical Geometry and Transformations, Data Augmentation)
➢Tutorial 2 – Anomaly Detection
Reading Week
➢Oct 11: Thanksgiving
➢Extra Q/A Sessions on Thur. Oct 14, 9:00-11:00 and 17:00-19:00
Week 6
➢No lecture on 19 Oct. No office hours on 20 Oct.
➢Midterm Assessment: Oct 18 at 9:00 to Oct 19 at 21:00 (limited 2-hour window to start the exam and submit it)
➢Q/A Support Session
➢Project 2 Due – Oct 22 at 21:00

➢Consists of multiple choice, short answer, math and programming questions as well as analytical and reasoning questions
➢Covers all material before the midterm
➢Late midterm submissions will receive a grade of 0
➢Access to all course materials (except for Piazza)
➢More details will be provided 1 – 2 weeks before the midterm

Tentative Schedule (Weeks 7 – 9)
Lectures: 9:00-12:00 (morning section 0101), 17:00-20:00 (evening section 0201)
Tutorials and Q/A sessions: 9:00-11:00 (morning section 0101), 17:00-19:00 (evening section 0201)
Themes: Mathematical Foundations, Neural Networks
Week 7
➢Reading assignment 5 Due – Oct. 25 at 21:00
➢Lecture: Dimensionality Reduction Part 1 (Projection, Matrix Decomposition, Eigenvectors, Principal Component Analysis)
➢Tutorial 3 – PCA
Week 8
➢Reading assignment 6 Due – Nov. 1 at 21:00
➢Lecture: Dimensionality Reduction Part 2 (Singular Value Decomposition, Feature Interpretation, Vector Calculus)
➢Q/A Support Session
➢Project 3 Due – Fri. Nov. 5 at 21:00
Week 9
➢Reading assignment 7 Due – Nov. 8 at 21:00
➢Lecture: Generalized Linear Model (Linear Regression, Gradient Descent, Polynomial Regression, Regularization)
➢Remembrance Day – no lab sessions

Tentative Schedule (Weeks 10 – 13)
Lectures: 9:00-12:00 (morning section 0101), 17:00-20:00 (evening section 0201)
Tutorials and Q/A sessions: 9:00-11:00 (morning section 0101), 17:00-19:00 (evening section 0201)
Themes: Neural Networks, Review
Week 10
➢Reading assignment 8 Due – Nov. 15 at 21:00
➢Lecture: Artificial Neural Networks (Continuous Optimization, Convexity, Classification, Perceptron, Neural Networks)
➢Tutorial 4 – Linear Regression
Week 11
➢Lecture: Deep Learning (Backward propagation, Deep Learning, Transfer Learning, Discrete Optimization)
➢Q/A Support Session
➢Project 4 Due – Nov. 26 at 21:00
Week 12
➢Lecture: Course Review
➢Q/A Support Session
Week 13
➢No lectures, no office hours, no lab sessions
➢Final Assessment: Dec 9 at 9:00 to Dec 10 at 21:00 (limited 3-hour window to start the exam and submit it)

Slide Attribution
The slides used for APS1070 contain materials from various sources. Special thanks to the following authors:
• Sinisa Colic
• . Wilson (Lecture 2 in particular)

Machine Learning Overview

Why are we here?
➢ Machine Learning skills are in demand
➢ Neural Networks are evolving and leading to new performance limits
➢ Deep Neural Networks are at the cutting edge of applied computing

Rising Tide of AI Capacity
Source: Hans-Moravecs-rising-tide-of-AI-capacity
➢ Jobs requiring creativity seem to be a safe career choice…
➢ Programming and AI design seem safe

Why Machine Learning?
Q: How can we solve a programming problem?
➢Evaluate all conditions and write a set of rules that efficiently address those conditions
➢ex. robot to navigate maze, turn towards opening
Q: How could we write a set of rules to determine if a goat is in the image?
Requires systems that can learn from examples…

Examples of Applications
➢Finance and banking
➢Production
➢Economics
➢Data -> Knowledge -> Insight
➢Marketing
➢Intelligent assistants

Why Machine Learning?
➢Problems for which it is difficult to formulate rules that cover all the conditions we expect to see, or that require a lot of fine-tuning.
➢Complex problems for which no good solution exists: state-of-the-art Machine Learning techniques may be able to succeed.
➢Fluctuating environments: a Machine Learning system can adapt to new data.
➢Obtaining insights from large amounts of data or complex problems.

Why Machine Learning?
➢Instead of writing programs by hand, the focus shifts to collecting quality examples that highlight the correct output
➢A machine learning algorithm then takes “training data” and produces a model to generate the correct output
➢If done correctly the program will generalize to cases not observed…more on this later
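A toy sketch of this workflow follows: labelled examples go in, a model comes out, and the model is then checked on cases it has not seen. Everything here is hypothetical and chosen for illustration (the one-feature data, the names `train_threshold` and `predict`, and the choice of a single cut-off as the "model"); it is not an algorithm from the course.

```python
def train_threshold(examples):
    """Learn a cut-off on one feature that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in examples)
    # Candidate cut-offs: midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != label for x, label in examples)

    return min(candidates, key=errors)

def predict(threshold, x):
    """The learned model: predict True when the feature exceeds the cut-off."""
    return x > threshold

# Hypothetical training data: (feature value, correct output)
training_data = [(1.0, False), (2.0, False), (3.0, False),
                 (6.0, True), (7.0, True), (8.0, True)]
# Hypothetical cases that were not observed during training
unseen = [(2.5, False), (6.5, True)]

threshold = train_threshold(training_data)
accuracy = sum(predict(threshold, x) == y for x, y in unseen) / len(unseen)
print("learned threshold:", threshold)
print("accuracy on unseen cases:", accuracy)
```

No rule about the cut-off was written by hand; it was picked from the examples, and the unseen cases give a first check on generalization.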

What is Machine Learning?

What is Data Science?
➢Data Science
➢ Multidisciplinary
➢Digital revolution
➢Data-driven discovery
➢ Includes:
  - Data Mining
  - Machine Learning
  - Big Data
  - Databases
Source: , 2012
http://www.martinhilbert.net/

ML, AI, and Data Science over the years

Types of Machine Learning Systems
It is useful to classify machine learning systems into broad categories based on the following criteria:
➢supervised, unsupervised, semi-supervised, and reinforcement learning
➢classification versus regression
➢online versus batch learning
➢instance-based versus model-based learning
➢parametric or nonparametric

Supervised/Unsupervised Learning
➢Machine Learning systems can be classified according to the amount and type of supervision they get during training.
➢ Supervised
  ➢k-Nearest Neighbours, Linear Regression, Logistic Regression, Decision Trees, Neural Networks, and many more
➢ Unsupervised
  ➢K-Means, Principal Component Analysis
➢ Semi-supervised
➢ Reinforcement Learning

Instance-Based/Model-Based Learning
➢Instance-Based: system learns the examples by heart, then generalizes to new cases by using a similarity/distance measure to compare them to the learned examples.
  ➢more details in week 3, an example in today’s lecture
➢Model-Based: build a model of these examples and then use that model to make predictions.
  ➢more details in weeks 9 to 11
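The contrast can be made concrete with two tiny predictors on the same data. This is a sketch under assumptions: the points are hypothetical, and a nearest-class-mean rule stands in for "a model" only because it is short; the model-based methods covered later in the course are different.

```python
import math

def nearest_neighbour_predict(examples, query):
    """Instance-based: keep every example and copy the label of the closest one."""
    _, label = min(examples, key=lambda ex: math.dist(ex[0], query))
    return label

def fit_centroids(examples):
    """Model-based: summarise each class by its mean point (the model's parameters)."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def centroid_predict(centroids, query):
    """Predict the class whose mean point is closest to the query."""
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

# Hypothetical labelled points: ((feature 1, feature 2), class)
examples = [((1, 1), "blue"), ((1, 2), "blue"), ((2, 1), "blue"),
            ((5, 5), "yellow"), ((6, 5), "yellow"), ((5, 6), "yellow")]

query = (2, 2)
print(nearest_neighbour_predict(examples, query))  # needs all six examples at prediction time
centroids = fit_centroids(examples)                # after fitting, only two mean points are needed
print(centroid_predict(centroids, query))
```

The instance-based predictor must store and search the training set for every query, while the model-based one discards the examples once its parameters are fitted.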

Challenges of Machine Learning
➢Insufficient Data
➢Quality Data
➢Representative Data
➢Irrelevant Features
➢Overfitting the Training Data
➢Underfitting the Training Data
➢Testing and Validation
➢Hyperparameter Tuning and Model Selection ➢Data Mismatch
➢Fairness, Societal Concerns

K-nearest neighbour classifier

Nearest neighbour classifier
• Output is a class (here, blue or yellow)
• Instance-based learning, or lazy learning: computation happens only when a prediction is requested
• Flexible approach – no assumptions on data distribution
What class is this?

K-nearest neighbour classifier

How to compute distance?
Euclidean distance
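For two points p and q with the same number of features, the Euclidean distance is the square root of the sum of squared feature differences. A minimal sketch (the function name `euclidean_distance` is chosen here for illustration):

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 3-4-5 right triangle, so 5.0
```

Because every feature contributes in its own units, features on very different scales may need rescaling before distances are compared.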

K-nearest neighbour classifier
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
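The linked scikit-learn class provides a full implementation. To show the idea itself, here is a from-scratch sketch using only the standard library: find the k training points closest to the query and take a majority vote over their classes. The function name `knn_predict` and the blue/yellow points are hypothetical, and ties are broken arbitrarily.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Hypothetical two-feature training set with two classes
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
labels = ["blue", "blue", "blue", "yellow", "yellow", "yellow"]

print(knn_predict(points, labels, (2, 2), k=3))
print(knn_predict(points, labels, (5, 4), k=3))
```

There is no training step beyond storing the data; all the work happens at prediction time, which is why the method is called lazy learning.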

Important definitions
➢ Task : Flower classification
➢ Target (label, Class)
➢ Features
Iris setosa
Iris versicolor
Iris virginica

Important definitions
➢ Task : Flower classification
➢ Target (label, Class) : Setosa, Versicolor, Virginica.
➢ Features: Petal len, Petal wid, Sepal len, Sepal wid.
➢ Model
➢ Prediction
➢ Data point (sample)
Prediction:
Versicolor

➢ Reading assignment 1 Due – Sep. 13 at 21:00
➢ Read Pages 15-25 of Chapter 1 from “Introduction to Algorithms and Data Structures”, 4th Ed, by
. Dinneen, ’farb, and . Wilson, 2016 link
➢ Complete a quiz on Quercus and submit it by the deadline
➢ Week 1 Lab on Thursday Sep 9: Tutorial 0
➢ Python Basics and GitHub
➢ Week 2 Lecture on Tuesday Se
