Foundations of Data Analytics and Machine Learning
Summer 2022
• Introductions
• Course Overview
• End-to-End Machine Learning
Introduction
Ø Instructor
ØOffice Hours: online by appointment
Ø The fastest and most effective means of communication: Piazza
Ø Please prefix email subject with ‘APS1070’
Teaching Assistants
About me…
Ø Research: Hardware Acceleration for ML Applications
Ø Application areas: Computer Vision, Natural Language Processing (NLP), Recommender Systems
Ø Reliability, Energy, Privacy
Ø Example ML hardware: Nvidia A100 GPU, Google Tensor SoC, Apple M1 Silicon, Google TPU
Hello! I’m BERT!
Understanding Search Queries Better Than Ever Before
Ref: https://www.blog.google/products/search/search-language-understanding-bert/
Hello! I’m BERT!
But I’m HUGE
Where does the time and energy go?
Baseline Inference vs. Efficient Execution
[Slide figures: the baseline runs the BERT model (1x) on high-end data-center GPUs. Efficient execution replaces it with a custom model (GOBO, MICRO 2020: 8x smaller, weight-only quantization with floating-point compute) and custom hardware (Mokey: 8x smaller, weight-and-activation integer compute, <1% error), with energy-efficiency gains of 8x to 21x reported on the slides.]
About you...
Ø What part of the world are you joining us from?
Ø What is your area of study?
ØPrevious experience?
ØWhy did you take this course?
ØWhat do you want to get out of this course?
Survey: What is your time zone?
Survey: Undergraduate Degree?
Survey: Undergraduate Studies in Toronto?
Survey: Why did you take this course?
ØTo become a Data Scientist (~44.7%)
ØFor fun (~36.2%)
Ø To become a Machine Learning Researcher (~19%)
Ø ...
Survey: Programming Languages?
Survey: Rate Programming Abilities?
Survey: Python?
Why are you taking this course?
ØI always hear Machine Learning in my life, so I want to know what it is!
ØTo prepare myself for a ML related career path.
ØTo learn basics about ML and hopefully apply them to research
ØTo expand my horizons and eventually be able to use IoT in a manufacturing/maintenance environment
Ø“just learn some basic python skills”
ØFor MEng Certificate
Ø For pursuing courses in the field of Operations Research
Ø “... Machine Learning is a hot topic...”
What kind of Projects would you like to do?
ØImage classification, Text classification, Speech classification, Text translation
ØApplication in Finance or Management
ØComputer vision related projects.
ØWeather forecasting
Ø Custom microprocessor design for Machine Learning
Ø “... I don't have any idea yet...”
Anything else you would like me to know?
ØI have very basic knowledge on Python and I'm very new to coding. I want to learn the concepts from scratch!
ØI'm working full-time in the summer. I'll try and join the occasional lecture, but I can only watch the recorded lectures most weeks.
Ø“I don't have access to windows, only mac!”
ØI work full time during the day (and sometimes do overnight or evening shifts), so having everything recorded and being able to ask additional questions over email would be really helpful for me!
ØIf you can provide resources on programming basics (e.g. YouTube videos or practice problems) that would be helpful.
Course Description
APS1070 is a prerequisite to the core courses in the Emphasis in Analytics. This course covers topics fundamental to data analytics and machine learning, including an introduction to Python and common packages, probability and statistics, matrix representations and fundamental linear algebra operations, basic algorithms and data structures, and continuous optimization. The course is structured with both weekly lectures and tutorials/help sessions.
Primary Learning Outcomes
By the end of the course, students will be able to:
1. Describe and contrast machine learning models, concepts and performance metrics
2. Perform fundamental linear algebra operations, and recognize their role in machine learning algorithms
3. Apply machine learning models and statistical methods to datasets using Python and associated libraries.
Course Components
Ø Lectures: Tuesdays (10am-1pm)
Ø Tutorials/Q&A Sessions: Thursdays (12-2pm)
Ø Projects: Four assignments
ØAny material covered in lectures / tutorials / Projects is fair game for the midterm and final assessments
APS1070 Course Information
ØCourse Website: http://q.utoronto.ca
ØDownload course materials
Ø Textbooks:
Ø “Mathematics for Machine Learning” by Deisenroth et al., 2020 (free)
Ø (optional) “The Elements of Statistical Learning”, 2nd Edition, by Hastie et al., 2009 (free)
Ø (optional) “Python for Data Analysis”, 2nd Edition, by McKinney, 2017
ØPiazza discussion board for tutorial, project and general questions
https://mml-book.github.io/
Grade Breakdown
Assessment | Weight (%) | Tentative Schedule
Project 1 | 10 | Due Jun 4 @ 11pm
Project 2 |  | Due Jun 25 @ 11pm
Project 3 |  | Due Jul 23 @ 11pm
Project 4 |  | Due Aug 13 @ 11pm
Midterm Assessment | 20 | Distribution: Jul 5 @ 10:00am; Force Collection: Jul 6 @ 10:00am
Final Assessment | 30 | Distribution: Aug 9 @ 10:00am; Force Collection: Aug 10 @ 10:00am
Bonus Marks
Ø Individual: the 3 students with the most endorsed answers on Piazza each receive a 3% bonus.
Ø Class: 1% bonus for all students if more than 70% of the class participates in the course evaluation.
Penalty for late Submissions
ØIt is the student’s responsibility to verify that projects and assessments are submitted on time
Ø Projects:
Ø -20% (of project maximum mark) if submitted within 72 hours past the deadline.
Ø A mark of zero will be given if the project is 72 hours late or more.
Ø Assessments:
ØFor every minute past your assessment window, a 10% deduction will be applied.
Ø Re-Grading
Ø If a student wishes to discuss marking for a Project or the Midterm, they should meet with the marking TA at the next available Q&A session.
Midterm Assessment
Ø Covers Week 1 to end of Week 6.
Ø Might have programming questions
Ø Midterm tentatively scheduled for Tuesday, July 5th
Ø 90-minute window (maximum) to finish the exam
Ø Late submissions will receive -10% per minute
Ø Access to all course notes – No Piazza
Ø More details will be provided 1 – 2 weeks before the midterm
Computer Requirements
You will need access to:
Øa computer to attend lectures (or view recordings) and practical sessions
Øa microphone and web camera to participate in lectures and Q&A Support Sessions
Øaccess to Piazza to post questions and participate in course discussions
Øaccess to Jupyter Notebook, preferably through Google Colab, to be able to complete the programming assignments
Ø Create a Google and GitHub account ASAP!
Ø Python libraries: NumPy, Matplotlib, Pandas and many more
Ø Google Colaboratory: Jupyter notebooks in the cloud, no installation required!
Ø All project handouts will be Jupyter notebooks
Why Python for Data Analysis?
ØVery popular interpreted programming language
ØWrite scripts (use an interpreter)
ØLarge, active scientific computing and data analysis community
ØPowerful data science libraries – NumPy, pandas, matplotlib, scikit-learn
Ø Open-source, active community
Ø Encourages logical, clear code
Ø Invented by Guido van Rossum
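As a small, hedged illustration of this stack (the column names and values below are made up for the example; NumPy, pandas and Matplotlib are the libraries named above):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# A tiny, made-up table of flower measurements as a pandas DataFrame
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 5.9, 1.3, 4.5],
    "petal_width":  [0.2, 1.4, 2.1, 0.2, 1.5],
})

# Vectorized computation with NumPy (no explicit loops)
print("Mean petal length:", np.mean(df["petal_length"].to_numpy()))

# Quick visualization with Matplotlib
plt.scatter(df["petal_length"], df["petal_width"])
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
plt.show()
```

In Google Colab this runs as-is in a notebook cell, with no local installation needed.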
Course Philosophy
ØTop-down approach
ØLearn by doing
Ø Explain the motivations first
Ø Mathematical details second
Ø Focus on implementation skills
Ø Connect concepts using the theme of End-to-End Machine Learning
My goal is to have everyone leave the course with a strong understanding of the mathematical and programming fundamentals necessary for future courses.
Tentative Schedule (Weeks 1 – 7)
Week | Lecture | Tutorial | Project
1 | Introduction | T1 - Basic Data Science | N/A
2 | Data Exploration | Q/A Support Session | Project 1
3 | Fundamentals of Learn. | Q/A Support Session | Project 1
4 | Measuring Uncertainty | T2 - Anomaly Detection | Project 1
5 | Data Processing | Q/A Support Session | Project 2
6 | Dim. Reduction Part 1 | Q/A Support Session | Project 2
7 | Dim. Reduction Part 2 |  | Project 2
Tentative Schedule (Weeks 8 – 14)
Week | Lecture | Tutorial | Project
8 | Reading Week |  | N/A
9 | Midterm Assessment |  | Project 3
10 | Linear Regression | Q/A Support Session | Project 3
11 | Logistic Regression | Q/A Support Session | Project 3
12 | Deep Learning | T4 - Linear Regression | Project 4
13 |  | Q/A Support Session | Project 4
14 | Final Assessment | Q/A Support Session | Project 4
Why are we here?
• Ideally:
• We want computers to do everything for us
• Problem:
• How should we program them?
Why Machine Learning?
Q: How can we solve a programming problem?
Ø Evaluate all conditions and write a set of rules that efficiently address those conditions
Ø e.g., for a robot navigating a maze: turn towards the opening
Q: How could we write a set of rules to determine if a goat is in an image?
Requires systems that can learn from examples...
Examples of Applications
Ø Detecting suspicious transactions from credit card purchases
Ø Analyzing images of products on a production line to automatically classify them
Ø Detecting mental health issues from brain scans
Ø Forecasting company revenue based on current performance metrics
Ø Visualizing complex, high-dimensional data in an insightful diagram
Ø Grouping clients based on their purchasing history to design a customized marketing strategy for each group
Ø Intelligent assistant AI for real-time strategy games
Why Machine Learning?
Ø Problems for which it is difficult to formulate rules that cover all the conditions we expect to see, or that require a lot of fine-tuning.
Ø Complex problems for which no good solution exists: state-of-the-art Machine Learning techniques may be able to succeed.
Ø Fluctuating environments: a Machine Learning system can adapt to new data.
Ø Obtaining insights from large amounts of data or complex problems.
Why Machine Learning?
Ø Instead of writing programs by hand, the focus shifts to collecting quality examples that highlight the correct output
Ø A machine learning algorithm then takes this “training data” and produces a model that generates the correct output
Ø If done correctly, the model will generalize to cases not observed... more on this later
What is Machine Learning?
A branch of artificial intelligence concerned with design and development of algorithms to build mathematical models capable of learning from sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.
What is Machine Learning?
Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.
What is Data Science?
Ø Data Science is a multidisciplinary and all-encompassing field.
Ø Includes:
Ø Data Mining
Ø Machine Learning
Ø Big Data
Ø Databases
Source: , 2012
Important definitions
Ø Task: flower classification
Ø Target (label, class)
Ø Features
Iris setosa
Iris versicolor
Iris virginica
Important definitions
Ø Task : Flower classification
Ø Target (label, class): Setosa, Versicolor, Virginica
Ø Features: petal length, petal width, sepal length, sepal width
Ø Model
Ø Prediction
Ø Data point (sample)
Prediction: Versicolor
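These definitions in code, as a minimal sketch using scikit-learn's bundled Iris dataset (the choice of classifier, its k value and the sample values are illustrative assumptions, not part of the slide):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Task: flower classification
# Features: sepal length, sepal width, petal length, petal width
# Target (label/class): setosa, versicolor, virginica
iris = load_iris()
X, y = iris.data, iris.target

# Model: k-nearest neighbours (covered in week 3)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Prediction for one new data point (sample):
# [sepal_len, sepal_wid, petal_len, petal_wid] in cm
sample = [[6.1, 2.8, 4.7, 1.2]]
print(iris.target_names[model.predict(sample)[0]])  # expected: 'versicolor'
```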
Types of Machine Learning Systems
It is useful to classify machine learning systems into broad categories based on the following criteria:
Ø Supervised, unsupervised, and reinforcement learning
Ø Instance-based versus model-based learning
Ø Classification versus regression
Supervised/Unsupervised Learning
ØMachine Learning systems can be classified according to the amount and type of supervision they get during training.
Ø Supervised
Ø k-Nearest Neighbours, Linear Regression, Logistic Regression, Decision Trees, Neural Networks, and many more
Ø Unsupervised
Ø K-Means, Principal Component Analysis
Ø Reinforcement Learning
ØReward for a series of action
Ø Solve the maze to get the banana!
Reinforcement learning
What is the difference between Supervised and Reinforcement learning?
https://www.youtube.com/watch?v=gn4nRCC9TwQ
Instance-Based/Model-Based Learning
ØInstance-Based: system learns the examples by heart, then generalizes to new cases by using a similarity/distance measure to compare them to the learned examples.
Ømore details in week 3
ØModel-Based: build a model of these examples and then use that model to make predictions.
Ømore details in weeks 9 to 11
Classification vs. Regression
Ø Classification: discrete target
Ø Separate the dataset
Ø Apples or oranges?
Ø Iris classification
Ø Handwritten digit recognition
Ø Regression: continuous target
Ø Fit the dataset
Ø Price of a house
Ø Revenue of a company
Ø Age of a tree
[Slide figures: scatter plots over Feature #1 and Feature #2, showing a boundary separating classes for classification and a curve fitted through the data for regression.]
Challenges of Machine Learning
ØInsufficient Data
ØQuality Data
ØRepresentative Data
ØIrrelevant Features
ØOverfitting the Training Data
ØUnderfitting the Training Data
ØTesting and Validation
Ø Hyperparameter Tuning and Model Selection
Ø Data Mismatch
Course Theme
End-to-End Machine Learning:
1. Understand the problem
2. Retrieve the data
3. Explore and visualize the data to gain insights
4. Prepare the data for the algorithm/model
5. Select and train the algorithm/model
6. Fine-tune your algorithm/model
7. Present your solution
8. Launch, monitor, and maintain your system
End-to-End Machine Learning
[Flow diagram of the pipeline stages: Understand Problem, Data Collection, Data Preparation, Data Visualization, Model Selection, Model Training]
Understand the Problem
ØOften, we need to make some sort of decisions (predictions)
ØTwo common types of decisions that we make are:
Ø Classification
Ø Discrete number of possibilities
Ø Regression
Ø A continuous range of real-valued possibilities
Target type | Supervised | Unsupervised
Discrete | classification | clustering
Continuous | regression | dimensionality reduction
Understand the Problem
Input data is represented by features that can come in many forms:
Ø Raw pixels
Ø Histograms
Ø Tabular data
Ø Spectrograms
Ø ...
Data Exploration
ØUnderstand your data through visualization
Ø Assess the difficulty of the problem
Ø You have a data set D = {(x(i), y(i))}
Ø You want to learn y = f(x) from D
Ø More precisely, you want to minimize the error in predictions
ØWhat kind of model (algorithm) do you need?
MNIST low-dimensional projection
Model Selection
Many classifiers to choose from
Ø Support-Vector Machine (SVM)
Ø Logistic Regression
Ø Random Forests
Ø Naive Bayes
Ø Bayesian network
Ø k-Nearest Neighbour
Ø (Deep) Neural networks
Ø Etc.
Model Selection
ØOften the easiest algorithm to implement is k-Nearest Neighbours
ØMatch to similar data using a distance metric
Ømore details provided in week 3
Q: What happens as we increase #data?
Q: What about as #data approaches infinity?
ØUnlike us, computers have no trouble with memorization.
ØThe real question is, how well does our algorithm make predictions on new data?
ØWe need a way to measure how well our algorithm (model) generalizes to new, never before seen, data.
Regression Example
ØLet’s look at a more concrete example...
ØGiven noisy sample data (blue), we want to find the polynomial that generated the data
[Figure: noisy sample data plotted as stock price vs. media exposure, with a candidate polynomial fit]
Mean Squared Error
Ø We first need to define our error term; in this case we can use the mean squared error (MSE):
ØError is measured by finding the squared error in the prediction of y(x) from x.
Ø The error for the red polynomial is the sum of the squared vertical errors
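Written out, using the dataset notation D = {(x(i), y(i))} from the Data Exploration slide and writing f for the fitted polynomial (the symbol f is an assumption here, chosen for consistency with y = f(x) above):

```latex
\mathrm{MSE}(f) \;=\; \frac{1}{N}\sum_{i=1}^{N}\left( f\!\left(x^{(i)}\right) - y^{(i)} \right)^{2}
```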
Fitting the Data
Q: Which polynomial fits the data best?
Ø based on error term?
Ø based on test data?
Overfitting vs Underfitting
Ø Underfit: high training error and high test error
Ø Acceptable training error and test error
Ø Overfit: perfect (zero) training error and high test error
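A short sketch of this comparison with NumPy's polynomial fitting; the generating curve, noise level, and polynomial degrees below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Made-up ground truth: one period of a sine wave plus Gaussian noise
    x = np.linspace(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(20)
x_test, y_test = make_data(100)   # fresh samples from the same process

def mse(pred, target):
    return np.mean((pred - target) ** 2)

for degree in (1, 3, 9):          # low-, moderate- and high-capacity fits
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_err = mse(np.polyval(coeffs, x_train), y_train)
    test_err = mse(np.polyval(coeffs, x_test), y_test)
    print(f"degree {degree}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

Typically the training MSE keeps dropping as the degree grows, while the test MSE eventually rises again; that gap is the overfitting discussed above.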
Generalization
ØGiving the model a greater capacity (more complexity) to fit the data... does not necessarily help
ØHow do we evaluate the model performance?
Ø Verify the model on new, unseen data
Overfitting
ØIn brief: fitting characteristics of training data that do not generalize to future test data
ØCentral problem in machine learning
Ø Particularly problematic if #data << #parameters
Ø ... don’t have enough data to “identify” parameters
Generalization
ØMachine learning is a game of balance, with our objective being to generalize to all possible future data
[Figure: error (% incorrect) vs. model capacity (complexity). Training-sample error keeps decreasing with capacity, while new-sample error first falls (under-fitting region), reaches a minimum, then rises again (over-fitting region).]
Bias-Variance Trade-off
ØModels with too few parameters are inaccurate because of a large bias (not enough flexibility).
ØModels with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Inductive Bias
Inductive (Oxford Dict.): characterized by the inference of general laws from particular instances
Ø Let’s avoid making assumptions about the model (polynomial order)
Ø Assume for simplicity that D = {(x(i), y(i))} is noise free
Øx(i)’s in D only cover small subset of input space x
ØQ: What’s the best we can do?
ØIf we’ve seen x=x(i) report y=y(i)
ØIf we have not seen x= x(i), can’t say anything (no assumptions)
ØThis is called rote learning... boring, eh?
Ø Key idea: you can't generalize to unseen data w/o assumptions!
ØThus, key to ML is generalization
Ø To generalize, ML algorithm must have some inductive bias
Ø Bias usually in the form of a restricted model (hypothesis) space
Ø Important to understand restrictions (and whether appropriate)
Inductive Bias
Ø Example: Nearest neighbors
Assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood.
This is the bias used in the k-nearest neighbors algorithm.
The assumption is that cases that are near each other tend to belong to the same class.
Training and Testing Data
ØTrack generalization error by splitting data into training and testing
Ø 80% training and 20% testing
Ø More data = better model
ØWould like to use all our data for training, however we need some way to validate our results
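A minimal sketch of such a split, assuming scikit-learn (the 80/20 ratio matches the slide; the classifier and random seed are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 80% training, 20% testing; fixing random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("Train accuracy:", model.score(X_train, y_train))
print("Test accuracy:", model.score(X_test, y_test))  # estimate of generalization
```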
The problem with tracking test accuracy
ØIf we track test loss/accuracy in our training curve, then:
ØWe may make decisions about model architecture using the test accuracy!
Ø What should K be?
ØThe final test accuracy will not be a realistic estimate of how our model will perform on a new data set!
Validation Set
ØWe still want to track the loss/accuracy on a data set not used for training
Ø Idea: set aside a separate data set, called the validation set
Ø Track validation accuracy in the training curve
ØMake decisions about model architecture using the validation set
K is a hyperparameter.
We tune hyperparameters using the validation set
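A minimal sketch of tuning K with a validation set (the split sizes, candidate K values, and dataset are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set first, then carve a validation set out of the remainder
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Choose the hyperparameter K using validation accuracy only
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7, 9):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Touch the test set exactly once, with the chosen K, for a realistic estimate
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print(f"best K = {best_k}, validation accuracy = {best_acc:.3f}")
print(f"test accuracy = {final_model.score(X_test, y_test):.3f}")
```

Because the test set was never used to pick K, its accuracy remains an honest estimate of performance on new data.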
Validation and Holdout Data
Ø Training, Validation and Testing Data
Ø Less data for your training model