APS1070

Foundations of Data Analytics and Machine Learning
Summer 2022
• Introductions
• Course Overview


• End-to-End Machine Learning

Introduction
Ø Instructor
ØOffice Hours: online by appointment
ØThe fastest and most effective means of communication: Piazza
ØPlease prefix email subject with ‘APS1070’

Teaching Assistants

About me…
ØResearch: Hardware Acceleration for ML applications
Computer Vision
Reliability, Energy, Privacy
Natural Language Processing (NLP)
Recommender Systems

About me…
Computer Vision
ØResearch: Hardware Acceleration for ML applications
Natural Language processing
Recommender Systems
Nvidia A100 GPU
Google Tensor SoC
Apple M1 Silicon
Google TPU

Hello! I’m BERT!
Understanding Search Queries Better Than Ever Before
Ref: https://www.blog.google/products/search/search-language-understanding-bert/

Hello! I’m BERT!
But I’m HUGE

Hello! I’m BERT!
Where do the time and energy go?
But I’m HUGE

Baseline Inference
Ø Model: BERT
Ø Hardware: high-end GPUs (data center)
Ø 1x (baseline)

Compressing BERT:
Ø 8x smaller: weights only (W), FP computation
Ø 8x smaller with <1% error: weights + activations (W+A), integer computation (Mokey)

Efficient Execution
Ø Baseline: BERT model, 1x, on high-end GPUs (data center)
Ø Custom model (GOBO, MICRO 2020): 21x more energy efficient
Ø Custom hardware: 8x more energy efficient

About you...
ØWhat part of the world are you joining us from?
ØWhat is your area of study?
ØPrevious experience?
ØWhy did you take this course?
ØWhat do you want to get out of this course?

Survey: What is your time zone?
Survey: Undergraduate Degree?
Survey: Undergraduate Studies in Toronto?

Survey: Why did you take this course?
ØTo become a Data Scientist (~44.7%)
ØFor fun (~36.2%)
ØTo become a Machine Learning Researcher (~19%)
Ø...

Survey: Programming Languages?
Survey: Rate Programming Abilities?
Survey: Python?

Why are you taking this course?
ØI always hear about Machine Learning in my life, so I want to know what it is!
ØTo prepare myself for an ML-related career path.
ØTo learn the basics of ML and hopefully apply them to research.
ØTo expand my horizons and eventually be able to use IoT in a manufacturing/maintenance environment.
Ø“just learn some basic python skills”
ØFor the MEng Certificate.
ØFor pursuing courses in the field of Operations Research.
Ø“... Machine Learning is a hot topic...”

What kind of projects would you like to do?
ØImage classification, text classification, speech classification, text translation
ØApplications in finance or management
ØComputer vision related projects
ØWeather forecasting
ØCustom microprocessor design for Machine Learning
Ø“... I don't have any idea yet...”

Anything else you would like me to know?
ØI have very basic knowledge of Python and I'm very new to coding. I want to learn the concepts from scratch!
ØI'm working full-time in the summer. I'll try and join the occasional lecture, but I can only watch the recorded lectures most weeks.
Ø“I don't have access to Windows, only Mac!”
ØI work full time during the day (and sometimes do overnight or evening shifts), so having everything recorded and being able to ask additional questions over email would be really helpful for me!
ØIf you can provide resources on programming basics (e.g. YouTube videos or practice problems), that would be helpful.

Course Description
APS1070 is a prerequisite to the core courses in the Emphasis in Analytics. This course covers topics fundamental to data analytics and machine learning, including an introduction to Python and common packages, probability and statistics, matrix representations and fundamental linear algebra operations, basic algorithms and data structures, and continuous optimization. The course is structured with both weekly lectures and tutorials/help sessions.

Primary Learning Outcomes
By the end of the course, students will be able to:
1. Describe and contrast machine learning models, concepts, and performance metrics
2. Perform fundamental linear algebra operations, and recognize their role in machine learning algorithms
3. Apply machine learning models and statistical methods to datasets using Python and associated libraries

Course Components
ØLectures: Tuesdays (10am-1pm)
ØTutorials/Q&A Sessions: Thursdays (12-2pm)
ØProjects: Four assignments
ØAny material covered in lectures / tutorials / projects is fair game for the midterm and final assessments

APS1070 Course Information
ØCourse Website: http://q.utoronto.ca
ØDownload course materials
ØTextbooks:
Ø“Mathematics for Machine Learning” by Deisenroth et al., 2020 (free): https://mml-book.github.io/
Ø(optional) “The Elements of Statistical Learning”, 2nd Edition, by Hastie et al., 2009 (free)
Ø(optional) “Python for Data Analysis”, 2nd Edition, by McKinney, 2017
ØPiazza discussion board for tutorial, project and general questions

Grade Breakdown
Assessment            Weight (%)   Tentative Schedule
Project 1             10           Due Jun 4 @ 11pm
Project 2                          Due Jun 25 @ 11pm
Project 3                          Due Jul 23 @ 11pm
Project 4                          Due Aug 13 @ 11pm
Midterm Assessment    20           Distribution: Jul 5 @ 10:00am; Force Collection: Jul 6 @ 10:00am
Final Assessment      30           Distribution: Aug 9 @ 10:00am; Force Collection: Aug 10 @ 10:00am

Bonus Marks
ØIndividual: the 3 students with the most endorsed answers on Piazza each receive a 3% bonus.
ØClass: 1% bonus for all students if more than 70% of the class participates in the course evaluation.

Penalty for Late Submissions
ØIt is the student's responsibility to verify that projects and assessments are submitted on time.
ØProjects:
Ø-20% (of the project's maximum mark) if submitted within 72 hours past the deadline.
ØA mark of zero will be given if the project is 72 hours late or more.
ØAssessments:
ØFor every minute past your assessment window, a 10% deduction will be applied.
ØRe-Grading:
ØIf a student wishes to discuss marking for a Project or the Midterm, they should meet the marking TA at the next available Q&A session.

Midterm Assessment
ØCovers Week 1 to the end of Week 6.
ØMight have programming questions.
ØMidterm tentatively scheduled for Tuesday, July 5th.
Ø90-minute window (maximum) to finish the exam.
ØLate submissions will receive -10%/min.
ØAccess to all course notes – no Piazza.
ØMore details will be provided 1 – 2 weeks before the midterm.

Computer Requirements
You will need access to:
Øa computer to attend lectures (or view recordings) and practical sessions
Øa microphone and web camera to participate in lectures and Q&A Support Sessions
Øaccess to Piazza to post questions and participate in course discussions
Øaccess to Jupyter Notebook, preferably through Google Colab, to be able to complete the programming assignments
ØCreate a Google and GitHub account ASAP!
ØNumPy, Matplotlib, Pandas and many more
ØGoogle Colaboratory:
ØJupyter notebook in the cloud
ØNo installation required!
All project handouts will be Jupyter notebooks.

Why Python for Data Analysis?
ØVery popular interpreted programming language
ØWrite scripts (use an interpreter)
ØLarge, active scientific computing and data analysis community
ØPowerful data science libraries – NumPy, pandas, matplotlib, scikit-learn (see the short sketch below)
ØOpen-source, active community
ØEncourages logical, clear code
ØInvented by Guido van Rossum
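To give a feel for how the libraries named above work together, here is a minimal, illustrative sketch; the DataFrame contents are made-up numbers, not course data.

```python
# Minimal sketch of NumPy + pandas + matplotlib working together.
# The numbers are synthetic and for illustration only.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "petal_length": rng.normal(4.0, 1.0, 50),   # made-up measurements
    "petal_width":  rng.normal(1.3, 0.4, 50),
})

print(df.describe())                                  # summary statistics via pandas

df.plot.scatter(x="petal_length", y="petal_width")    # quick matplotlib plot
plt.title("Illustrative scatter plot")
plt.show()
```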
Course Philosophy
ØTop-down approach
ØLearn by doing
ØExplain the motivations first
ØMathematical details second
ØFocus on implementation skills
ØConnect concepts using the theme of End-to-End Machine Learning
My goal is to have everyone leave the course with a strong understanding of the mathematical and programming fundamentals necessary for future courses.

Tentative Schedule (Weeks 1 – 7)
Week   Lecture                    Tutorial / Q&A Support Session   Project
1      Introduction               T1: Basic Data Science           N/A
2      Data Exploration           Q/A Support Session              Project 1
3      Fundamentals of Learning   Q/A Support Session              Project 1
4      Measuring Uncertainty      T2: Anomaly Detection            Project 1
5      Data Processing            Q/A Support Session              Project 2
6      Dim. Reduction Part 1      Q/A Support Session              Project 2
7      Dim. Reduction Part 2                                       Project 2

Tentative Schedule (Weeks 8 – 14)
Week   Lecture               Tutorial / Q&A Support Session   Project
8      Reading Week          Q/A Support Session              Project 3
9      Midterm Assessment    Q/A Support Session              Project 3
10     Linear Regression     T4: Linear Regression            Project 3
11     Logistic Regression   Q/A Support Session              Project 4
12     Deep Learning         Q/A Support Session              Project 4
14     Final Assessment      N/A                              Project 4

Why are we here?
• Ideally:
• We want computers to do everything for us
• Problem:
• How should we program them?

Why Machine Learning?
Q: How can we solve a programming problem?
ØEvaluate all conditions and write a set of rules that efficiently address those conditions
Øex. for a robot navigating a maze: turn towards the opening
Q: How could we write a set of rules to determine if a goat is in an image?
Requires systems that can learn from examples...

Examples of Applications
ØDetecting suspicious transactions from credit card purchases
ØAnalyzing images of products on a production line to automatically classify them
ØDetecting mental health issues from brain scans
ØForecasting company revenue based on current performance metrics
ØVisualizing complex, high-dimensional data in an insightful diagram
ØGrouping clients based on their purchasing history to design a customized marketing strategy for each group
ØIntelligent assistant AI for real-time strategy games

Why Machine Learning?
ØProblems for which it is difficult to formulate rules that cover all the conditions we expect to see, or that require a lot of fine-tuning.
ØComplex problems where no good solutions exist: state-of-the-art Machine Learning techniques may be able to succeed.
ØFluctuating environments: a Machine Learning system can adapt to new data.
ØObtaining insights from large amounts of data or complex problems.

Why Machine Learning?
ØA machine learning algorithm takes “training data” and produces a model that generates the correct output
ØIf done correctly, the program will generalize to cases not observed... more on this later
ØInstead of writing programs by hand, the focus shifts to collecting quality examples that highlight the correct output

What is Machine Learning?
A branch of artificial intelligence concerned with the design and development of algorithms that build mathematical models capable of learning from sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

What is Machine Learning?
Deep learning is a subset of machine learning that has networks capable of learning unsupervised from data that is unstructured or unlabeled.

What is Data Science?
ØData Science is a multidisciplinary and all-encompassing field.
ØIncludes:
- Data Mining
- Machine Learning
- Big Data
- Databases
Source: 2012

Important Definitions
ØTask: Flower classification
ØTarget (label, class)
ØFeatures
Iris setosa, Iris versicolor, Iris virginica

Important Definitions
ØTask: Flower classification
ØTarget (label, class): Setosa, Versicolor, Virginica
ØFeatures: Petal length, Petal width, Sepal length, Sepal width
ØModel
ØPrediction
ØData point (sample)
Prediction: Versicolor
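To make these definitions concrete, here is a minimal sketch that loads the classic Iris dataset with scikit-learn and prints its features and targets; the sample index and variable names are illustrative assumptions.

```python
# Minimal sketch: the Iris task, its features, targets, and one data point.
# Assumes scikit-learn is available (it is pre-installed in Google Colab).
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)    # the four features: sepal/petal length and width (cm)
print(iris.target_names)     # the targets (labels/classes): setosa, versicolor, virginica

x = iris.data[60]            # one data point (sample): a vector of four feature values
y = iris.target[60]          # its target (label), encoded as an integer
print(x, "->", iris.target_names[y])   # e.g. label: versicolor
```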
Types of Machine Learning Systems
It is useful to classify machine learning systems into broad categories based on the following criteria:
ØSupervised, unsupervised, and reinforcement learning
ØInstance-based versus model-based learning
ØClassification versus regression

Supervised/Unsupervised Learning
ØMachine Learning systems can be classified according to the amount and type of supervision they get during training.
ØSupervised:
Øk-Nearest Neighbours, Linear Regression, Logistic Regression, Decision Trees, Neural Networks, and many more
ØUnsupervised:
ØK-Means, Principal Component Analysis
ØReinforcement Learning:
ØReward for a series of actions
ØSolve the maze to get the banana!

Reinforcement Learning
What is the difference between supervised and reinforcement learning?
https://www.youtube.com/watch?v=gn4nRCC9TwQ

Instance-Based/Model-Based Learning
ØInstance-based: the system learns the examples by heart, then generalizes to new cases by using a similarity/distance measure to compare them to the learned examples.
Ømore details in week 3
ØModel-based: build a model of these examples and then use that model to make predictions.
Ømore details in weeks 9 to 11

Classification vs. Regression
ØClassification: discrete target
ØSeparate the dataset
ØApples or oranges?
ØIris classification
ØHandwritten digit recognition
ØRegression: continuous target
ØFit the dataset
ØPrice of a house
ØRevenue of a company
ØAge of a tree
[Figure: example datasets plotted against Feature #1 and Feature #2]

Challenges of Machine Learning
ØInsufficient Data
ØQuality Data
ØRepresentative Data
ØIrrelevant Features
ØOverfitting the Training Data
ØUnderfitting the Training Data
ØTesting and Validation
ØHyperparameter Tuning and Model Selection
ØData Mismatch

Course Theme
End-to-End Machine Learning:
1. Understand the problem
2. Retrieve the data
3. Explore and visualize the data to gain insights
4. Prepare the data for the algorithm/model
5. Select and train the algorithm/model
6. Fine-tune your algorithm/model
7. Present your solution
8. Launch, monitor, and maintain your system

End-to-End Machine Learning
Understand Problem → Data Collection → Data Visualization → Data Preparation → Model Selection → Model Training

Understand the Problem
ØOften, we need to make some sort of decisions (predictions)
ØTwo common types of decisions that we make are:
ØClassification: a discrete number of possibilities
ØRegression: a continuous range of real-valued possibilities

             Supervised        Unsupervised
Discrete     classification    clustering
Continuous   regression        dimensionality reduction

Understand the Problem
Input data is represented by features that can come in many forms:
ØRaw pixels
ØHistograms
ØTabular data
ØSpectrograms
Ø...

Data Exploration
ØUnderstand your data through visualization
ØAssess the difficulty of the problem
ØYou have a data set D = {(x(i), y(i))}
ØYou want to learn y = f(x) from D
Ømore precisely, you want to minimize the error in predictions
ØWhat kind of model (algorithm) do you need?
[Figure: MNIST low-dimensional projection]

Model Selection
Many classifiers to choose from:
ØSupport-Vector Machine (SVM)
ØLogistic Regression
ØRandom Forests
ØNaive Bayes
ØBayesian network
Øk-Nearest Neighbours
Ø(Deep) Neural Networks
ØEtc.

Model Selection
ØOften the easiest algorithm to implement is k-Nearest Neighbours (a short sketch follows below)
ØMatch to similar data using a distance metric
Ømore details provided in week 3
Q: What happens as we increase #data?
Q: What about as #data approaches infinity?
ØUnlike us, computers have no trouble with memorization.
ØThe real question is, how well does our algorithm make predictions on new data?
ØWe need a way to measure how well our algorithm (model) generalizes to new, never-before-seen data.
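Here is a minimal, illustrative k-NN sketch with scikit-learn; the 80/20 split, k = 3, and the fixed random seed are assumptions for demonstration, not course requirements.

```python
# Minimal k-NN sketch (illustrative only): classify Iris flowers by matching
# each new sample to its k nearest training samples under a distance metric.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)       # hold out data to measure generalization

knn = KNeighborsClassifier(n_neighbors=3)      # k is a hyperparameter (more in week 3)
knn.fit(X_train, y_train)                      # "training" mostly just stores the examples

print("accuracy on new data:", knn.score(X_test, y_test))
```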
Regression Example
ØLet's look at a more concrete example...
ØGiven noisy sample data (blue), we want to find the polynomial that generated the data
[Figure: noisy samples of stock price vs. media exposure]

Mean Squared Error
ØWe first need to define our error term; in this case we can use the mean squared error (MSE):
MSE = (1/N) * sum_i ( y(i) - f(x(i)) )^2
ØError is measured by finding the squared error in the prediction of y(x) from x.
ØThe error for the red polynomial is the sum of the squared vertical errors.

Fitting the Data
Q: Which polynomial fits the data best?
Øbased on the error term?
Øbased on test data?
(A short code sketch of this trade-off appears below, after the Inductive Bias slides.)

Overfitting vs Underfitting
ØHigh training error and high test error: underfit
ØAcceptable training error and acceptable test error
ØPerfect training error (zero) and high test error: overfit

Generalization
ØGiving the model a greater capacity (more complexity) to fit the data... does not necessarily help
ØHow do we evaluate the model performance? Verify the model on new, unseen data.

Overfitting
ØIn brief: fitting characteristics of the training data that do not generalize to future test data
ØCentral problem in machine learning
ØParticularly problematic if #data << #parameters
Ø... we don't have enough data to "identify" the parameters

Generalization
ØMachine learning is a game of balance, with our objective being to generalize to all possible future data
[Figure: error (% incorrect) vs. model capacity (complexity); error on training samples keeps falling while error on new samples rises, from under-fitting at low capacity to over-fitting at high capacity]

Bias-Variance Trade-off
ØModels with too few parameters are inaccurate because of a large bias (not enough flexibility).
ØModels with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).

Inductive Bias
Inductive (Oxford Dict.): characterized by the inference of general laws from particular instances
ØLet's avoid making assumptions about the model (polynomial order)
ØAssume for simplicity that D = {(x(i), y(i))} is noise free
Øx(i)'s in D only cover a small subset of the input space x
ØQ: What's the best we can do?
ØIf we've seen x = x(i), report y = y(i)
ØIf we have not seen x = x(i), we can't say anything (no assumptions)
ØThis is called rote learning... boring, eh?
ØKey idea: you can't generalize to unseen data without assumptions!
ØThus, the key to ML is generalization
ØTo generalize, an ML algorithm must have some inductive bias
ØBias is usually in the form of a restricted model (hypothesis) space
ØImportant to understand the restrictions (and whether they are appropriate)

Inductive Bias
ØExample: Nearest neighbors
Assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm: the assumption is that cases that are near each other tend to belong to the same class.
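As promised above, here is a small sketch of the polynomial-fitting trade-off; the sine-plus-noise data, the degrees tried, and the sample sizes are illustrative assumptions, not the course's stock-price example.

```python
# Illustrative under/overfitting sketch: fit polynomials of increasing degree to
# noisy synthetic data and compare training MSE with MSE on held-out test data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)  # noisy samples
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

def mse(y, y_hat):
    # MSE = (1/N) * sum_i (y(i) - f(x(i)))^2
    return np.mean((y - y_hat) ** 2)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)           # least-squares polynomial fit
    train_err = mse(y_train, np.polyval(coeffs, x_train))   # keeps shrinking as degree grows
    test_err = mse(y_test, np.polyval(coeffs, x_test))      # typically rises once we overfit
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```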
Training and Testing Data
ØTrack generalization error by splitting the data into training and testing sets
Ø80% training and 20% testing
ØMore data = better model
ØWe would like to use all our data for training; however, we need some way to validate our results

The Problem with Tracking Test Accuracy
ØIf we track test loss/accuracy in our training curve, then:
ØWe may make decisions about model architecture using the test accuracy!
ØWhat should K be?
ØThe final test accuracy will not be a realistic estimate of how our model will perform on a new data set!

Validation Set
ØWe still want to track the loss/accuracy on a data set not used for training
ØIdea: set aside a separate data set, called the validation set
ØTrack validation accuracy in the training curve
ØMake decisions about model architecture using the validation set
K is a hyperparameter. We tune hyperparameters using the validation set.

Validation and Holdout Data
ØTraining, Validation, and Testing Data
ØLess data available for training your model
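Tying these slides together, here is a minimal sketch of tuning K on a validation set and reporting performance on an untouched test set; the 60/20/20 split, the candidate K values, and the Iris data are illustrative assumptions.

```python
# Illustrative sketch: train / validation / test split, tuning the hyperparameter K
# on the validation set, and touching the test set only once at the very end.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Carve off a 20% test set first, then split the rest into 60% train / 20% validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7, 9):                      # candidate values of the hyperparameter K
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("chosen K:", best_k,
      "| validation accuracy:", round(best_acc, 3),
      "| test accuracy:", round(final.score(X_test, y_test), 3))
```

Keeping the test set out of the selection loop is what makes the final accuracy an honest estimate of performance on new data.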