
Machine learning lecture slides
COMS 4771 Fall 2020

Overview

Questions
- Please use Piazza Live Q&A

Outline
- A “bird’s eye view” of machine learning
- About COMS 4771

Figure 1: Predict the bird species depicted in a given image.

Figure 2: Predict how a user would rate a given movie. (The figure, taken from an article on latent factor methods for recommendation, places users and movies on two axes: male versus female, and serious versus escapist. Each item i is associated with a vector q_i ∈ R^f and each user u with a vector p_u ∈ R^f; the predicted rating is r̂_ui = q_i^T p_u.)
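The latent factor approach behind Figure 2 associates each user u with a vector p_u and each item i with a vector q_i, and predicts the rating as the inner product r̂_ui = q_i^T p_u. A minimal numpy sketch (the factor values here are random stand-ins for parameters that would be learned from data):

```python
import numpy as np

rng = np.random.default_rng(0)
f = 2                     # number of latent factors (e.g., the two axes in the figure)
n_users, n_items = 4, 5

# Hypothetical "learned" factor vectors: one p_u per user, one q_i per item.
P = rng.normal(size=(n_users, f))   # user factors
Q = rng.normal(size=(n_items, f))   # item factors

def predict_rating(u, i):
    """Predicted rating r-hat_ui = q_i^T p_u."""
    return Q[i] @ P[u]

# All user-item predictions at once form the matrix P Q^T.
all_predictions = P @ Q.T
print(predict_rating(0, 2))
```

In a real system, P and Q would be fit by minimizing squared error on observed ratings; here they only illustrate how the prediction is computed.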

Figure 3: Predict the French translation of a given English sentence.

Figure 4: Predict the “win probability” of a given move in a given game state.

How to “solve” problems without ML?
- Image classification:
  - Recruit a “bird expert” to teach you about different birds’ features (e.g., beak shape, feather color, typical environment)
  - Recognize these features in a given image, and then come up with a best guess of the bird species
- Recommender system:
  - Ask users to self-report specific movie genres of interest (e.g., horror, sci-fi)
  - Ask movie suppliers to categorize movies into the same genres
  - Predict a high rating for any movie in a user’s genre-of-interest; a low rating for all other movies
- Machine translation: …
- Chess: …

Work in ML
- Applied ML
  - Collect/prepare data, build/train models, analyze performance/errors/etc.
- ML developer
  - Implement ML algorithms and infrastructure
- ML research
  - Design/analyze models and algorithms

Note: These roles are not mutually exclusive!

Mathematical and computational prerequisites
- Math
  - Linear algebra, probability, multivariable calculus
  - Understand and reason about the concepts (not just calculations)
- Software/programming
  - Much ML work is implemented in Python with libraries such as numpy and pytorch

Basic setting: supervised learning
- Training data: dataset comprised of labeled examples
  - Labeled example: a pair of the form (input, label)
  - Input: what you see before you make a prediction (a.k.a. context, side-information, features, etc.)
  - Label: output value (a.k.a. output, response, target, etc.)
- Goal: learn a predictor (i.e., prediction function) to predict the label from the input for new examples
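The basic setup can be sketched directly in code. This is a toy illustration, not any particular learning algorithm: the example features and the hand-made rule are invented for concreteness.

```python
# A labeled example is a pair (input, label); training data is a list of them.
# Here the (made-up) input is (user age, movie genre, release year) and the
# label is a rating.
training_data = [
    ((25, "horror", 2019), 4.0),
    ((32, "sci-fi", 2020), 5.0),
    ((41, "drama", 2018), 2.0),
]

# A predictor is just a function from inputs to labels. A learning algorithm
# would produce this function from the training data; here it is hand-made.
def predictor(x):
    age, genre, year = x
    return 5.0 if genre == "sci-fi" else 3.0

# The goal: predict labels for NEW inputs, not just memorize the training set.
new_input = (29, "sci-fi", 2021)
print(predictor(new_input))
```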

input: (age of user, genre of movie, release date of movie, …)
Figure 5: Decision tree
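A decision tree predictor asks a sequence of questions about the input features and returns a prediction at a leaf. A minimal sketch (the thresholds, genres, and labels here are made up, not taken from the figure):

```python
# Each internal "node" is a question about one feature; each leaf is a prediction.
def decision_tree_predict(age, genre, release_year):
    if age < 18:
        if genre == "animation":
            return "high rating"
        return "low rating"
    else:
        if release_year >= 2010:
            return "high rating"
        return "low rating"

print(decision_tree_predict(12, "animation", 2005))  # → "high rating"
```

A learning algorithm would choose the questions and thresholds from training data; this sketch only shows what the resulting predictor looks like.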

Figure 6: Linear classifier (“Perceptron”)
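A linear classifier predicts sign(w·x); the Perceptron algorithm learns the weight vector w by updating on mistakes. A minimal sketch on made-up, linearly separable data:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Perceptron: learn w so that sign(w @ x) matches labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * (w @ x) <= 0:   # mistake: nudge w toward the correct side
                w += label * x
    return w

# Toy data; the constant last coordinate (1.0) acts as a bias term.
X = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])

w = perceptron_train(X, y)
print(np.sign(X @ w))  # matches y once the data is separated
```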

(layers: input → hidden units → output)
Figure 7: Neural network
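The figure's structure (input layer, hidden units, output) corresponds to a composition of linear maps and nonlinearities. A minimal forward-pass sketch with one hidden layer; the sizes and random weights are illustrative, not from the figure:

```python
import numpy as np

def relu(z):
    # Elementwise nonlinearity applied at the hidden units.
    return np.maximum(0.0, z)

def neural_net(x, W1, b1, W2, b2):
    """One hidden layer: input -> hidden units -> output."""
    h = relu(W1 @ x + b1)   # hidden unit activations
    return W2 @ h + b2      # output (here, a single real value)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 4 hidden units -> 1 output

x = np.array([0.5, -1.0, 2.0])
print(neural_net(x, W1, b1, W2, b2))
```

Training would adjust (W1, b1, W2, b2) to fit data, e.g., by gradient methods (a topic covered later under optimization).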

Types of prediction problems
- Binary classification
  - Given an email, is it spam or not?
  - (What’s the probability that it is spam?)
- Multi-class classification
  - Given an image, what animal is depicted?
  - (Or which animals are depicted?)
- Regression
  - Given clinical measurements, what is the level of tumor antigens?
  - (In absolute level? Log-scale?)
- Structured output prediction
  - Given a sentence, what is its grammatical parse tree?
  - (Or dependency tree?)
- …

Template of supervised learning pipeline
- Get data
- Determine representation of, and predictive model for, the data
- Train the predictor (a.k.a. model fitting, parameter estimation)
- Evaluate the predictor (test the “goodness of fit”)
- Deploy the predictor in an application
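The pipeline steps can be sketched end-to-end on a toy problem. Everything here is invented for illustration: synthetic 1-d data with y ≈ 2x plus noise, a one-parameter linear model y = a·x fit by least squares, and a held-out split for evaluation.

```python
import random

random.seed(0)

# 1. Get data: synthetic labeled examples (x, y) with y ≈ 2x + noise.
data = [(x, 2 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(50)]]
train, test = data[:40], data[40:]

# 2./3. Representation + training: model y = a*x, slope fit by least squares.
a = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# 4. Evaluate: mean squared error on held-out test examples.
mse = sum((y - a * x) ** 2 for x, y in test) / len(test)

# 5. "Deploy": the trained predictor is now just a function.
def predict(x):
    return a * x

print(a, mse)
```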

Questions
- What is the core prediction problem?
- What features (i.e., predictor variables) are available?
- Will these features be available at time of prediction?
- Is there enough information (e.g., training data, features) to learn the relationship between the input and output?
- What are the modeling assumptions?
- Is high-accuracy prediction a useful goal for the application?

Where do assumptions / domain expertise come in?
- Form of the prediction function
- Choice of features
- Choice of training data
- Choice of learning algorithm
- Choice of objective function and constraints

Challenges
- Might not have the “right” data
- Might not have the “right” model
- Might under-fit the data
- Might over-fit the data
- Data might be corrupted, noisy, …

Key statistical/algorithmic ideas in ML
- Plug-in principle
- Inductive bias (i.e., inductive preferences)
- Linearity
- Mathematical optimization


About COMS 4771
- Basic principles and methods of supervised machine learning
  1. Appetizer: nearest neighbor rules (a “non-parametric” method)
  2. Statistical model for prediction
  3. Regression
     - Why? Clean, simple, and illustrates important concepts (linearity, inductive bias, regularization, kernels)
  4. Classification
  5. Optimization methods for machine learning
     - Convex optimization, neural networks
  6. Maybe one other topic if time permits …
- This is not a course about how to use sklearn, tensorflow, etc.
- Also not about the latest nonsense on arXiv
- Good stuff beyond COMS 4771:
  - COMS 4252, 4773: Mathematical theory of learning
  - COMS 4774: Unsupervised learning
  - COMS 4775: Causal inference
  - …

About me
- Professor Daniel Hsu
  - Okay to call me “Daniel”!
  - “Professor Hsu” also okay
  - “Professor Daniel” is a little weird
- At Columbia since 2013
  - Previously at Microsoft Research, Rutgers, UPenn, UC San Diego, UC Berkeley, …
- Research interests: algorithms, statistics, & combining the two
- Good at: LaTeX-hacking
- Bad at: making slides

About you
- I assume you have fluency in
  - multivariable calculus,
  - linear algebra, and
  - elementary probability (no measure theory needed)
- I also assume you can read and write programs in Python
  - (and read online documentation to learn, e.g., how to do I/O with CSV files)
- See Courseworks for a “Python basics” Jupyter notebook to brush up on Python, Numpy, etc.
- Let me know why you are interested in ML!
  - Part of HW 1.

Administrative stuff
- Website: https://www.cs.columbia.edu/~djhsu/ML
  - Schedule for office hours/lectures/homework/quizzes/exam
  - Syllabus
- Course format:
  - Lecture/recitation: online over Zoom
  - “On Campus” people: check email about in-person lectures
- Course assistants (CAs):
  - Andy, Andrea, Wonjun, Serena
  - Links for online office hours will be posted on Courseworks
- Technology:
  - Piazza: communicate with course staff (live Q&A and offline)
  - Courseworks: retrieve assignments, quizzes, data files, etc.
  - Gradescope: submit homework write-ups, code
  - Slack: discussion with fellow classmates
- Disability services:
  - Please make arrangements with disability services ASAP

Academic rules of conduct
- See syllabus
- Cheating: don’t do it
  - If unsure about something, ask!
  - Consequence is automatic fail
  - Cheating out of desperation is also cheating
  - Instead: get help early; we are here to help
- Okay to work on homework in groups of ≤ 3
  - No collaboration across groups
  - No diffusion of responsibility
- No collaboration at all on quizzes or exams

Reading assignments
- There are some required reading assignments (mostly from handouts posted on the website)
- Unfortunately, most textbooks on ML are not appropriate for this course
  - Closest is “A Course in Machine Learning” by Daumé
- I have selected some optional reading assignments from a few books that may be used to supplement the lectures
  - All books available online

Questions?
Q1: Will groups be assigned for HW?
A: No, feel free to make your own groups.