Lecture 1. Introduction. Probability Theory COMP90051 Statistical Machine Learning
Sem2 2019 Lecturer: Ben Rubinstein
Copyright: University of Melbourne

This lecture
• Machine learning: why and what?
• About COMP90051
• Review: ML basics, probability theory

Why Learn Learning?

Motivation
• "We are drowning in information, but we are starved for knowledge"
  – John Naisbitt, Megatrends
• Data = raw information
• Knowledge = patterns or models behind the data

Solution: Machine learning
• Hypothesis: pre-existing data repositories contain a lot of potentially valuable knowledge
• Mission of learning: find it
• Definition of learning: (semi-)automatic extraction of valid, novel, useful and comprehensible knowledge – in the form of rules, regularities, patterns, constraints or models – from arbitrary sets of data

Applications of ML are deep and prevalent
• Online ad selection and placement
• Risk management in finance, insurance, security
• High-frequency trading
• Medical diagnosis
• Mining and natural resources
• Malware analysis
• Drug discovery
• Search engines …

Draws on many disciplines
• Artificial Intelligence
• Statistics
• Continuous optimisation
• Databases
• Information Retrieval
• Communications/information theory
• Signal Processing
• Computer Science Theory
• Philosophy
• Psychology and neurobiology …

Job$
Many companies across all industries hire ML experts:
Data Scientist, Analytics Expert, Business Analyst, Statistician, Software Engineer, Researcher

About this Subject
(refer also to LMS)

Vital statistics
Lecturer & Coordinator: Ben Rubinstein (DMD7, brubinstein@unimelb.edu.au)
  Associate Prof, Computing & Information Systems
  Statistical Machine Learning, ML + Privacy/Security/Databases
Tutors: Justin Tan (Head Tutor; justan@student.unimelb.edu.au),
  Kazi Abir Adnan, Xudong Han, Peishan Li, Yitong Li, Navnita Nandakumar, Hasti Samadi, Jun Wang
  Contact info: LMS → Staff information
Contact: Office Hours, Fridays 2:30-3:30pm, 7.02 DMD Building
Weekly you should attend: 2x Lectures & 1x Workshop
First port of call: LMS Discussion Board. Our aim: half business day latency!

About me (Ben)
• PhD 2010 – Berkeley, USA
• 4 years in industry research
  ∗ Silicon Valley: Google Research, Yahoo! Research, Intel Labs, Microsoft Research
  ∗ Australia: IBM Research
  ∗ Patented & Published, Developed & Tested, Recruited
  ∗ Impacted: Xbox, Bing (MS), Firefox (Mozilla), Kaggle, ABS …
• Interests: Machine learning theory; adversarial ML; differential privacy; stat record linkage

Subject content
• The subject will cover topics from:
  Foundations of statistical learning, linear models, non-linear bases, kernel approaches, neural networks, Bayesian learning, probabilistic graphical models (Bayes Nets, Markov Random Fields), cluster analysis, dimensionality reduction, regularisation and model selection
• Theory in lectures; hands-on experience with a range of toolkits in workshop pracs and projects
• Vs COMP90049: much depth, much rigor, so wow

Advanced ML: Expected Background
• Why a challenge: diverse math methods + CS + coding
• ML: COMP90049; refresher deck on LMS → Resources
• Algorithms & complexity: big-oh, termination; basic data structures & algorithms; solid coding, ideally with experience in Python
• Maths: refreshers provided, but you really need a solid understanding in advance
  "Matrix A is symmetric & positive definite, hence its eigenvalues…"
• Probability theory: probability calculus; discrete/continuous distributions; multivariate; exponential families; Bayes rule
• Linear algebra: vector inner products & norms; orthonormal bases; matrix operations, inverses, eigenvectors/values
• Calculus & optimisation: partial derivatives; gradient descent; convexity; Lagrange multipliers

Subject objectives
• Develop an appreciation for the role of statistical machine learning, both in terms of foundations and applications
• Gain an understanding of a representative selection of ML techniques
• Be able to design, implement and evaluate ML systems
• Become a discerning ML consumer

Textbooks
• Primarily references to:
  ∗ Bishop (2007) Pattern Recognition and Machine Learning
• Other good general references:
  ∗ Murphy (2012) Machine Learning: A Probabilistic Perspective [read free ebook using 'ebrary' at http://bit.ly/29SHAQS]
  ∗ Hastie, Tibshirani, Friedman (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction [free at http://www-stat.stanford.edu/~tibs/ElemStatLearn]

Textbooks
• References for PGM component:
  ∗ Koller, Friedman (2009) Probabilistic Graphical Models: Principles and Techniques

Assessment
• Assessment components
  ∗ Two projects – one released early (w4-7), one late (w9-11); you will have ~3 weeks to complete each
    • Each worth 25%
    • At least one will be a group project (possibly both)
  ∗ Final exam (50%)
• A 50% hurdle applies to both the exam and the ongoing assessment

Machine Learning Basics

Terminology
• Input to a machine learning system can consist of
  ∗ Instance: measurements about individual entities/objects
    e.g., a loan application
  ∗ Attribute (aka feature, explanatory variable): component of the instances
    e.g., the applicant's salary, number of dependents, etc.
  ∗ Label (aka response, dependent variable): an outcome that is categorical, numeric, etc.
    e.g., forfeit vs. paid off
  ∗ Example: an instance coupled with a label, e.g. <(100k, 3), "forfeit">
  ∗ Model: discovered relationship between attributes and/or label

Supervised vs unsupervised learning

                        Data        Model used for
Supervised learning     Labelled    Predict labels on new instances
Unsupervised learning   Unlabelled  Cluster related instances; project to fewer dimensions; understand attribute relationships

Architecture of a supervised learner
[Diagram: train data (instances with labels) feed into the Learner, which produces a Model; the Model predicts labels for test-data instances, and predicted labels are compared against true labels in Evaluation.]

Evaluation (supervised learners)
• How you measure quality depends on your problem!
• Typical process
  ∗ Pick an evaluation metric comparing label vs prediction
  ∗ Procure an independent, labelled test set
  ∗ "Average" the evaluation metric over the test set
• Example evaluation metrics
  ∗ Accuracy, contingency table, precision-recall, ROC curves
• When data is scarce, cross-validate
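As a concrete illustration of the process above, here is a minimal sketch (my addition, not from the slides) using scikit-learn; the dataset and classifier are arbitrary illustrative choices.

```python
# Minimal sketch of the supervised evaluation loop (illustrative choices).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Procure an independent, labelled test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# "Average" the evaluation metric (here accuracy) over the test set.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# When data is scarce, cross-validate rather than rely on a single split.
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```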

Probability Theory
(This should be a) brief refresher

Data is noisy (almost always)
• Example:
  ∗ given mark for Knowledge Technologies (KT)
  ∗ predict mark for Stat Machine Learning (SML)
[Scatter plot: training data of KT mark (x-axis) vs SML mark (y-axis); * synthetic data 🙂]

Types of models
• ŷ = f(x)
  e.g., KT mark was 95, SML mark is predicted to be 95
• P(y|x)
  e.g., KT mark was 95, SML mark is likely to be in (92, 97)
• P(x, y)
  e.g., probability of having (KT = x, SML = y)
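To make the distinction tangible, here is a small sketch (my addition; the data is synthetic and the linear-Gaussian form is an assumption) contrasting a point-prediction model ŷ = f(x) with a conditional model P(y|x):

```python
# Point prediction y_hat = f(x) vs conditional model P(y|x), on synthetic marks.
import numpy as np

rng = np.random.default_rng(0)
kt = rng.uniform(50, 100, 200)                 # synthetic KT marks
sml = kt + rng.normal(0, 2, size=kt.shape)     # SML = KT plus Gaussian noise

# y_hat = f(x): a least-squares line gives a single predicted mark.
a, b = np.polyfit(kt, sml, deg=1)
print("f(95) =", a * 95 + b)

# P(y|x): assuming Gaussian noise, report an interval instead of a point.
sigma = np.std(sml - (a * kt + b))
mean95 = a * 95 + b
print("about 95% of mass in", (mean95 - 2 * sigma, mean95 + 2 * sigma))
```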

Basics of probability theory
• A probability space:
  ∗ Set Ω of possible outcomes
  ∗ Set F of events (subsets of outcomes)
  ∗ Probability measure P: F → R
• Example: a die roll
  ∗ Ω = {1, 2, 3, 4, 5, 6}
  ∗ F = { ∅, {1}, …, {6}, {1,2}, …, {5,6}, …, {1,2,3,4,5,6} }
  ∗ P(∅) = 0, P({1}) = 1/6, P({1,2}) = 1/3, …
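The die-roll space is small enough to write out in full; a quick sketch (my addition) of the outcomes, the event set F, and an additive measure:

```python
# The die-roll probability space: outcomes, events, and a measure P.
from fractions import Fraction
from itertools import chain, combinations

omega = {1, 2, 3, 4, 5, 6}                      # set of possible outcomes
events = [set(s) for s in chain.from_iterable(  # F: all subsets of omega
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]
print(len(events))                              # 64 = 2^6 events in F

def P(event):
    """Probability measure: each outcome carries mass 1/6."""
    return Fraction(len(event), 6)

print(P(set()), P({1}), P({1, 2}), P(omega))    # 0, 1/6, 1/3, 1
```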

Axioms of probability
1. P(f) ≥ 0 for every event f in F
2. P(⋃_f f) = Σ_f P(f) for all collections* of pairwise disjoint events
3. P(Ω) = 1

* We won't delve further into advanced probability theory, which starts with measure theory. But to be precise, additivity is over collections of countably-many events.

Random variables (r.v.'s)
• A random variable X is a numeric function of outcome: X(ω) ∈ R
• P(X ∈ A) denotes the probability of the outcome being such that X falls in the range A
• Example: X, winnings on a $5 bet on an even die roll
  ∗ X maps 1,3,5 to -5; X maps 2,4,6 to 5
  ∗ P(X=5) = P(X=-5) = 1/2

Discrete vs. continuous distributions
• Discrete distributions
  ∗ Govern an r.v. taking discrete values
  ∗ Described by a probability mass function p(x), which is P(X = x)
  ∗ P(X ≤ x) = Σ_{a=−∞}^{x} p(a)
  ∗ Examples: Bernoulli, Binomial, Multinomial, Poisson
• Continuous distributions
  ∗ Govern real-valued r.v.'s
  ∗ Cannot talk about a PMF, but rather a probability density function p(x)
  ∗ P(X ≤ x) = ∫_{−∞}^{x} p(a) da
  ∗ Examples: Uniform, Normal, Laplace, Gamma, Beta, Dirichlet
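The sum-vs-integral distinction is easy to check numerically; a short sketch (my addition, assuming scipy is available):

```python
# CDF as a sum of PMF values (discrete) vs an integral of a PDF (continuous).
from scipy.stats import binom, norm

# Discrete: X ~ Binomial(n=10, p=0.5); P(X <= 4) = sum of p(a) for a <= 4.
print(sum(binom.pmf(a, 10, 0.5) for a in range(5)))   # explicit sum
print(binom.cdf(4, 10, 0.5))                          # same value

# Continuous: X ~ Normal(0, 1); P(X <= 1) integrates the density p(x).
print(norm.cdf(1.0))   # ~0.841; norm.pdf(x) gives the density itself
```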

Expectation
• Expectation E[X] is the r.v. X's "average" value
  ∗ Discrete: E[X] = Σ_x x P(X = x)
  ∗ Continuous: E[X] = ∫ x p(x) dx
• Properties
  ∗ Linear: E[aX + b] = a E[X] + b
            E[X + Y] = E[X] + E[Y]
  ∗ Monotone: X ≥ Y ⇒ E[X] ≥ E[Y]
• Variance: Var(X) = E[(X − E[X])²]
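A sketch (my addition) computing these quantities for the $5-bet r.v. from the earlier slide, with a Monte Carlo check of linearity:

```python
# E[X] and Var[X] for the $5 bet on an even die roll, plus a linearity check.
import random

outcomes = [1, 2, 3, 4, 5, 6]
X = lambda w: 5 if w % 2 == 0 else -5     # winnings as a function of outcome

# Exact: E[X] = sum_x x P(X = x); each outcome has probability 1/6.
EX = sum(X(w) * (1 / 6) for w in outcomes)
print("E[X] =", EX)                       # 0.0

# Monte Carlo estimate of E[2X + 3]; should approach 2*E[X] + 3 = 3.
rolls = [random.choice(outcomes) for _ in range(100_000)]
print(sum(2 * X(w) + 3 for w in rolls) / len(rolls))

# Var[X] = E[(X - E[X])^2] = 25 exactly for this bet.
print(sum((X(w) - EX) ** 2 * (1 / 6) for w in outcomes))
```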

Independence and conditioning
• X, Y are independent if
  ∗ P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
  ∗ Similarly for densities: p_{X,Y}(x, y) = p_X(x) p_Y(y)
  ∗ Intuitively: knowing the value of Y reveals nothing about X
  ∗ Algebraically: the joint on X, Y factorises!
• Conditional probability
  ∗ P(A|B) = P(A ∩ B) / P(B)
  ∗ Similarly for densities: p(y|x) = p(x, y) / p(x)
  ∗ Intuitively: the probability that event A will occur given we know event B has occurred
  ∗ X, Y independent is equivalent to P(Y = y | X = x) = P(Y = y)
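Both ideas reduce to arithmetic on a joint table; a small sketch (my addition, with a made-up joint distribution):

```python
# Independence check and conditionals from a small (made-up) joint table.
import numpy as np

# Joint P(X, Y) over two binary variables: rows index x, columns index y.
joint = np.array([[0.3, 0.3],
                  [0.2, 0.2]])

p_x = joint.sum(axis=1)    # marginal P(X): marginalise Y away
p_y = joint.sum(axis=0)    # marginal P(Y)

# Independent iff the joint factorises: P(x, y) = P(x) P(y) for all x, y.
print(np.allclose(joint, np.outer(p_x, p_y)))   # True for this table

# Conditional P(Y | X = x) = P(x, y) / P(x): row-normalise the joint.
print(joint / p_x[:, None])
```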

Inverting conditioning: Bayes' Theorem
• In terms of events A, B
  ∗ P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
  ∗ P(A|B) = P(B|A) P(A) / P(B)
• A simple rule that lets us swap conditioning order
• Bayesian statistical inference makes heavy use of it
• Marginals: probabilities of individual variables
• Marginalisation: summing away all but the r.v.'s of interest
  P(A) = Σ_b P(A, B = b)
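A worked example (my addition; the rates are made up) on the classic diagnostic-test setting, using marginalisation to obtain the denominator:

```python
# Bayes' theorem: P(disease | positive test), with made-up rates.
p_disease = 0.01               # prior P(A)
p_pos_given_disease = 0.95     # P(B | A)
p_pos_given_healthy = 0.05     # P(B | not A)

# Marginalisation: P(B) = P(B, A) + P(B, not A).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes: P(A | B) = P(B | A) P(A) / P(B) -- conditioning order swapped.
print(p_pos_given_disease * p_disease / p_pos)   # ~0.161
```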

Summary
• Why study machine learning?
• COMP90051
• Machine learning basics
• Review of probability theory

Homework week #1: COMP90049 & linear algebra decks; Jupyter notebooks setup and launch (at home or labs)

Next time: Statistical schools of thought – how many ML algorithms come to be