RMIT Classification: Trusted
Foundations of ML
COSC 2673-2793 | Semester 1 2021 (Computational) Machine Learning
Definitions
What is Machine Learning?
COSC2673 | COSC2793 Week1: Foundations of ML 2
What is Machine Learning?
“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur Samuel (1959)
Input: Distance from city, Floor Area, Number of Rooms, Avg Income, Rank of Schools, …
Output: Price ($)

Explicit Program:
If distance > 500m and floor area < 100m² then: price = $100,000
Elseif ... ...

Explicit program: the human expert decides the criteria and implements them in code.
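Such an explicit program is just a set of hand-coded rules. A minimal sketch (the thresholds and prices here are illustrative, not taken from any real model):

```python
def predict_price(distance_m, floor_area_m2):
    """Hand-coded pricing rules: a human expert chose these thresholds."""
    if distance_m > 500 and floor_area_m2 < 100:
        return 100_000          # far from the city and small: cheap
    elif distance_m <= 500 and floor_area_m2 >= 100:
        return 900_000          # close to the city and large: expensive
    else:
        return 400_000          # everything in between

print(predict_price(800, 80))   # prints: 100000
```

Every threshold must be decided and maintained by the expert; machine learning instead tunes such parameters from data.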
House Price Prediction Example
Data: Distance from city, Floor Area, Number of Rooms, Avg Income, Rank of Schools, ...
→ Machine Learning Algorithm or Program (with tuneable parameters)
→ Price ($)
What is Machine Learning?
Machine learning is programming computers to optimise a performance criterion / perform a particular task by generalising from examples of past experience(s) to predict what will occur in future experience(s).
More technically:
"A computer program is said to learn:
Ø Some class of tasks T
Ø From experience E, and
Ø Performance measure P
if its performance at tasks in T, as measured by P, improves with experience E"
– Tom Mitchell (1997)
Task: Unknown Target Function
The Task can be expressed as an unknown target function: 𝐲 = 𝑓(𝐱)
Ø Attributes of the task (input): x
Ø Unknown function: 𝑓(𝐱)
Ø Output of the function (target value): 𝐲
ML finds a Hypothesis, h (from a hypothesis space 𝐻), which is a function (or model) that approximates the unknown target function:
ŷ = h(𝐱) ≈ 𝑓(𝐱)
Ø The hypothesis is learnt from the Experience
Ø A good hypothesis has a high evaluation from the Performance measure
Ø The hypothesis generalises to predict the output of instances from outside of the Experience
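For illustration, one possible hypothesis space 𝐻 is the set of all linear functions h(x) = w·x + b; learning then amounts to choosing values for w and b (the numbers below are made up):

```python
def make_hypothesis(w, b):
    """One hypothesis h from the space H of linear models: y_hat = w * x + b."""
    return lambda x: w * x + b

h = make_hypothesis(2.0, 1.0)   # a single candidate hypothesis from H
y_hat = h(3.0)                  # its predicted target value for input x = 3.0
```

Different (w, b) pairs give different hypotheses; the learning algorithm's job is to pick the pair that best approximates 𝑓(𝐱).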
Experience
The Experience is typically a data set, 𝒟, of values:

𝒟 = { (x⁽ⁱ⁾, 𝑓(x⁽ⁱ⁾)) : 𝑖 = 1, …, 𝑛 }

Ø Attributes of the task: x⁽ⁱ⁾
Ø Output of the unknown function (target value): 𝑓(x⁽ⁱ⁾)

A data instance (or data point) 𝑖 is a tuple**: (x⁽ⁱ⁾, 𝑓(x⁽ⁱ⁾))

We do not know 𝑓(𝐱), but we can obtain samples (input–output pairs) from it – observations, or the input/output of a black-box phenomenon.

** Assumes supervised learning
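In code, such a supervised data set is simply a collection of (input, target) tuples; the values below are taken from the house-price example later in the lecture:

```python
# Data set D: each instance pairs the attributes x^(i) with the target f(x^(i)).
dataset = [
    # ((distance, floor_area), over_$1m)
    ((25.0, 150.0), "N"),
    ((10.0, 100.0), "Y"),
    ((32.0, 450.0), "Y"),
]

x_i, f_x_i = dataset[0]   # one data instance i: the tuple (x^(i), f(x^(i)))
```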
Performance
What does success look like? To evaluate the abilities of a machine learning algorithm, we must design a quantitative measure of its performance.
We would like to measure: h∗(𝐱) ≈ 𝑓(𝐱)
The Performance is typically a numerical measure that determines how well the hypothesis matches the experience.
Note, the performance is measured against the experience. NOT the unknown target function!
Usually we are interested in how well the machine learning algorithm performs when deployed in the real world - unseen data. We therefore evaluate these performance measures using a test set of data that is separate from the data used for training the machine learning system.
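A minimal sketch of such a performance measure – here simply the misclassification count – evaluated on a held-out test set (the hypothesis and data are invented for illustration):

```python
def error_count(h, data):
    """Performance measure: how often h(x) disagrees with the target f(x)."""
    return sum(1 for x, y in data if h(x) != y)

# Hypothetical hypothesis: predict "over $1m" whenever floor area exceeds 120.
h = lambda x: "Y" if x[1] > 120 else "N"

train = [((25.0, 150.0), "N"), ((10.0, 100.0), "Y")]   # used for training
test = [((32.0, 450.0), "Y")]                          # unseen data
print(error_count(h, train), error_count(h, test))     # prints: 2 0
```

The test-set count is the quantity we actually care about when the model is deployed.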
Revision
Ø Assume you have to develop a machine learning model to do spam classification. What would the task, experience and performance measure be?
Simple Example
House price prediction
Example: House Price Prediction
Experience: historical data from houses
Attributes: Distance from city (d), Floor Area (a); target 𝑓(𝐱): Over $1m (or not)

i    d    a    𝑓(𝐱)
1    25   150  N
2    10   100  Y
...  ...  ...  ...
n    32   450  Y

ML Model: What form should h(𝐱) take?

(Plot: floor area a against distance d.)
Experience: historical data from houses – Distance from city (d), Floor Area (a); target 𝑓(𝐱): Over $1m (or not)

ML Model: We can select a hypothesis space that is "reasonably" capable of representing the unknown target function, e.g., linear decision boundaries.

Usually, the hypothesis space is determined by the ML technique.

(Plots: floor area a against distance d, showing candidate linear decision boundaries h(𝐱).)
How can we select the best hypothesis h∗(𝐱) from the hypothesis space?
Experience: Distance from city (d), Floor Area (a); target 𝑓(𝐱): Over $1m (or not)

ML Model: We need a performance measure that quantifies what "best" means. Let's use the count of how many errors we make**.

Now we combine the data and the hypothesis space, and select the hypothesis that makes the fewest errors as our optimal hypothesis h∗(𝐱). This is done automatically through the optimization procedure.

Once we have h∗(𝐱), we can use it to predict the target value for future data points.

(Plot: floor area a against distance d, with the selected decision boundary.)
** May not be the best measure. We will learn better ones later in the course.
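Selecting the optimal hypothesis by error count can be sketched as a brute-force search over a tiny hypothesis space (the thresholds and data are illustrative):

```python
# Experience: ((distance d, floor area a), over $1m or not).
data = [((25, 150), "N"), ((10, 100), "Y"), ((32, 450), "Y")]

def errors(t):
    """Errors of the hypothesis 'over $1m iff a > t' on the experience."""
    return sum(1 for (d, a), y in data if ("Y" if a > t else "N") != y)

# Hypothesis space H: one hypothesis per candidate threshold t.
thresholds = [50, 120, 200, 300]
best_t = min(thresholds, key=errors)   # h*: the hypothesis with fewest errors
```

Real optimization procedures replace this exhaustive search with something far more efficient, but the selection principle is the same.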
Building a Machine Learning Algorithm
Nearly all ML algorithms can be described with a fairly simple recipe:
Ø Dataset (experience)
Ø Model (hypothesis space)
Ø Cost function (objective, loss)
Ø Optimization procedure
The first step in solving a ML problem is to analyse the data and task to identify the above components.
(Diagram – design choices for a learning system:
Determine Type of Training Experience: games against experts, games against self, table of correct moves, ...
→ Determine Target Function: Board → move, Board → value, ...
→ Determine Representation of Learned Function: polynomial, linear function of six features, artificial neural network, ...
→ Determine Learning Algorithm: gradient descent, linear programming, ...
→ Completed Design.)
Image: Tom Mitchell, "Machine Learning", 1997.
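The four-part recipe can be sketched end-to-end for a one-parameter model (the data and learning rate are invented for illustration):

```python
# 1. Dataset (experience): observed (x, y) pairs.
dataset = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# 2. Model (hypothesis space): h_w(x) = w * x, parameterised by w.
def model(w, x):
    return w * x

# 3. Cost function: mean squared error over the experience.
def cost(w):
    return sum((model(w, x) - y) ** 2 for x, y in dataset) / len(dataset)

# 4. Optimization procedure: plain gradient descent on the cost.
w = 0.0
for _ in range(200):
    grad = sum(2 * (model(w, x) - y) * x for x, y in dataset) / len(dataset)
    w -= 0.05 * grad
# w converges to the least-squares slope, sum(x*y)/sum(x*x) = 28.5/14
```

Swapping any one ingredient – a different model, cost, or optimizer – yields a different ML algorithm, but the recipe stays the same.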
Revision
Ø What is the difference between the hypothesis space and the optimal hypothesis?
Ø What are the key ingredients of a general ML recipe?
Ø Devise the task, performance measure and experience for a "spam email classification" program.
True Error & Generalization
Can the model be “effectively” used to predict for unseen data?
True Hypothesis & True Error
Recall, machine learning uses past experience to predict future experience(s).
Ø We really want to know the performance of a hypothesis against the target function, h∗(𝐱) ≈ 𝑓(𝐱), known as the true error.
Ø However, this cannot be known: ML uses experience (data sets), which is a limited sample of the "true" problem, that is, the unknown target function.
All algorithms for Machine Learning make a significant assumption:
The experience is a reasonable representation (or reasonable sample) of the true but unknown target function
True Hypothesis & True Error
(Figure: instance space X, showing the target concept c (the unknown function f) and a hypothesis h as regions; the shaded areas are where c and h disagree; + and − mark labelled data points.)
What is the error of the hypothesis h on the five data points provided?
Ø Is the true-error zero?
Ø How about the performance measured using the data?
Generalization
The central challenge in machine learning is that our algorithm must perform well on new, previously unseen inputs (not just those on which our model was trained).
The ability to perform well on previously unseen inputs is called
generalization.
Ø Generalization error is related to the true error of a hypothesis, h∗(𝐱) ≈ 𝑓(𝐱) (it cannot be measured directly).
Ø The generalization error of a machine learning model is typically estimated by measuring its performance on a test set collected separately from the training set.

How can an algorithm influence the performance on an unseen dataset, if it only sees training data?

Assumption in ML: The test set is independent and identically distributed with respect to the training set.

(Diagram: select the hypothesis using the train data; measure performance using the test data.)
Performance of a ML algorithm
The factors determining how well a machine learning algorithm will perform are its ability to
Ø Make the training set performance high.
Ø Make the gap between training and test performance small (generalization)
(Figure: train set performance vs. test set performance; the difference between the two is the generalization gap.)
The above concepts are related to model capacity, under or over fitting. We will discuss these in detail in future lectures.
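A quick way to see both factors is to fit polynomials of increasing degree and compare training and test error (a sketch using NumPy; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 10)  # noisy samples
x_test = np.linspace(0, 1, 50)
y_test = np.sin(2 * np.pi * x_test)                             # unseen data

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err, test_err)
# Higher degree drives the training error down, but the train/test gap can grow.
```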
Revision
Ø Is the performance evaluated over training examples? Why?
Ø What would be the true error and the training error for the data below and the given hypothesis?

(Figure: data points and a given hypothesis.)
Model complexity
Which hypothesis should be chosen?
Hypothesis Space
The hypothesis space, 𝐻, is the set of all hypotheses over the state space of a given problem that a given algorithm is capable of learning
The great ML question is:
which hypothesis should be learnt?
In this example we want to separate one class (red ×) from the rest. Our hypothesis space is all possible rectangles.
Which hypothesis to learn?
Consider questions such as:
Ø Should the experience be matched?
Ø Should the performance be maximized?
Ø What happens if there is noise?
Ø Which hypothesis should be learnt if multiple hypotheses all have the same performance?
Ø Can a good hypothesis be found?
Ockham’s Razor
The principle of Ockham’s Razor is to prefer the simplest hypothesis that is ("reasonably") consistent with the experience
Example: Ockham’s Razor
Let's say student A has received a low grade in ML Assignment 1. The student and the teacher have each produced a hypothesis that explains the result.
Ø Teacher's hypothesis: "The student has not spent enough time on the assignment and therefore has done a suboptimal job."
Ø Student's hypothesis: "A foreign hacker has infiltrated the RMIT Canvas site and removed some parts of (only) student A's submission."
Which hypothesis would you pick?
Ockham’s Razor
(Figure: three hypotheses of increasing complexity, all consistent with the data.)

Given that all three hypotheses have zero training error, Ockham's razor says that we should choose the "simplest" hypothesis.
We will cover how to measure “simplest” later in the course.
What is the best model?
Is the additional data point correct, an outlier, or noise? What model should be learnt?
What is the best model?
Is the additional data point correct, an outlier, or noise? What model should be learnt?
Ultimate Judgement
The core challenge of ML is NOT:
Ø Collecting Data
Ø Running algorithms
Ø Maximising Performance
The core challenge is in:
Ø Deciding what data to use
Ø Deciding if an algorithm is suitable
Ø Deciding the most suitable performance measure
Ø Deciding which hypothesis is “the best” to use for a task
Ø Making an ultimate judgement of how to approximate the unknown target function
Ultimate Judgement
The core challenge is in analysis and evaluation.
This is the focus of the course
Running algorithms is necessary and important, but not the top priority
This balance is best thought of as:
We are not looking for the “best hypothesis”
We are looking for the “best hypothesis for the task that you can justify”
Can Machine Learning Pick Your Next Winning Lottery Number?
https://www.youtube.com/watch?v=isTNnwk5SqE
Main Machine Learning Paradigms
What are the common types of problems in ML?
Types of Machine Learning Problems
Ø Supervised learning
• Classification
• Regression
Ø Unsupervised learning
Ø Reinforcement Learning
Supervised Learning
In supervised learning, the output is known: 𝑦 = 𝑓(𝐱)
Experience: Examples of input-output pairs
Task: Learn a model that maps input to the desired output; predict the output for new "unseen" inputs.
Performance: Error measure of how closely the hypothesis predicts the target output
The most typical of learning tasks.
Two main types of supervised learning:
Ø Classification
Ø Regression
Supervised Learning - Examples
(Figure: images of cats and dogs are fed to a classifier (ML algorithm), which outputs the labels "cat" and "dog".)
Examples: Computer Vision
Toshiba Advanced Driver Assistance Systems
Axial slices of two 3D-CT images with (left) and without (right) emphysema.
Tennakoon, R., et. al. “Classification of Volumetric Images Using Multi- Instance Learning and Extreme Value Theorem”. IEEE Transactions on Medical Imaging, 2019.
Examples: Fraud Detection
Source: http://businessdaily.co.zw/index-id-business-zk-34108.html
Examples: Speech
Source: http://www.scmp.com/magazines/post- magazine/article/1925784/why-baidus-breakthrough- speech-recognition-may-be-game
Unsupervised Learning
In unsupervised learning, the output is unknown: ? = 𝑓(𝐱)
Experience: Data set with values for some or all attributes
Task: "Invent" a suitable output; identify trends and patterns between the data points
Performance: How well the "invented" output matches the data set
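A minimal sketch of one such "invented" output – a tiny k-means clustering on one-dimensional data (e.g. patient weights); the points and starting centroids are made up:

```python
def kmeans(points, centroids, steps=10):
    """Alternate the assignment and update steps of k-means clustering."""
    for _ in range(steps):
        # Assignment: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update: move each centroid to the mean of its assigned points.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

# Two groups emerge even though the data carries no labels.
print(kmeans([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0]))
```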
Examples: Simple Clustering
(Figure: scatter plot of patient weight against patient blood glucose level, showing clusters of patients.)
Examples: Recommendation
Source: https://github.com/watfood/worldofdance_app
Examples: Computer Vision
Image: courtesy of Nvidia
Smart paint brush with Generative adversarial network (GAN)
Reinforcement Learning
In reinforcement learning, the target function to learn is an optimal policy – the best "action" for a dynamic agent to perform at any point in time:
𝑎 = 𝜋∗(𝐬)
Experience: A transition function – the result of performing any action in a state
Task: Learn the optimal actions required for the agent to achieve a goal
Performance: A reward (or reinforcement) for performing certain action(s)
Reinforcement Learning
Conceptually, reinforcement learning shares similarities with supervised and unsupervised learning:
The output (action, 𝑎) is unknown; however,
The experience gives an "output" of performing actions in states: 𝑠, 𝑎 → 𝑠′
The performance measures the "worth/reward" of each experience instance: 𝑅(𝑠, 𝑎)
The performance acts as a proxy for the "actual" output: in simple terms, it is the reward accumulated over time as the agent conducts actions.
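These ideas can be sketched with tabular Q-learning on a toy chain world (states 0–3, reward on reaching state 3); the environment and constants are invented for illustration:

```python
import random

def step(s, a):
    """Transition function: action 1 moves right, 0 moves left; reward at state 3."""
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

Q = [[0.0, 0.0] for _ in range(4)]       # Q(s, a) table of estimated worth
alpha, gamma, eps = 0.5, 0.9, 0.1
random.seed(0)
for _ in range(500):                      # episodes of experience
    s = 0
    while s != 3:
        # Epsilon-greedy: mostly exploit the current Q, sometimes explore.
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learnt policy pi*(s): the best action in each non-terminal state.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(3)]
```

The reward acts as the proxy output: the agent never sees the "correct" action, yet the accumulated reward shapes Q until the greedy policy is to move right everywhere.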
Reinforcement Learning
Examples: Robotics
Examples: Game AI
Types of Machine Learning Problems
Others:
Ø Semi-supervised learning
Ø Active learning
Ø Transfer learning
Ø ...
Summary & Todos
Introduced Machine Learning:
Ø Definition
Ø Applications
Ø Types of ML (supervised, unsupervised, reinforcement learning)
For next week
Get familiar with Python
Revise maths
No matter what, do not fall behind – this is a course that builds week by week
Example for Q&A session
Income and happiness?
• Is there a relationship between Income and happiness?
• Can we predict happiness of a person if we know the income?
• Let's try to build a model that takes income (x) as input and produces the output happiness (y)
Income and happiness?
𝑦 = 𝑓(𝑥)   (Task: an unknown target function exists)
ŷ = h(𝑥)   (hypothesis: a model that approximates 𝑓(𝑥))
Income and happiness?
x      y
10k    2

(Plot: happiness against income.)
Income and happiness?
x      y
10k    2
40k    5

(Plot: happiness against income.)
Income and happiness?
x      y
10k    2
40k    5
25k    3

(Plot: happiness against income.)
Income and happiness?
x      y
10k    2
40k    5
25k    3

What is an assumption we are making here?

(Plot: happiness against income.)
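Fitting a straight line to these three points by ordinary least squares can be sketched directly (income taken in $1000s for illustration):

```python
xs = [10.0, 40.0, 25.0]   # income ($1000s)
ys = [2.0, 5.0, 3.0]      # happiness

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
h = lambda x: slope * x + intercept     # the learnt hypothesis h(x)
print(round(slope, 3), round(intercept, 3))   # prints: 0.1 0.833
```

This is the linear model with the smallest squared error on the three observed points; whether it generalizes is a separate question.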
Income and happiness?
x      y
10k    2
40k    5
25k    3

Can we use a more complex model to get the error to 0?

(Plot: happiness against income.)
Income and happiness?
• Will the model we developed generalize?
x      y
10k    2
40k    5
25k    3

What is the training error and the true error?

(Plot: happiness against income.)
Income and happiness?
• Will the model we developed generalize?
x      y
10k    2
40k    5
25k    3
80k    7
100k   7
30k    4
20k    2

(Plot: happiness against income, with additional unseen data points.)