ANLP_1: Introduction
ISML_1: Overview of Machine Learning and Essential
Mathematical Skills for Machine Learning
Lingqiao Liu
University of Adelaide
What is your impression of machine
learning?
Outline
• Course Introduction
• What is machine learning and its applications
• Machine Learning taxonomy and framework
• Mathematical basics in Machine Learning
– Basic algorithmic calculations
– Linear algebra: vector, matrix
– Matrix calculus
– Optimization
– Probability theory
Course content
• Focus on basic concepts and algorithms in machine
learning, covering traditional and statistical machine learning
techniques
– There are courses focusing on advanced topics, e.g., deep
learning or application-oriented content, e.g., applied machine
learning
– This course is expected to lay a good foundation for your future
study
– It can be math intensive
Course Introduction
• Course coordinator: Dr. Lingqiao Liu
• Co-lecturer: Dr. Dong Gong
– Email: lingqiao. .au dong. .au
– Office: 1.23 Australian Institute for Machine Learning
• Tutors:
– Jinan Zou, Qiaoyang Luo, Bowen Zhang
• Components and assessments
– 12 Lectures: 11 main lectures + 1 guest lecture
– 4 workshops
– 4 assignments (50%)
• 1 simple assignment on solving several math problems (related to ML)
(5%)
• 3 assignments involve implementing machine learning algorithms
(coding + report) (15% each)
– Final exam (50%)
• Hurdle 40%
Prerequisite
• Linear algebra
– Vector, inner product, outer product, norm, Euclidean distance
– Matrix, basic operations (addition, multiplication, inverse)
– Determinant, trace, derivatives
– Eigenvectors and eigenvalues
• Probability theory and Statistics
– Random variable, probability density function
– Mean, variance, covariance matrix
– Statistical independence, conditional probability
– Law of total probability, Bayes rule
– Normal (Gaussian) distribution
• Programming skills:
– Python (essential)
– Matlab or other programming languages (optional)
Course delivery
• Face to face + online (tentative)
• Lectures will be live-streamed and recorded. Recordings
will be uploaded to MyUni (Echo360)
• Please check the announcements and discussion forum
regularly
– I will check the discussion forum and answer your questions every
weekday (at least once per day)
– If you have urgent questions, please email me
What is Machine Learning?
• Data-driven vs. expert systems
– Do not rely on an expert to specify the rules
– Less expensive and often more robust
– Quickly adapt to new environments
Applications
• Numerous applications
– Image recognition, Speech recognition, Machine translation,
recommendation systems
– Fake image/audio/video generation, automatic music composition
– Drug discovery, Computer-Aided Diagnosis, etc.
– …
• Top 10 Applications of Machine Learning | Machine Learning
Applications & Examples | Simplilearn – YouTube
• A new paradigm in science and engineering
– Learning to predict
– Learning to act
– Learning to generate
Outline
• Course Introduction
• What is machine learning and its applications
• Types of Machine Learning systems
• Basic concepts in Machine Learning
• Mathematical basics in Machine Learning
– Basic algorithmic calculations
– Linear algebra: vector, matrix
– Matrix calculus
– Optimization
– Probability theory
Types of Machine Learning systems
• Lots of categorization perspectives
– From the availability of supervision
– From the methodology
– From the purpose of a machine learning system
– …
• Availability of supervision
– Three main categories:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
– Other hybrid types: Semi-supervised learning and weakly supervised
learning
• Types of the mapping function
– Shallow machine learning
– Deep machine learning
Supervised learning
• In supervised learning, the desired output is provided
and the loss function measures the discrepancy between
the output of mapping function and the true output
• Training dataset
• Loss function
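A minimal sketch of these two ingredients. The toy dataset, the linear mapping function and the squared-error loss are all assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

# Hypothetical training dataset: N input/output pairs (x_i, y_i),
# generated here by y = 2x + 1.
X_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

def mapping(x, w, b):
    """A simple linear mapping function f(x) = w * x + b."""
    return w * x + b

def squared_loss(y_pred, y_true):
    """Average squared discrepancy between predicted and true outputs."""
    return np.mean((y_pred - y_true) ** 2)

# A mapping that matches the data has zero loss; a poor one has a larger loss.
loss_good = squared_loss(mapping(X_train, 2.0, 1.0), y_train)
loss_bad = squared_loss(mapping(X_train, 1.0, 0.0), y_train)
print(loss_good, loss_bad)
```

The loss is what lets us rank candidate mapping functions against the supplied outputs.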
Unsupervised learning
• Learning patterns when no specific target output values
are supplied
• Examples:
– Clustering: partition data into groups
– Building a probabilistic model to explain the data
– Anomaly detection
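As an illustration of clustering, here is a small sketch of k-means on one-dimensional data. K-means is one possible clustering algorithm, picked here as an assumption; the data and the initial centres are hypothetical.

```python
import numpy as np

def kmeans_1d(points, centers, n_iters=10):
    """Plain k-means on 1-D data, starting from the given centres."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iters):
        # Assign each point to its nearest centre.
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        # Move each centre to the mean of the points assigned to it.
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean()
    return labels, centers

# No target outputs are supplied; the groups are found from the inputs alone.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels, centers = kmeans_1d(data, centers=[0.0, 6.0])
print(labels, centers)
```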
Reinforcement learning
Types of machine learning: shallow vs. deep
• Traditional machine learning
– Important step: feature design
– Usually works with “feature vectors”
– The mapping function is simple, with a relatively small number of
parameters
– Works well if the input can be represented as vectors and the number of
samples is small to medium
• Deep learning
– Allows raw input
– End-to-end learning
– Complex models, with millions of parameters
– Works well if the “right feature” is unknown or the input is
complex and a large number of samples are available
The workflow of machine learning systems
• Problem formulation
– What is the input? What is the expected outcome?
• Data collection
– Collect data
– Annotation
• Design the machine learning algorithm
– Choose the machine learning model
– Choose the objective function
• Train the machine learning model: learn the decision system
from training data
• Apply the machine learning model
Elements of machine learning systems
• Input/Output
– Input: can be feature vectors, text, images, videos, symbolic sequences
– Output: class label, continuous value, structured output, or a sequence of
actions
• Mapping function
– Maps the input to the desired output
– Many possible mappings, e.g., same form but different parameters
• Loss function
– Judge if the mapping function is good enough
• Machine learning is a process of finding the optimal
mapping function
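The idea of searching for the best mapping can be sketched as follows. The candidates share the same form (a line `w * x + b`) and differ only in their parameters; the data, the parameter grid and the squared-error loss are assumptions for illustration, and real training would normally use an optimizer rather than an exhaustive grid.

```python
import numpy as np

# Hypothetical data generated by y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

def loss(w, b):
    """Squared-error loss of the mapping f(x) = w * x + b on the data."""
    return np.mean((w * X + b - y) ** 2)

# Many possible mappings: same form, different parameters.
candidates = [(w, b)
              for w in np.arange(0.0, 4.5, 0.5)
              for b in np.arange(0.0, 2.5, 0.5)]

# "Learning" here means picking the candidate with the smallest loss.
best_w, best_b = min(candidates, key=lambda p: loss(*p))
print(best_w, best_b)
```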
Summation and Product
• Commonly used operations in statistical machine
learning
• Summation notations
– Summation
– Summation with two indices
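The notation can be read directly as loops. The sketch below uses hypothetical arrays `a` and `b`; it also checks one standard property of double sums, namely that a double sum over a product of independent terms factors into a product of two single sums.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])

# Summation with one index: sum_i a_i
single_sum = sum(a[i] for i in range(len(a)))

# Summation with two indices: sum_i sum_j a_i * b_j
double_sum = sum(a[i] * b[j]
                 for i in range(len(a))
                 for j in range(len(b)))

# Product notation: prod_i a_i
product = np.prod(a)

# The double sum factors as (sum_i a_i) * (sum_j b_j).
factored = a.sum() * b.sum()
print(single_sum, double_sum, product, factored)
```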
• A useful formula (a little bit counter-intuitive)
• Proof
Linear algebra: vector, matrix and basic matrix
operations
• Vectors and matrix
• Basic operations
– Multiplication
– Transpose
– Inverse
Matrix multiplication
• View a matrix as a set of vectors
– Row vectors
– Column vectors
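Both views give the same product. In the sketch below (with hypothetical matrices `A` and `B`), each entry of `A @ B` is the inner product of a row of `A` with a column of `B`, and the whole product is also the sum of outer products of the columns of `A` with the rows of `B`.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # 3 x 2
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])     # 2 x 3

C = A @ B                           # 3 x 3

# Row/column view: C[i, j] = (row i of A) . (column j of B)
C_entries = np.array([[A[i, :] @ B[:, j] for j in range(B.shape[1])]
                      for i in range(A.shape[0])])

# Column/row view: C = sum_k (column k of A) (row k of B)
C_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
```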
Inner product and norms
• Inner product between two vectors
• Vector Norms
– Measure the length of the vector
– Not unique: infinitely many definitions are possible
– Commonly used ones
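A short sketch of the inner product and three widely used norms (L1, L2 and L-infinity). Which norms the slide listed is an assumption; the vectors are hypothetical.

```python
import numpy as np

x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])

inner = x @ y                        # sum_i x_i * y_i

l1 = np.linalg.norm(x, 1)            # sum of absolute values
l2 = np.linalg.norm(x, 2)            # Euclidean length
linf = np.linalg.norm(x, np.inf)     # largest absolute entry

# The L2 norm is the square root of the inner product of x with itself.
l2_from_inner = np.sqrt(x @ x)
print(inner, l1, l2, linf)
```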
Trace and Matrix Norm
• Definition
• Properties
• Frobenius norm
• Relationship to Trace
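The sketch below checks numerically, on hypothetical matrices, two standard facts assumed to be the ones meant here: the cyclic property tr(AB) = tr(BA), and the relationship between the Frobenius norm and the trace, ||A||_F^2 = tr(A^T A).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

trace_A = np.trace(A)                       # sum of diagonal entries

# Cyclic property of the trace.
trace_AB = np.trace(A @ B)
trace_BA = np.trace(B @ A)

# Frobenius norm: square root of the sum of squared entries.
fro = np.linalg.norm(A, "fro")
fro_via_trace = np.sqrt(np.trace(A.T @ A))
```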
Linear Subspace
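• For k vectors $a_1, \dots, a_k$, all of their linear
combinations form a linear space, i.e., $\{\sum_{i=1}^{k} \lambda_i a_i : \lambda_i \in \mathbb{R}\}$
• Basis: if $a_1, \dots, a_k$ are linearly independent, they form a basis of this space; if they are also orthogonal to each other, the basis is orthogonal
– The coefficients $\lambda_i$ then play the role of coordinates in the space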
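A minimal sketch of the coordinate interpretation, assuming an orthonormal basis (orthogonal vectors of unit length). The basis vectors `u1`, `u2` and the coordinates are hypothetical.

```python
import numpy as np

# Hypothetical orthonormal basis of a 2-D subspace of R^3.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
U = np.column_stack([u1, u2])        # basis vectors as columns

# A linear combination of the basis vectors is a point in the subspace.
coords = np.array([2.0, -3.0])
v = U @ coords

# With an orthonormal basis, inner products with the basis vectors
# recover the coordinates.
recovered = U.T @ v
```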
Eigenvectors and eigenvalues
• Eigenvalues and eigenvectors
• Eigenvectors are not unique
– Scaling an eigenvector, or adding eigenvectors that share the same
eigenvalue, also produces an eigenvector (provided the result is nonzero)
– So the eigenvectors corresponding to an eigenvalue, together with the zero vector, form a linear
subspace
– Usually we are only interested in a set of independent eigenvectors,
each of which corresponds to an eigenvalue
– Modern solvers return a set of eigenvalues and their
corresponding eigenvectors
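A sketch using NumPy's `np.linalg.eig` as an example of such a solver, assuming the usual definition A v = λ v. The matrices are hypothetical.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# The solver returns the eigenvalues and, as columns, matching eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Check A v = lambda v for each pair, and that a rescaled eigenvector
# is still an eigenvector for the same eigenvalue.
checks = []
for lam, v in zip(eigenvalues, eigenvectors.T):
    checks.append(np.allclose(A @ v, lam * v))
    checks.append(np.allclose(A @ (-3.0 * v), lam * (-3.0 * v)))

# For 2*I every nonzero vector is an eigenvector with eigenvalue 2,
# so the sum of two such eigenvectors is again an eigenvector.
I2 = 2.0 * np.eye(2)
v_sum = np.array([1.0, 0.0]) + np.array([0.0, 1.0])
sum_is_eigenvector = np.allclose(I2 @ v_sum, 2.0 * v_sum)
```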
Matrix decomposition
• A matrix can be decomposed into a combination (usually a
product) of special matrices
• Eigen decomposition
– When A is symmetric, i.e.
• Related topic: Singular value decomposition
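A sketch of both decompositions with NumPy, assuming the standard forms A = Q Λ Q^T for a symmetric matrix and M = U Σ V^T for a general one. The matrices are hypothetical.

```python
import numpy as np

# Eigendecomposition of a symmetric matrix: A = Q diag(eigvals) Q^T,
# where the columns of Q are orthonormal eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, Q = np.linalg.eigh(A)       # eigh is intended for symmetric input
A_rebuilt = Q @ np.diag(eigvals) @ Q.T

# Singular value decomposition works for any (also non-square) matrix.
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = U @ np.diag(s) @ Vt
```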
Optimization
• Optimization: find the value of a variable that gives the minimal
(or maximal) value of the objective function
– The variable may be subject to certain constraints, say, $x \in \mathcal{C}$
– $\mathcal{C}$ is called the feasible set of $x$
• In machine learning, we are going to learn a mapping
function
• We will have a loss function or objective function to
measure its performance
Optimization problem
• General form
• Example
• Could be simple or very difficult, depending on the type of
objective function and the type of constraints
Equivalence of Optimization problem
• In optimization, we often convert an optimization
problem to another equivalent optimization problem.
– Consider two problems, Op1 and Op2: if knowing the solution of Op2 lets us
recover the solution of Op1, the two can be deemed equivalent.
• Example
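One illustration of equivalence, chosen here as an assumption rather than taken from the slide: minimizing a positive function f, minimizing f squared (a monotone transform of f), and maximizing −f all share the same minimizer. The sketch checks this on a grid with a hypothetical f.

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 1001)        # candidate values of x
f = np.abs(xs - 3.0) + 1.0               # positive objective, minimized at x = 3

x_op1 = xs[np.argmin(f)]                 # Op1: minimize f(x)
x_op2 = xs[np.argmin(f ** 2)]            # Op2: minimize f(x)^2
x_op3 = xs[np.argmax(-f)]                # Op3: maximize -f(x)
print(x_op1, x_op2, x_op3)
```

The optimal objective values differ between the three problems, but the optimal variable is the same, which is what matters when converting one problem into another.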
Solution to optimization problems
• General-purpose solutions
– Zero-order methods (use only function values)
– First-order methods (use gradients)
– Second-order methods (use second derivatives, i.e., the Hessian)
• Global Minimum and Local Minimum
• At a local minimum of a differentiable, unconstrained problem, the gradient equals 0.
– If we know that every local minimum of an unconstrained optimization problem is also a
global minimum, we can set the gradient to 0 and solve for the optimal
solution
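A sketch comparing the two routes on a hypothetical convex quadratic f(x) = ½ xᵀA x − bᵀx: solving "gradient = 0" in closed form, and running gradient descent (a first-order method). The matrix, the step size and the iteration count are assumptions for illustration.

```python
import numpy as np

# A is symmetric positive definite, so f is convex and its only
# local minimum is the global minimum.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x."""
    return A @ x - b

# Route 1: set the gradient to zero, i.e. solve A x = b.
x_closed = np.linalg.solve(A, b)

# Route 2: gradient descent from an arbitrary starting point.
x = np.zeros(2)
step_size = 0.1
for _ in range(500):
    x = x - step_size * grad(x)
print(x_closed, x)
```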
Type of optimization problems
• Many of them
– Continuous vs. Discrete: binary or Integer variables
– Linear vs. Nonlinear
– Convex vs. nonconvex
• Convex optimization problem
– Every local optimum is a global optimum
Matrix calculus
• For functions that involve matrices or vectors
– Case 1: Vector/Matrix variable and scalar output
– Case 2: Vector/Matrix variable and vector output
• Definition
• Application
– Similar
Matrix calculus
• Properties
• More info
– Matrix Calculus: https://ccrma.stanford.edu/~dattorro/matrixcalc.pdf
• Trick to memorize
– Analogy to scalar case
– Check dimensions
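Matrix-calculus identities can be checked numerically. The sketch below tests two standard ones, chosen here for illustration: the gradient of aᵀx with respect to x is a, and the gradient of xᵀA x is (A + Aᵀ)x. Both mirror the scalar rules d(ax)/dx = a and d(ax²)/dx = 2ax, and both results have the same dimensions as x. The helper `numerical_gradient` and the test values are hypothetical.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
a = np.array([1.0, -2.0])
x0 = np.array([0.5, -1.5])

grad_linear = numerical_gradient(lambda x: a @ x, x0)       # expected: a
grad_quad = numerical_gradient(lambda x: x @ A @ x, x0)     # expected: (A + A^T) x0
```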
Matrix calculus
• More information
• Exercise
• Hint
Probability and random variable
• Random variable: a way to describe the outcome of a random
experiment
• Probability distribution
– For a discrete random variable, the probability distribution is
characterised by a probability mass function (PMF)
– For a continuous random variable, the counterpart of the probability
mass function is the probability density function (PDF)
Probability and statistics
• Commonly used PDF
– Uniform distribution
– Gaussian distribution
– Multivariate Gaussian distribution
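A sketch of the three densities using their standard formulas. The function names are hypothetical, and the final check approximates the integral of the Gaussian PDF with a simple Riemann sum.

```python
import numpy as np

def uniform_pdf(x, a, b):
    """Uniform density on [a, b]: 1 / (b - a) inside, 0 outside."""
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def multivariate_gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density with mean vector mu and covariance cov."""
    d = mu.size
    diff = x - mu
    norm_const = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm_const

# A density integrates to 1 (approximated here on a fine grid).
xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
area = np.sum(gaussian_pdf(xs, 0.0, 1.0)) * dx

peak_2d = multivariate_gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```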
More than one random variable
• Distribution of a collection of random variables
– Consider the case of two random variables
• Marginal distribution
• Conditional distribution
• Independence
• Conditional independence
– Note: conditional independence and independence are two
different concepts
• Bayes rule
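For two discrete random variables these quantities can be read off a joint probability table. The table below is hypothetical (rows index X, columns index Y).

```python
import numpy as np

# Hypothetical joint distribution P(X, Y); entries sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# Marginal distributions: sum out the other variable.
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# Conditional distributions: joint divided by the marginal of the condition.
p_y_given_x = joint / p_x[:, None]       # entry [i, j] = P(Y=j | X=i)
p_x_given_y = joint / p_y[None, :]       # entry [i, j] = P(X=i | Y=j)

# Bayes rule: P(X | Y) = P(Y | X) P(X) / P(Y)
bayes = p_y_given_x * p_x[:, None] / p_y[None, :]

# Independence would require P(X, Y) = P(X) P(Y) for every entry.
independent = np.allclose(joint, np.outer(p_x, p_y))
```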
Latent variable
• Sometimes, it is convenient to introduce an additional
random variable Z and model the joint distribution of X and Z
• Then the distribution over X can be calculated via
marginalization
• Introducing Z is usually necessary if we know the
generative process of X (how X is sampled)
Example
– Imagine we have 3 biased dice; the outcome of each die is
a random variable with its own distribution
– We add another layer of randomness by choosing the die
randomly from a given distribution
– The final outcome will be a random variable Y
– We can define the choice made in die selection as an additional
random variable Z
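The dice example can be sketched as follows. All probabilities are hypothetical. The distribution of Y is obtained by marginalizing out Z, P(Y) = Σ_z P(Z = z) P(Y | Z = z), and is then compared with a simulation of the two-stage sampling process.

```python
import numpy as np

# Hypothetical P(Y | Z = k): one row per die, one column per face 1..6.
die_pmfs = np.array([
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],     # die 0 favours face 1
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],     # die 1 favours face 6
    [1 / 6] * 6,                        # die 2 is fair
])
p_z = np.array([0.2, 0.3, 0.5])         # hypothetical P(Z): which die is chosen

# Marginalization over the latent variable Z.
p_y = p_z @ die_pmfs

# Simulate the generative process: pick a die, then roll it.
rng = np.random.default_rng(0)
n = 100_000
z = rng.choice(3, size=n, p=p_z)
y = np.empty(n, dtype=int)
for k in range(3):
    mask = z == k
    y[mask] = rng.choice(6, size=mask.sum(), p=die_pmfs[k])
empirical = np.bincount(y, minlength=6) / n
```

The empirical frequencies should approach `p_y` as the number of samples grows.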
Expectations and Variance
• Discrete case
• Continuous case
• Variance
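A sketch of these definitions, assuming the standard formulas E[X] = Σ x p(x) (discrete), E[X] = ∫ x p(x) dx (continuous) and Var[X] = E[(X − E[X])²]. The examples (a fair die and a uniform density on [0, 1]) are hypothetical, and the integrals are approximated with a midpoint sum.

```python
import numpy as np

# Discrete case: a fair six-sided die.
faces = np.arange(1, 7)
pmf = np.full(6, 1.0 / 6.0)
mean_d = np.sum(faces * pmf)
var_d = np.sum((faces - mean_d) ** 2 * pmf)
var_d_alt = np.sum(faces ** 2 * pmf) - mean_d ** 2   # E[X^2] - E[X]^2

# Continuous case: uniform density on [0, 1], integrated on a midpoint grid.
n = 100_000
xs = (np.arange(n) + 0.5) / n
dx = 1.0 / n
pdf = np.ones(n)
mean_c = np.sum(xs * pdf) * dx
var_c = np.sum((xs - mean_c) ** 2 * pdf) * dx
print(mean_d, var_d, mean_c, var_c)
```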