Preliminaries
Who should take this class?
• This is a difficult, math- and programming-intensive class geared primarily towards graduate students
• Historically, far fewer undergraduates than graduate students earn an A
Course Prerequisites
• Linear algebra
• Multivariate Calculus, including partial derivatives
• Probability
• Comfort with programming in Python
• Fundamentals of Data Science (CS 365) is a great prerequisite for this course
– serves as preparation including, but not limited to, the courses CS 460, CS 506, CS 542 and CS 565
• Intro to Optimization (CAS CS 507)
– is not a formal prerequisite, but is highly recommended before taking this class
Course Prerequisites • Multivariate Calculus
– Vectors; dot product
– Determinants; cross product
– Matrices; inverse matrices
– Square systems; equations of planes
– Parametric equations for lines and curves
– Max-min problems; least squares
– Second derivative test; boundaries and infinity
– Level curves; partial derivatives; tangent plane approximation
– Differentials; chain rule
– Gradient; directional derivative; tangent plane
– Lagrange multipliers
– Non-independent variables
– Double integrals
– Change of variables
• and other calculus concepts, such as convexity
Course Prerequisites • Linear algebra
– Vectors and matrices
• Basic Matrix Operations
• Determinants, norms, trace
• Special Matrices
– Matrix inverse
– Matrix rank
– Eigenvalues and Eigenvectors
– Matrix Calculus
Course Prerequisites • Probability
– Rules of probability, conditional probability and independence, Bayes rule
– Random variables (expected value, variance, their properties); discrete and continuous variables, density functions, vector random variables, covariance, joint distributions
– Common distributions: Normal, Bernoulli, Binomial, Multinomial, Uniform, etc.
A review: http://cs229.stanford.edu/section/cs229-prob.pdf
Course Prerequisites
“...but I really want to take this course!”
• If you lack any of these prerequisites, you SHOULD NOT take this class
• We cannot teach you the class material and also the prerequisite material
• We are not miracle workers!
• Instead, please consider these alternative courses:
– EC 414 Introduction to Machine Learning
– CS 506 Computational Tools for Data
– CS 504 Data Mechanics
Read the book
Matrix Algebra Review
• Vectors and matrices
– Basic Matrix Operations
– Determinants, norms, trace
– Special Matrices
• Matrix inverse
• Matrix rank
• Eigenvalues and Eigenvectors
• Matrix Calculus
10/2/17 11
Vector
• A column vector $x \in \mathbb{R}^{n \times 1}$, where $x = [x_1, x_2, \ldots, x_n]^T$
• A row vector $x^T \in \mathbb{R}^{1 \times n}$, where $x^T = [x_1, x_2, \ldots, x_n]$ and $T$ denotes the transpose operation
Vector
• We’ll default to column vectors in this class
Matrix
• A matrix $A \in \mathbb{R}^{m \times n}$ is an array of numbers with size $m$ by $n$, i.e. $m$ rows and $n$ columns.
• If $m = n$, we say that $A$ is square.
Basic Matrix Operations
• What you should know:
– Addition
– Scaling
– Dot product
– Multiplication
– Transpose
– Inverse / pseudoinverse
– Determinant / trace
Vectors
• Norm: $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$
• More formally, a norm is any function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies 4 properties:
• Non-negativity: For all $x \in \mathbb{R}^n$, $f(x) \ge 0$
• Definiteness: $f(x) = 0$ if and only if $x = 0$
• Homogeneity: For all $x \in \mathbb{R}^n$ and $t \in \mathbb{R}$, $f(tx) = |t|\,f(x)$
• Triangle inequality: For all $x, y \in \mathbb{R}^n$, $f(x + y) \le f(x) + f(y)$
Matrix Operations
• Example norms:
– $\ell_1$ norm: $\|x\|_1 = \sum_{i=1}^n |x_i|$
– $\ell_2$ norm: $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$
– $\ell_\infty$ norm: $\|x\|_\infty = \max_i |x_i|$
• General norms: $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$ for $p \ge 1$
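These norms can be checked directly in NumPy (the vector below is an arbitrary illustrative choice):

```python
import numpy as np

# Illustrative vector (not from the slides)
x = np.array([3.0, -4.0])

l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7
l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5; the default is the 2-norm
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4
```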
Matrix Operations
• Inner product (dot product) of vectors: $x \cdot y = x^T y = \sum_{i=1}^n x_i y_i$
– Multiply corresponding entries of two vectors and add up the result
– $x \cdot y$ is also $\|x\|\,\|y\| \cos\theta$, where $\theta$ is the angle between $x$ and $y$
Matrix Operations
• Inner product (dot product) of vectors
– If $B$ is a unit vector, then $A \cdot B$ gives the length of the component of $A$ which lies in the direction of $B$
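A small NumPy sketch of both facts, using illustrative vectors:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

dot = x @ y                                   # sum of elementwise products
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
angle = np.degrees(np.arccos(cos_theta))      # angle between x and y: 45 degrees here

# x is already a unit vector, so y . x is the length of y along x
proj_len = y @ (x / np.linalg.norm(x))
```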
Matrix Operations • The product of two matrices
• For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, the product is $C = AB \in \mathbb{R}^{m \times p}$, where $C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$
Matrix Operations • Powers
– By convention, we can refer to the matrix product $AA$ as $A^2$, $AAA$ as $A^3$, etc.
– Obviously only square matrices can be multiplied that way
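In NumPy, `@` is the matrix product and `np.linalg.matrix_power` computes repeated products; the matrices below are illustrative. The snippet also checks the transpose identity $(AB)^T = B^T A^T$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

C = A @ B                           # C[i, j] = sum_k A[i, k] * B[k, j]
A2 = np.linalg.matrix_power(A, 2)   # same as A @ A

# (AB)^T = B^T A^T
identity_holds = np.allclose((A @ B).T, B.T @ A.T)
```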
Matrix Operations
• Transpose – flip the matrix, so row 1 becomes column 1: $(A^T)_{ij} = A_{ji}$
• A useful identity: $(AB)^T = B^T A^T$
Matrix Operations • Determinant
– $\det(A)$ returns a scalar
– Represents the area (or volume) of the parallelogram (parallelepiped) described by the vectors in the rows of the matrix
– For $A \in \mathbb{R}^{2 \times 2}$, $\det(A) = A_{11}A_{22} - A_{12}A_{21}$
– Properties: $\det(AB) = \det(A)\det(B)$, $\det(A^T) = \det(A)$, $\det(A^{-1}) = 1/\det(A)$, and $\det(A) = 0$ if and only if $A$ is singular
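A quick numerical check of the $2 \times 2$ formula and two of the properties (matrices chosen arbitrarily for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[2.0, 0.0],
              [1.0, 1.0]])

detA = np.linalg.det(A)   # 1*4 - 2*3 = -2

prod_rule = np.isclose(np.linalg.det(A @ B), detA * np.linalg.det(B))  # det(AB) = det(A) det(B)
transpose_rule = np.isclose(np.linalg.det(A.T), detA)                  # det(A^T) = det(A)
```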
Matrix Operations • Trace
– $\mathrm{tr}(A) = \sum_{i=1}^n A_{ii}$, the sum of the diagonal entries; returns a scalar
– Invariant to a lot of transformations, so it’s used sometimes in proofs. (Rarely in this class though.)
– Properties: $\mathrm{tr}(A) = \mathrm{tr}(A^T)$, $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$, $\mathrm{tr}(tA) = t\,\mathrm{tr}(A)$, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
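The trace properties are easy to verify numerically (illustrative matrices; note $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds even though $AB \ne BA$ in general):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

trA = np.trace(A)  # 1 + 4 = 5

additive = np.isclose(np.trace(A + B), trA + np.trace(B))  # tr(A + B) = tr(A) + tr(B)
cyclic = np.isclose(np.trace(A @ B), np.trace(B @ A))      # tr(AB) = tr(BA)
```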
Matrix Operations
• Vector norms, as defined above, measure the size of a vector
• Matrix norms: norms can also be defined for matrices, such as the Frobenius norm: $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}$
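Both forms of the Frobenius norm agree in NumPy (illustrative matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

fro = np.linalg.norm(A, 'fro')          # sqrt of the sum of squared entries: sqrt(30)
via_trace = np.sqrt(np.trace(A.T @ A))  # equivalent trace formula
```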
Special Matrices
• Symmetric matrix: $A = A^T$
• Skew-symmetric matrix: $A = -A^T$
• Identity matrix $I$: ones on the diagonal, zeros elsewhere; $AI = IA = A$
• Diagonal matrix: $D = \mathrm{diag}(d_1, \ldots, d_n)$, with non-zero entries only on the diagonal
Inverse
• Given a matrix $A$, its inverse $A^{-1}$ is a matrix such that $AA^{-1} = A^{-1}A = I$
• E.g. for a $2 \times 2$ matrix, $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$
• The inverse does not always exist. If $A^{-1}$ exists, $A$ is invertible or non-singular. Otherwise, it’s singular.
• Useful identities, for matrices that are invertible: $(A^{-1})^{-1} = A$, $(AB)^{-1} = B^{-1}A^{-1}$, $(A^{-1})^T = (A^T)^{-1}$
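A NumPy sanity check of the definition and the product identity (the matrices are illustrative, chosen to be invertible):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])   # det = 10, so A is invertible
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # det = 1, also invertible

A_inv = np.linalg.inv(A)

is_identity = np.allclose(A @ A_inv, np.eye(2))      # A A^{-1} = I
product_rule = np.allclose(np.linalg.inv(A @ B),
                           np.linalg.inv(B) @ A_inv)  # (AB)^{-1} = B^{-1} A^{-1}
```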
Matrix Operations • Pseudoinverse
– Say you have the matrix equation $AX = B$, where $A$ and $B$ are known, and you want to solve for $X$
– You could calculate the inverse and pre-multiply by it: $A^{-1}AX = A^{-1}B \Rightarrow X = A^{-1}B$
– The Python command would be np.linalg.inv(A) @ B
– But calculating the inverse of a large matrix often runs into floating-point resolution problems (because it involves working with very small and very large numbers together).
– Or, your matrix might not even have an inverse.
Matrix Operations • Pseudoinverse
– Fortunately, there are workarounds to solve $AX = B$ in these situations. And Python can do them!
– Instead of taking an inverse, directly ask Python to solve for $X$ in $AX = B$ by typing np.linalg.solve(A, B), which uses a numerically stable factorization rather than forming $A^{-1}$
– np.linalg.solve requires $A$ to be square and invertible; if it is not, np.linalg.lstsq(A, B) solves the system via the pseudoinverse instead
• If there is no exact solution, it will return the closest one (in the least-squares sense)
• If there are many solutions, it will return the smallest (minimum-norm) one
Matrix Operations • Python example ($A$ and $B$ below are illustrative values, chosen so the system has a unique solution):
>>> import numpy as np
>>> A = np.array([[2.0, 0.0], [0.0, 2.0]])
>>> B = np.array([2.0, -1.0])
>>> x = np.linalg.solve(A, B)
>>> x
array([ 1. , -0.5])
Linear independence
• Suppose we have a set of vectors $v_1, \ldots, v_n$
• If we can express $v_1$ as a linear combination of the other vectors $v_2, \ldots, v_n$, then $v_1$ is linearly dependent on the other vectors.
– The direction $v_1$ can be expressed as a combination of the directions $v_2, \ldots, v_n$ (e.g. $v_1 = 0.7\,v_2 - 0.7\,v_4$)
• If no vector is linearly dependent on the rest of the set, the set is linearly independent.
– Common case: a set of vectors $v_1, \ldots, v_n$ is always linearly independent if each vector is perpendicular to every other vector (and non-zero)
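One practical check, sketched in NumPy with illustrative vectors: stack the vectors as rows of a matrix; they are linearly independent exactly when the matrix rank equals the number of vectors.

```python
import numpy as np

independent = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])    # perpendicular, non-zero
dependent = np.array([[1.0, 2.0, 3.0],
                      [2.0, 4.0, 6.0]])      # second row = 2 * first row

rank_ind = np.linalg.matrix_rank(independent)  # 2 vectors, rank 2 -> independent
rank_dep = np.linalg.matrix_rank(dependent)    # 2 vectors, rank 1 -> dependent
```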
Linear independence
[Figure: a linearly independent set of vectors vs. a set that is not linearly independent]
Matrix rank • Column/row rank
– Column rank: the maximum number of linearly independent columns; row rank: the maximum number of linearly independent rows
– Column rank always equals row rank
• Matrix rank: $\mathrm{rank}(A)$ is this common value
Matrix rank
• For transformation matrices, the rank tells you the dimensions of the output
• E.g. if the rank of $A$ is 1, then the transformation $p' = Ap$ maps points onto a line.
• Here’s a matrix with rank 1 (an illustrative example): $A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ — all points get mapped to the line $y = 2x$
Matrix rank
• If an $m \times m$ matrix is rank $m$, we say it’s “full rank”
– Maps an $m \times 1$ vector uniquely to another $m \times 1$ vector
– An inverse matrix can be found
• If rank $< m$, we say it’s “singular”
– At least one dimension is getting collapsed. No way to look at the result and tell what the input was
– The inverse does not exist
• The inverse also doesn’t exist for non-square matrices
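A sketch of the full-rank vs. singular distinction in NumPy (illustrative matrices); inverting the singular matrix raises an error, matching the bullet above:

```python
import numpy as np

full_rank = np.array([[1.0, 1.0],
                      [2.0, 3.0]])   # det = 1: rank 2, invertible
singular = np.array([[1.0, 1.0],
                     [2.0, 2.0]])    # rows are dependent: rank 1

r_full = np.linalg.matrix_rank(full_rank)
r_sing = np.linalg.matrix_rank(singular)

# A singular matrix collapses a dimension, so no inverse exists
try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
```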
Eigenvector and Eigenvalue
• An eigenvector $x$ of a linear transformation $A$ is a non-zero vector that, when $A$ is applied to it, does not change direction.
• Applying $A$ to the eigenvector only scales the eigenvector by the scalar value $\lambda$, called an eigenvalue: $Ax = \lambda x$, $x \ne 0$
Properties of eigenvalues
• The trace of $A$ is equal to the sum of its eigenvalues: $\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i$
• The determinant of $A$ is equal to the product of its eigenvalues: $\det(A) = \prod_{i=1}^n \lambda_i$
• The rank of $A$ is equal to the number of non-zero eigenvalues of $A$ (for diagonalizable $A$).
• The eigenvalues of a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ are just the diagonal entries $d_1, \ldots, d_n$
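These properties can be verified with np.linalg.eig on an illustrative matrix (its eigenvalues are 3 and 1):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the eigenvectors

# A x = lambda x for each eigenpair
pairs_ok = all(np.allclose(A @ eigvecs[:, i], eigvals[i] * eigvecs[:, i])
               for i in range(len(eigvals)))

trace_ok = np.isclose(eigvals.sum(), np.trace(A))       # sum of eigenvalues = trace
det_ok = np.isclose(eigvals.prod(), np.linalg.det(A))   # product of eigenvalues = determinant
```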
Diagonalization • Eigenvalue equation: $AX = XD$
– Where $D$ is a diagonal matrix of the eigenvalues and the columns of $X$ are the corresponding eigenvectors
• Assuming all $\lambda_i$’s are unique, $X$ is invertible: $A = XDX^{-1}$
• Remember that the inverse of an orthogonal matrix is just its transpose; when the eigenvectors are orthogonal (as for symmetric $A$), $X^{-1} = X^T$
Symmetric matrices • Properties:
– For a symmetric matrix $A$, all the eigenvalues are real.
– The eigenvectors of $A$ are orthonormal.
• Therefore: $A = U \Lambda U^T$
– where $U$ is orthogonal ($U^T U = I$) and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$
• So, if we wanted to find the vector $x$ that maximizes $x^T A x$ subject to $\|x\|_2 = 1$:
– It is the same as finding the eigenvector that corresponds to the largest eigenvalue.
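np.linalg.eigh is specialized for symmetric matrices; a sketch on an illustrative matrix, including the maximization claim above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric

lam, U = np.linalg.eigh(A)   # real eigenvalues in ascending order, orthonormal U

orthonormal = np.allclose(U.T @ U, np.eye(2))            # U^T U = I
reconstructed = np.allclose(U @ np.diag(lam) @ U.T, A)   # A = U Lambda U^T

# x^T A x over unit vectors is maximized by the top eigenvector
x_top = U[:, -1]
max_quad = x_top @ A @ x_top   # equals the largest eigenvalue
```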
Matrix Calculus – The Gradient
• Let a function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ take as input a matrix $A$ of size $m \times n$ and return a real value.
• Then the gradient of $f$ is the $m \times n$ matrix $\nabla_A f(A)$
• Every entry in the matrix is: $(\nabla_A f(A))_{ij} = \dfrac{\partial f(A)}{\partial A_{ij}}$
• The size of $\nabla_A f(A)$ is always the same as the size of $A$. So if $A$ is just a vector $x$: $\nabla_x f(x) = \left[ \dfrac{\partial f(x)}{\partial x_1}, \ldots, \dfrac{\partial f(x)}{\partial x_n} \right]^T$
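The definition can be sketched numerically with finite differences; here, for the illustrative function $f(x) = x^T x$, whose gradient is $2x$:

```python
import numpy as np

def f(x):
    return x @ x   # f(x) = x^T x, so grad f(x) = 2x

def numerical_gradient(f, x, h=1e-6):
    # central differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, -2.0, 3.0])
grad_ok = np.allclose(numerical_gradient(f, x), 2 * x, atol=1e-4)
```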
Matrix Calculus – The Gradient • Properties
– $\nabla_x (f(x) + g(x)) = \nabla_x f(x) + \nabla_x g(x)$
– For $t \in \mathbb{R}$, $\nabla_x (t\, f(x)) = t\, \nabla_x f(x)$
Matrix Calculus – The Jacobian
• For a vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is the $m \times n$ matrix $J$ with entries $J_{ij} = \dfrac{\partial f_i(x)}{\partial x_j}$
Matrix Calculus – The Hessian
• The Hessian matrix with respect to $x$, written $\nabla_x^2 f(x)$ or simply as $H$, is the $n \times n$ matrix of partial derivatives
• Each entry can be written as: $H_{ij} = \dfrac{\partial^2 f(x)}{\partial x_i \partial x_j}$
• Exercise: Why is the Hessian always symmetric?
Matrix Calculus – The Hessian
• The Hessian is always symmetric, because $\dfrac{\partial^2 f(x)}{\partial x_i \partial x_j} = \dfrac{\partial^2 f(x)}{\partial x_j \partial x_i}$
• This is known as Schwarz’s theorem: the order of partial derivatives doesn’t matter as long as the second partial derivatives exist and are continuous.
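The symmetry can be observed numerically; a sketch using finite differences on an illustrative smooth function (at $x = (1, 2)$, the mixed partial of $x_0^2 x_1 + \sin x_1$ is $2x_0 = 2$):

```python
import numpy as np

def f(x):
    # a smooth scalar function of two variables
    return x[0]**2 * x[1] + np.sin(x[1])

def numerical_hessian(f, x, h=1e-4):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = h
            e_j = np.zeros(n); e_j[j] = h
            # central-difference estimate of d^2 f / (dx_i dx_j)
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h**2)
    return H

x = np.array([1.0, 2.0])
H = numerical_hessian(f, x)
symmetric = np.allclose(H, H.T, atol=1e-4)
```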
Matrix Calculus – The Hessian
• Note that the Hessian is not the gradient of the gradient of a vector (this is not defined). It is actually the gradient of every entry of the gradient of the vector.
• E.g., the first column is the gradient of $\dfrac{\partial f(x)}{\partial x_1}$
Common vector derivatives
• $\nabla_x (b^T x) = b$
• $\nabla_x (x^T A x) = (A + A^T) x$ (equal to $2Ax$ when $A$ is symmetric)
• $\nabla_x \|x\|_2^2 = \nabla_x (x^T x) = 2x$
PSet 1 Out Today
• Due in 1 week: 9/15 11:59pm GMT -5 (Boston Time)
• Diagnostic homework covering the prerequisite topics above
• Additional examples in lab this week (Group A for in-person lab rotations)
Next Class
Supervised Learning I: Regression
• regression, linear hypothesis, SSD cost; gradient descent; normal equations; maximum likelihood
• Reading: Bishop 1.2–1.2.4, 3.1–3.1.1