Linear Algebra
Gerhard Neumann
School of Computer Science
University of Lincoln
CMP3036M/CMP9063M Data Science
Today's Agenda!
• Refresh your memory of Linear Algebra
• Mostly easy, but we have probably forgotten it
• Introduction to:
– Vectors
– Matrices
– Matrix Calculus
Revisiting Linear Regression
Why do you hate us???
• Uff… not again math!
– Well, math is important and we cannot fully avoid it
– We only cover stuff that we can directly apply to derive our algorithms
– We will do it step by step so that you can really follow
– Who knows… maybe you will even like it
• Ok… but why linear algebra?
– Most data is represented as a matrix
– Talking about matrix operations is talking about manipulating data!
– Algorithms are often easier to understand in matrix form
– Linear Regression is one of the most basic algorithms for data science!
Ask questions!!!
• Even though I am Austrian, I am actually a nice guy…
And give feedback!
Feed the feedbag!
• If it is not clear… tell me!!
• If it is too fast… tell me!!
• If you cannot understand "Austrian English"… tell me!
Vectors
• A vector is a multi-dimensional quantity
• Each dimension contains different information (Age, Height, Weight…)
Some notation
• Vectors will always be represented as bold symbols
• A vector is always a column vector
• A transposed vector is always a row vector
What can we do with vectors?
• Multiplication by scalars
• Addition of vectors
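These two vector operations can be sketched in NumPy (the numbers are just illustrative):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Multiplication by a scalar scales every entry
print(2 * a)   # [2. 4. 6.]

# Addition of vectors is element-wise (same dimensionality required)
print(a + b)   # [5. 7. 9.]
```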
Scalar products and length of vectors
• Scalar (Inner) products:
– Sum the element-wise products
• Length of a vector
– Square root of the inner product with itself
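A quick NumPy sketch of both definitions (example values are arbitrary):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Scalar (inner) product: sum of the element-wise products
inner = np.dot(a, b)            # 3*1 + 4*2 = 11
print(inner)                    # 11.0

# Length of a vector: square root of the inner product with itself
length = np.sqrt(np.dot(a, a))  # sqrt(9 + 16)
print(length)                   # 5.0
```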
Matrices
• A matrix is a rectangular array of numbers arranged in rows and columns.
– Examples: a 3 x 2 matrix and a 2 x 4 matrix
– Dimension of a matrix is always num rows times num columns
– Matrices will be denoted with bold upper-case letters (A,B,W)
– Vectors are special cases of matrices
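In NumPy, a matrix is a 2-D array; the shape is always (num rows, num columns). A sketch with made-up numbers:

```python
import numpy as np

# A 3 x 2 matrix: 3 rows, 2 columns
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
print(A.shape)   # (3, 2)

# A vector is a special case of a matrix: a column vector is n x 1
v = np.array([[1], [2], [3]])
print(v.shape)   # (3, 1)
```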
Matrices in Data Science
• Our data set can be represented as a matrix, where single samples are vectors
• Most typical representation:
– Each row represents a data sample (e.g. Joe)
– Each column represents a data entry (e.g. age)
X is a num samples x num entries matrix
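As a sketch (names and values are illustrative, not real course data):

```python
import numpy as np

# Rows = samples (people), columns = entries (age, height, weight)
X = np.array([[34, 1.80, 80.0],   # e.g. Joe
              [29, 1.65, 55.0],
              [41, 1.75, 70.0]])
print(X.shape)   # (3, 3): num samples x num entries
print(X[0, :])   # first row: all entries of the first sample
print(X[:, 0])   # first column: one entry (age) for all samples
```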
What can you do with matrices?
• Multiplication with scalar
• Addition of matrices
• Matrices can also be transposed
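All three operations in NumPy (toy matrices):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(3 * A)   # multiplication with a scalar: every entry scaled
print(A + B)   # addition: element-wise, same dimensions required
print(A.T)     # transpose: rows become columns, A.T[i, j] == A[j, i]
```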
Multiplication of a vector with a matrix
• Matrix-Vector Product:
• Think of it as:
– Hence:
– We sum over the columns of W, weighted by the entries of the vector
• The vector needs to have the same dimensionality as the number of columns of the matrix!
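The "weighted sum of columns" view, sketched in NumPy:

```python
import numpy as np

W = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # 3 x 2
x = np.array([10, 1])        # dimensionality must match num columns of W

y = W @ x                    # matrix-vector product
print(y)                     # [12 34 56]

# Same result: the columns of W, weighted by the entries of x
print(10 * W[:, 0] + 1 * W[:, 1])
```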
Multiplication of a matrix with a matrix
• Matrix-Matrix Product:
• Think of it as:
– Hence: Each column in U can be computed by a matrix-vector product
Multiplication of a matrix with a matrix
• Dimensions:
– Number of columns of the left matrix must match the number of rows of the right matrix
• Non-commutative (in general):
• Associative:
• Transpose Product:
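The dimension rule and the three properties can be checked numerically (toy matrices):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])         # 2 x 2
B = np.array([[1, 0, 2], [0, 1, 3]])   # 2 x 3: cols of A == rows of B

U = A @ B
print(U.shape)   # (2, 3): (rows of left) x (columns of right)

C = np.array([[0, 1], [1, 0]])
print(np.array_equal(A @ C, C @ A))            # False: non-commutative
print(np.allclose((A @ C) @ B, A @ (C @ B)))   # True: associative
print(np.allclose((A @ B).T, B.T @ A.T))       # True: transpose product
```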
Important special cases
• Scalar (Inner) product:
– The scalar product can be written as vector-vector product
Important special cases
• Compute row/column averages of matrix
– Vector of row averages (average over all entries per sample)
– Vector of column averages (average over all samples per entry)
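Both averages can be written as products with a vector of ones; a sketch on a toy data matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 samples x 3 entries
n_samples, n_entries = X.shape

# Row averages: average over all entries per sample
row_avg = X @ np.ones(n_entries) / n_entries
print(row_avg)   # [2. 5.]

# Column averages: average over all samples per entry
col_avg = np.ones(n_samples) @ X / n_samples
print(col_avg)   # [2.5 3.5 4.5]
```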
Matrix Inverse
• Definition:
• Unit Element: Identity matrix, e.g., 3 x 3:
• Verify it!
• Note: We can only invert square matrices (num rows = num cols)
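Verifying the definition numerically on a small invertible matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # square: num rows == num cols
A_inv = np.linalg.inv(A)

# Definition of the inverse: A @ A_inv equals the identity matrix
print(np.round(A @ A_inv, 10))   # 2 x 2 identity

# The unit element, e.g. the 3 x 3 identity matrix
print(np.eye(3))
```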
Linear regression models revisited
• In linear regression, the output y is modelled as a linear function of the inputs xᵢ
[Plots: effect of 𝛽0 (intercept) and effect of 𝛽1 (slope)]
Linear regression models in matrix form
• Equation for the i-th sample: yᵢ = 𝛽₀ + 𝛽₁xᵢ
• Equation for the full data set: y = X𝛽
– y is a vector containing the output for each sample
– X is the data matrix containing a vector of ones as the first
column for the bias 𝛽₀
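A minimal sketch of building the data matrix with the bias column (input/output values are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # one input per sample
y = np.array([2.1, 3.9, 6.2, 8.1])   # one output per sample

# Data matrix X: a column of ones (for the bias beta_0), then the inputs
X = np.column_stack([np.ones_like(x), x])
print(X.shape)   # (4, 2)
print(X[:, 0])   # bias column: all ones
# Full model for all samples at once: y ≈ X @ beta
```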
Linear regression models in matrix form
• Error vector: e = y − X𝛽
• Sum of squared errors (SSE): SSE(𝛽) = eᵀe = (y − X𝛽)ᵀ(y − X𝛽)
• We have now written the SSE completely in matrix form!
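Computing the SSE in exactly this matrix form (toy data; the candidate 𝛽 is arbitrary):

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])    # data matrix with bias column
y = np.array([2.0, 4.0, 6.0])
beta = np.array([0.0, 2.0])   # candidate parameters

e = y - X @ beta              # error vector
sse = e @ e                   # sum of squared errors, e^T e
print(sse)                    # 0.0 — this beta fits the toy data exactly
```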
Deriving Linear Regression
• How do we obtain the optimal 𝛽* (which minimizes the SSE)?
At a minimum of a
function, its derivative is zero
I.e., find a 𝛽 where ∂SSE(𝛽)/∂𝛽 = 0
Calculus
Ok, we need to talk about derivatives…
“The derivative of a function of a real variable measures the sensitivity to change of a
quantity (a function value or dependent variable) which is determined by another
quantity (the independent variable)” (Wikipedia)
Function:
Derivative:
Minimum:
Derivatives and Gradients
Function:
Derivative:
Minimum:
• The vector of partial derivatives ∂f(x)/∂x is also called the gradient of the function f at point x
Matrix Calculus
• How do we compute ∂SSE(𝛽)/∂𝛽?
• We need to know some rules from Matrix Calculus (see Wikipedia)
– Linear: ∂(aᵀx)/∂x = aᵀ
– Quadratic: ∂(xᵀAx)/∂x = xᵀ(A + Aᵀ)
– Chain rule: ∂f(g(x))/∂x = (∂f/∂g)(∂g/∂x)
Derivation of the SSE
How do we compute ∂SSE(𝛽)/∂𝛽, with SSE(𝛽) = eᵀe and e = y − X𝛽?
– Chain rule: ∂SSE/∂𝛽 = (∂SSE/∂e)(∂e/∂𝛽)
– 1st derivative: ∂(eᵀe)/∂e = 2eᵀ
– 2nd derivative: ∂(y − X𝛽)/∂𝛽 = −X
Putting it together…
• Chain rule: ∂SSE/∂𝛽 = 2eᵀ(−X) = −2(y − X𝛽)ᵀX
• Set it to zero and cancel constant factors: (y − X𝛽)ᵀX = 0
Multiply out the brackets: yᵀX − 𝛽ᵀXᵀX = 0
Bring 𝛽ᵀXᵀX to the other side: 𝛽ᵀXᵀX = yᵀX
Transpose on both sides: XᵀX𝛽 = Xᵀy
(Left-)multiply with the inverse: 𝛽 = (XᵀX)⁻¹Xᵀy
General solution to the least-squares problem!
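The closed-form solution 𝛽 = (XᵀX)⁻¹Xᵀy can be checked numerically; here on synthetic data generated from known parameters (the data and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # bias column + inputs
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)            # noisy outputs

# Normal-equation solution: beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)   # close to [1.0, 2.0]

# In practice, prefer a solver over forming the inverse explicitly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))   # True
```

Numerically, `np.linalg.lstsq` (or `np.linalg.solve` on the normal equations) is more stable than inverting XᵀX.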