
Preliminaries

Who should take this class?
• This is a difficult, math- and programming-intensive class geared primarily towards graduate students
• Historically, far fewer undergraduates earn an A than graduate students

Course Prerequisites
• Linear algebra
• Multivariate Calculus, including partial derivatives
• Probability
• Comfort with programming in Python
• Fundamentals of Data Science (CS 365) is a great prerequisite for this course
– serves as preparation for courses including, but not limited to, CS460, CS506, CS542 and CS565
• Intro to Optimization (CAS CS 507)
– is not a formal prerequisite, but is highly recommended before taking this class

[Figure: examples of sufficient vs. insufficient background]

Course Prerequisites
• Multivariate Calculus
– Vectors; dot product
– Determinants; cross product
– Matrices; inverse matrices
– Square systems; equations of planes
– Parametric equations for lines and curves
– Max-min problems; least squares
– Second derivative test; boundaries and infinity
– Level curves; partial derivatives; tangent plane approximation
– Differentials; chain rule
– Gradient; directional derivative; tangent plane
– Lagrange multipliers
– Non-independent variables
– Double integrals
– Change of variables
• Other calculus concepts, such as convexity

Course Prerequisites
• Linear algebra
– Vectors and matrices
• Basic Matrix Operations
• Determinants, norms, trace
• Special Matrices
– Matrix inverse
– Matrix rank
– Eigenvalues and Eigenvectors
– Matrix Calculus

Course Prerequisites • Probability
– Rules of probability, conditional probability and independence, Bayes rule
– Random variables (expected value, variance, their properties); discrete and continuous variables, density functions, vector random variables, covariance, joint distributions
– Common distributions: Normal, Bernoulli, Binomial, Multinomial, Uniform, etc.
A review: http://cs229.stanford.edu/section/cs229-prob.pdf

Course Prerequisites

“…but I really want to take this course!”
• If you lack any of these prerequisites, you SHOULD NOT take this class
– we cannot teach you the class material and also the prerequisite material
– we are not miracle workers!
• Instead, please consider these alternative courses:
– EC 414 Introduction to Machine Learning
– CS 506 Computational Tools for Data
– CS 504 Data Mechanics

Read the book

Matrix Algebra Review
• Vectors and matrices
– Basic Matrix Operations
– Determinants, norms, trace
– Special Matrices
• Matrix inverse
• Matrix rank
• Eigenvalues and Eigenvectors
• Matrix Calculus
10/2/17 11


Vector
• A column vector x ∈ R^(n×1)
• A row vector x^T ∈ R^(1×n), where ^T denotes the transpose operation

Vector
• We’ll default to column vectors in this class

Matrix
• A matrix A ∈ R^(m×n) is an array of numbers with size m by n, i.e. m rows and n columns.
• If m = n, we say that A is square.

Basic Matrix Operations
• What you should know:
– Addition
– Scaling
– Dot product
– Multiplication
– Transpose
– Inverse / pseudoinverse
– Determinant / trace

Vectors
• Norm: a measure of a vector’s length, e.g. the Euclidean norm ||x||_2 = sqrt(Σ_i x_i^2)
• More formally, a norm is any function f : R^n → R that satisfies 4 properties:
• Non-negativity: For all x ∈ R^n, f(x) ≥ 0
• Definiteness: f(x) = 0 if and only if x = 0
• Homogeneity: For all x ∈ R^n and t ∈ R, f(tx) = |t| f(x)
• Triangle inequality: For all x, y ∈ R^n, f(x + y) ≤ f(x) + f(y)

Matrix Operations
• Example norms:
– ℓ1 norm: ||x||_1 = Σ_i |x_i|
– ℓ2 norm: ||x||_2 = sqrt(Σ_i x_i^2)
– ℓ∞ norm: ||x||_∞ = max_i |x_i|
• General ℓp norms: ||x||_p = (Σ_i |x_i|^p)^(1/p), for p ≥ 1
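These norms are easy to compute with numpy; a quick sketch (the vector is illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0])

# l1 norm: sum of absolute values
l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7.0
# l2 (Euclidean) norm: sqrt of sum of squares
l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5.0
# l-infinity norm: largest absolute entry
linf = np.linalg.norm(x, np.inf)  # 4.0
# general l-p norm, here p = 3, computed from the definition
p3 = np.sum(np.abs(x) ** 3) ** (1 / 3)
```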

Matrix Operations
• Inner product (dot product) of vectors
– Multiply corresponding entries of two vectors and add up the result: x · y = x^T y = Σ_i x_i y_i
– x · y also equals |x||y| cos θ, where θ is the angle between x and y

Matrix Operations
• Inner product (dot product) of vectors
– If B is a unit vector, then A·B gives the length of A which lies in the direction of B
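Both dot-product facts above can be checked numerically; a sketch with numpy (the vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

# dot product: multiply corresponding entries and sum
dot = np.dot(x, y)  # 1.0

# x . y = |x| |y| cos(theta): recover the angle between x and y
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)  # pi/4, i.e. 45 degrees

# projecting A onto a unit vector B gives the length of A along B
A = np.array([3.0, 4.0])
B = np.array([1.0, 0.0])       # unit vector along the x-axis
length_along_B = np.dot(A, B)  # 3.0
```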

Matrix Operations
• The product of two matrices A ∈ R^(m×n) and B ∈ R^(n×p) is C = AB ∈ R^(m×p), with C_ij = Σ_k A_ik B_kj

Matrix Operations
• Powers
– By convention, we refer to the matrix product AA as A^2, AAA as A^3, etc.
– Only square matrices can be multiplied that way
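In numpy, matrix products and powers look like this (the matrices are illustrative; note that `*` is element-wise, `@` is the matrix product):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# matrix product: (AB)_ij = sum_k A_ik * B_kj
C = A @ B  # [[2, 1], [4, 3]]

# powers only make sense for square matrices: A^2 = AA
A2 = A @ A
A3 = np.linalg.matrix_power(A, 3)  # same as A @ A @ A
```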

Matrix Operations
• Transpose – flip the matrix, so row 1 becomes column 1: (A^T)_ij = A_ji
• A useful identity: (AB)^T = B^T A^T
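The transpose identity (AB)^T = B^T A^T is easy to sanity-check in numpy (the matrices are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])     # 3 x 2

# transpose flips rows and columns: a 2x3 matrix becomes 3x2
assert A.T.shape == (3, 2)

# the identity (AB)^T = B^T A^T holds entrywise
assert np.array_equal((A @ B).T, B.T @ A.T)
```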

Matrix Operations
• Determinant
– det(A) returns a scalar
– Represents the area (or volume) of the parallelogram described by the vectors in the rows of the matrix
– For A = [[a, b], [c, d]], det(A) = ad − bc
– Properties: det(AB) = det(A) det(B); det(A^T) = det(A); det(A) = 0 if and only if A is singular
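A numerical check of the 2×2 formula and the product property (matrices illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# for a 2x2 matrix [[a, b], [c, d]], det = ad - bc
d = np.linalg.det(A)  # 1*4 - 2*3 = -2

B = np.array([[2.0, 0.0],
              [0.0, 3.0]])
# det(AB) = det(A) det(B), and det(A^T) = det(A)
assert np.isclose(np.linalg.det(A @ B), d * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), d)
```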

Matrix Operations
• Trace
– tr(A) = Σ_i A_ii, the sum of the diagonal entries
– Invariant to a lot of transformations, so it’s sometimes used in proofs (rarely in this class, though)
– Properties: tr(A + B) = tr(A) + tr(B); tr(cA) = c tr(A); tr(AB) = tr(BA)

Matrix Operations
• Vector norms
• Matrix norms: norms can also be defined for matrices, such as the Frobenius norm:
||A||_F = sqrt(Σ_i Σ_j A_ij^2) = sqrt(tr(A^T A))
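The Frobenius norm matches its definition entry by entry; a sketch with numpy (matrix illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm: sqrt of the sum of squared entries
fro = np.linalg.norm(A, 'fro')
assert np.isclose(fro, np.sqrt(np.sum(A ** 2)))  # sqrt(1+4+9+16) = sqrt(30)

# equivalent formulation via the trace: sqrt(tr(A^T A))
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))
```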

Special Matrices
• Symmetric matrix: A = A^T
• Skew-symmetric matrix: A = −A^T
• Identity matrix I: ones on the diagonal, zeros elsewhere; AI = IA = A
• Diagonal matrix: non-zero entries only on the diagonal
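Each special matrix type can be constructed and checked in numpy (examples illustrative):

```python
import numpy as np

S = np.array([[1, 2],
              [2, 3]])   # symmetric: S equals its transpose
K = np.array([[0, 2],
              [-2, 0]])  # skew-symmetric: K equals minus its transpose
I = np.eye(2)            # identity matrix
D = np.diag([1, 5])      # diagonal matrix from its diagonal entries

assert np.array_equal(S, S.T)
assert np.array_equal(K, -K.T)
assert np.array_equal(I @ S, S)  # I leaves any matrix unchanged
```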

Matrix Algebra Review
• Vectors and matrices
– Basic Matrix Operations
– Determinants, norms, trace
– Special Matrices
• Matrix inverse
• Matrix rank
• Eigenvalues and Eigenvectors
• Matrix Calculus

Inverse
• Given a matrix A, its inverse A^-1 is a matrix such that AA^-1 = A^-1A = I
• The inverse does not always exist. If A^-1 exists, A is invertible or non-singular. Otherwise, it’s singular.
• Useful identities, for matrices that are invertible: (A^-1)^-1 = A; (AB)^-1 = B^-1 A^-1; (A^-1)^T = (A^T)^-1
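These inverse identities can be sanity-checked numerically (matrices illustrative and chosen to be invertible):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Ainv = np.linalg.inv(A)

# defining property: A A^-1 = A^-1 A = I
assert np.allclose(A @ Ainv, np.eye(2))
assert np.allclose(Ainv @ A, np.eye(2))

B = np.array([[2.0, 1.0],
              [1.0, 1.0]])
# (AB)^-1 = B^-1 A^-1, and (A^-1)^T = (A^T)^-1
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ Ainv)
assert np.allclose(Ainv.T, np.linalg.inv(A.T))
```

A singular matrix has no inverse; `np.linalg.inv` raises `LinAlgError` in that case.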

Matrix Operations
• Pseudoinverse
– Say you have the matrix equation AX = B, where A and B are known, and you want to solve for X
– You could calculate the inverse and pre-multiply by it: A^-1AX = A^-1B → X = A^-1B
– The Python command would be np.linalg.inv(A) @ B (note: * is element-wise multiplication; @ is the matrix product)
– But calculating the inverse of large matrices often causes problems with floating-point precision (because it involves working with very small and very large numbers together)
– Or, your matrix might not even have an inverse

Matrix Operations
• Pseudoinverse
– Fortunately, there are workarounds for solving AX = B in these situations. And Python can do them!
– Instead of taking an inverse, directly ask Python to solve for X in AX = B by typing np.linalg.solve(A, B); this works whenever A is square and non-singular
– If A is singular or non-square, np.linalg.solve raises an error; use np.linalg.lstsq (or the pseudoinverse, np.linalg.pinv) instead, which returns the value of X that comes closest to solving the equation:
• If there is no exact solution, it returns the closest one (in the least-squares sense)
• If there are many solutions, it returns the smallest (minimum-norm) one

Matrix Operations
• Python example (the slide’s A and B are not shown; the values below are an assumed pair that reproduces the printed output):
>>> import numpy as np
>>> A = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> B = np.array([0.0, 1.0])
>>> np.linalg.solve(A, B)
array([ 1. , -0.5])
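The least-squares fallback described above can be sketched with made-up data (an overdetermined system, so no exact solution exists):

```python
import numpy as np

# overdetermined system: 3 equations, 1 unknown -- no exact solution
A = np.array([[1.0],
              [1.0],
              [1.0]])
b = np.array([1.0, 2.0, 3.0])

# np.linalg.solve requires a square, non-singular matrix; lstsq returns
# the least-squares solution (the minimum-norm one if many exist)
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
# x is about [2.0]: the mean of b minimizes the squared error here

# the pseudoinverse gives the same answer
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x, x_pinv)
```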

Matrix Algebra Review
• Vectors and matrices
– Basic Matrix Operations
– Determinants, norms, trace
– Special Matrices
• Matrix inverse
• Matrix rank
• Eigenvalues and Eigenvectors
• Matrix Calculus

Linear independence
• Suppose we have a set of vectors v1, …, vn
• If we can express v1 as a linear combination of the other vectors v2…vn, then v1 is linearly dependent on the other vectors.
– The direction v1 can be expressed as a combination of the directions v2…vn. (E.g. v1 = .7 v2 -.7 v4)
• If no vector is linearly dependent on the rest of the set, the set is linearly independent.
– Common case: a set of vectors v1, …, vn is always linearly independent if each vector is perpendicular to every other vector (and non-zero)
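Linear (in)dependence can be checked with numpy by stacking the vectors and counting independent directions (vectors illustrative):

```python
import numpy as np

# stack vectors as rows; matrix_rank counts how many are independent
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = 0.7 * v1 - 0.7 * v2  # a linear combination of v1 and v2

assert np.linalg.matrix_rank(np.vstack([v1, v2])) == 2      # independent
assert np.linalg.matrix_rank(np.vstack([v1, v2, v3])) == 2  # v3 adds nothing

# mutually perpendicular non-zero vectors are always independent
assert np.dot(v1, v2) == 0.0
```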

Linear independence
[Figure: a linearly independent set of vectors vs. a set that is not linearly independent]

Matrix rank
• Column/row rank
– The column rank is the number of linearly independent columns; the row rank is the number of linearly independent rows
– Column rank always equals row rank
• Matrix rank: this common value, rank(A)

Matrix rank
• For transformation matrices, the rank tells you the dimensions of the output
• E.g. if the rank of A is 1, then the transformation p’ = Ap maps points onto a line.
• Here’s a matrix with rank 1:
All points get mapped to the line y = 2x
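The slide's rank-1 matrix is not recoverable from the extracted text; the matrix below is an assumed stand-in with the stated behavior, i.e. every output lands on the line y = 2x:

```python
import numpy as np

# assumed rank-1 example (not the slide's original matrix): both rows
# are multiples of [1, 2], so every output is a multiple of [1, 2]^T
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
assert np.linalg.matrix_rank(A) == 1

# map a few arbitrary points through p' = Ap
points = np.array([[3.0, -1.0], [0.5, 7.0], [2.0, 2.0]])
for p in points:
    out = A @ p
    assert np.isclose(out[1], 2 * out[0])  # output satisfies y' = 2x'
```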

Matrix rank
• If an m x m matrix is rank m, we say it’s “full rank”
– Maps an m x 1 vector uniquely to another m x 1 vector
– An inverse matrix can be found
• If rank < m, we say it’s “singular”
– At least one dimension is getting collapsed. No way to look at the result and tell what the input was
– Inverse does not exist
• The inverse also doesn’t exist for non-square matrices

Matrix Algebra Review
• Vectors and matrices
– Basic Matrix Operations
– Determinants, norms, trace
– Special Matrices
• Matrix inverse
• Matrix rank
• Eigenvalues and Eigenvectors (SVD)
• Matrix Calculus

Eigenvector and Eigenvalue
• An eigenvector x of a linear transformation A is a non-zero vector that, when A is applied to it, does not change direction.
• Applying A to the eigenvector only scales the eigenvector by the scalar value λ, called an eigenvalue: Ax = λx

Properties of eigenvalues
• The trace of A is equal to the sum of its eigenvalues: tr(A) = Σ_i λ_i
• The determinant of A is equal to the product of its eigenvalues: det(A) = Π_i λ_i
• The rank of A is equal to the number of non-zero eigenvalues of A.
• The eigenvalues of a diagonal matrix D = diag(d1, . . . dn) are just the diagonal entries d1, . . . dn

Diagonalization
• Eigenvalue equation: AX = XD
– Where D is a diagonal matrix of the eigenvalues and the columns of X are the corresponding eigenvectors
• Assuming all λi’s are unique: A = X D X^-1
• Remember that the inverse of an orthogonal matrix is just its transpose, and when the eigenvectors are orthogonal, A = X D X^T

Symmetric matrices
• Properties:
– For a symmetric matrix A, all the eigenvalues are real.
– The eigenvectors of A are orthonormal.
• Therefore A = U Λ U^T, where Λ = diag(λ1, . . . λn) and U is orthonormal
• So finding the vector x that maximizes x^T A x subject to ||x||_2 = 1 is the same as finding the eigenvector that corresponds to the largest eigenvalue.
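The eigendecomposition facts for symmetric matrices can be checked numerically; a sketch with numpy (the matrix is illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, so eigenvalues are real

# eigh is numpy's routine for symmetric matrices; eigenvalues ascend
lam, Q = np.linalg.eigh(A)  # lam = [1.0, 3.0]

# A = Q diag(lam) Q^T, and Q is orthogonal: Q^-1 = Q^T
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(2))

# trace = sum of eigenvalues; det = product; rank = # nonzero eigenvalues
assert np.isclose(np.trace(A), lam.sum())
assert np.isclose(np.linalg.det(A), lam.prod())

# max of x^T A x over unit vectors is the largest eigenvalue,
# attained at the corresponding eigenvector
x = Q[:, -1]
assert np.isclose(x @ A @ x, lam[-1])
```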
Matrix Algebra Review
• Vectors and matrices
– Basic Matrix Operations
– Determinants, norms, trace
– Special Matrices
• Matrix inverse
• Matrix rank
• Eigenvalues and Eigenvectors (SVD)
• Matrix Calculus

Matrix Calculus – The Gradient
• Let a function f : R^(m×n) → R take as input a matrix A of size m × n and return a real value.
• Then the gradient of f with respect to A is the m × n matrix ∇_A f(A).

Matrix Calculus – The Gradient
• Every entry in the matrix is: (∇_A f(A))_ij = ∂f(A) / ∂A_ij
• The size of ∇_A f(A) is always the same as the size of A. So if A is just a vector x:
∇_x f(x) = [∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_n]^T

Exercise
• Example: Find:
• From this we can conclude that:

Matrix Calculus – The Gradient
• Properties: the gradient is linear: ∇_x (f(x) + g(x)) = ∇_x f(x) + ∇_x g(x), and ∇_x (t f(x)) = t ∇_x f(x) for t ∈ R

Matrix Calculus – The Jacobian

Matrix Calculus – The Hessian
• The Hessian matrix with respect to x, written ∇_x^2 f(x) or simply H, is the n × n matrix of partial derivatives
• Each entry can be written as: H_ij = ∂^2 f(x) / (∂x_i ∂x_j)
• Exercise: Why is the Hessian always symmetric?

Matrix Calculus – The Hessian
• The Hessian is always symmetric, because ∂^2 f / (∂x_i ∂x_j) = ∂^2 f / (∂x_j ∂x_i)
• This is known as Schwarz’s theorem: the order of partial derivatives doesn’t matter as long as the second derivatives exist and are continuous.

Matrix Calculus – The Hessian
• Note that the Hessian is not the gradient of the whole gradient of a vector (this is not defined). It is actually the gradient of every entry of the gradient of the vector.
• E.g., the first column is the gradient of ∂f(x)/∂x_1

Common vector derivatives

PSet 1 Out Today
• Due in 1 week: 9/15 11:59pm GMT-5 (Boston time)
• Diagnostic homework covering topics from the prerequisites
• Additional examples in lab this week (Group A for in-person lab rotations)

Next Class
Supervised Learning I: Regression: linear hypothesis; SSD cost; gradient descent; normal equations; maximum likelihood
Reading: Bishop 1.2-1.2.4, 3.1-3.1.1