
Linear classification

Little matrix magic

What is this fuss about matrix computation and linear algebra? Let's approach it by asking how one would solve linear equations of higher order polynomials. For example, second order,

$$y = ax^2 + bx + c \,, \tag{1}$$

or third order,

$$y = ax^3 + bx^2 + cx + d \,, \tag{2}$$

polynomials. Solving these for the minimum number of points, three and four, is complex, but how about for $N$ points?

Let's first change our naming convention to make sure that we don't run out of letters. Let's use the term $w$, "weight", and the convention where $w_0$ is the constant of the polynomial, $w_1$ is the first order weight, and so on. Now we have the following notations for the first, second and third order polynomials:

$$y = w_0 + w_1 x, \quad y = w_0 + w_1 x + w_2 x^2 \quad \text{and} \quad y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 \,. \tag{3}$$

We may conveniently define the inputs and outputs as vectors (denoted by bold symbols) and matrices (denoted by capital bold symbols)

$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad \text{and} \quad \mathbf{w} = \begin{pmatrix} b \\ a \end{pmatrix}, \tag{4}$$

after which the group of all linear equations that must hold is written simply as

$$\mathbf{y} = \mathbf{X}\mathbf{w} \,. \tag{5}$$

The MSE computation with matrices is also pretty straightforward:

$$\begin{aligned}
\epsilon_{\mathrm{MSE}} &= \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2
= \frac{1}{N}(\mathbf{y} - \hat{\mathbf{y}})^T (\mathbf{y} - \hat{\mathbf{y}})
= \frac{1}{N}(\mathbf{y} - \mathbf{X}\mathbf{w})^T (\mathbf{y} - \mathbf{X}\mathbf{w}) \\
&= \frac{1}{N}\left(\mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\mathbf{w} - (\mathbf{X}\mathbf{w})^T\mathbf{y} + (\mathbf{X}\mathbf{w})^T\mathbf{X}\mathbf{w}\right) \\
&= \frac{1}{N}\left(\mathbf{y}^T\mathbf{y} - 2\mathbf{y}^T\mathbf{X}\mathbf{w} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w}\right).
\end{aligned} \tag{6}$$

Solving for the weight vector $\mathbf{w}$ is simpler with the matrix form than without it. We just compute the gradient zero point as before:

$$\nabla_{\mathbf{w}}\,\epsilon_{\mathrm{MSE}} = \frac{2}{N}\left(-\mathbf{y}^T\mathbf{X} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\right) = \mathbf{0}^T \;\Rightarrow\; \mathbf{w}^T = \mathbf{y}^T\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}. \tag{7}$$

Now, let's test how it works!

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from numpy.linalg import inv

#
# Let's generate some y=ax+b data with noise
np.random.seed(42) # to always get the same points
N = 50
x = np.random.normal(1.1,0.3,N)
a_gt = 50.0
b_gt = 20.0
y_noise = np.random.normal(0,8,N) # measurement noise
y = a_gt*x+b_gt+y_noise

#
# Let's do the LSQ fit using the matrix form

# Form X and y
X = np.concatenate((np.transpose([np.ones(N)]),np.transpose([x])), axis=1)
w_foo = np.matmul(np.transpose(X),X)
w_foo = inv(w_foo)
w = np.matmul(np.matmul(y,X),w_foo)

a = w[1]
b = w[0]
y_h = a*x+b
MSE = np.sum((y-y_h)**2)/N

# Coordinate system and plot
plt.plot(x,y,'bo')
x_plot = np.linspace(0,2.0,10)
plt.plot(x_plot,a*x_plot+b,'b-')
plt.title(f"Fitted line (w1=a={a:.1f}, w0=b={b:.1f}, MSE={MSE:.1f})")
plt.xlabel('height [m]')
plt.ylabel('weight [kg]')
plt.axis([0,2,0,150])
plt.show()
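As a side note, forming the explicit inverse of X^T X as above can be numerically fragile when the matrix is close to singular. The same least-squares solution can be obtained with np.linalg.lstsq; the following is a minimal sketch with its own synthetic data (an illustration, not part of the lecture code):

import numpy as np

# Minimal sketch: solve the same least-squares problem with np.linalg.lstsq
# instead of inverting X^T X explicitly. The data below are made-up values.
rng = np.random.default_rng(0)
N = 50
x = rng.normal(1.1, 0.3, N)
y = 50.0*x + 20.0 + rng.normal(0, 8, N)

X = np.column_stack((np.ones(N), x))            # design matrix [1, x]
w_inv = np.linalg.inv(X.T @ X) @ X.T @ y        # normal equations with explicit inverse
w_lsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # numerically more robust solver

print(w_inv)  # [b, a]
print(w_lsq)  # should agree with w_inv to numerical precision

np.linalg.solve(X.T @ X, X.T @ y) is another common middle ground that avoids forming the inverse.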


In [2]:
#
# Let's now generate second order data with noise

# Second order polynomial
x = np.linspace(-1.0,+1.0,N)
w0 = -5
w1 = 2
w2 = 5
y_noise = np.random.normal(0,0.25,N)
y = w0+w1*x+w2*x**2+y_noise
plt.plot(x,y,'bo')
plt.title("Noisy points of y=5x^2+2x-5")
#plt.axis([-1,+1,-11,-7])
plt.show()

# Form X and y
X = np.concatenate((np.transpose([np.ones(N)]),np.transpose([x]),np.transpose([x**2])), axis=1)
w_foo = np.matmul(np.transpose(X),X)
w_foo = inv(w_foo)
w = np.squeeze(np.matmul(np.matmul(y,X),w_foo))
y_h = w[0]+w[1]*x+w[2]*x**2
MSE = np.sum((y-y_h)**2)/N
plt.plot(x,y,'bo')
plt.plot(x,y_h,'b-')
plt.title(f"Fitted polynomial (w0={w[0]:.2f}, w1={w[1]:.2f}, w2={w[2]:.2f}, MSE={MSE:.2f})")
#plt.axis([-1,+1,-11,-7])
plt.show()


Magical! We can fit polynomials of any order with this super simple equation that solves the weights $w_i$ of the polynomial.
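To make the "any order" claim concrete, the design matrix for an arbitrary order can be built with np.vander. The following is a small sketch of my own (made-up data and order, not a lecture cell):

import numpy as np

# Minimal sketch: fit a polynomial of arbitrary order with the normal equations.
# The order and the data below are made-up illustration values.
rng = np.random.default_rng(1)
N, order = 50, 3
x = np.linspace(-1.0, 1.0, N)
y = -5 + 2*x + 5*x**2 - 3*x**3 + rng.normal(0, 0.25, N)

X = np.vander(x, order + 1, increasing=True)   # columns [1, x, x^2, ..., x^order]
w = np.linalg.solve(X.T @ X, X.T @ y)          # w = (X^T X)^{-1} X^T y
print(w)                                       # roughly [-5, 2, 5, -3]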

Classification problem

Classification is the problem that finally defined the research field of pattern recognition and machine learning. Unlike in regression, where the output is a continuous variable, $y \in \mathbb{R}$, the output is now discrete, $y \in C$ with $C = \{c_1, c_2, \ldots, c_K\}$, where the $K$ different discrete values represent different classes (e.g. "airplane", "dog").


Example 3.1 There is an ML expert orc in Middle-earth who wants to make a machine learning based trap that catches hobbits, which orcs like to eat, but not elves, which orcs are scared of. Orcs have a height sensor and they want to make a classifier that classifies creatures entering a forest near Bree into two classes, $y = f(x)$, $y \in \{\text{hobbit}, \text{elf}\}$, where $x$ is the height in meters.

The most traditional classifier is the nearest neighbor rule (1-NN), sketched briefly below, but here we instead formulate the problem as line fitting. Let's agree that hobbits are $y = -1.0$ and elves are $y = +1.0$. Furthermore, we define the classification rule:

$$\text{class} = \begin{cases} \text{hobbit}, & \text{if } y < 0 \\ \text{elf}, & \text{if } y > 0 \end{cases} \tag{8}$$
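For reference, a 1-NN classifier for this one-dimensional problem fits in a few lines. The following is a minimal sketch with made-up training heights (an illustration, not part of the lecture code):

import numpy as np

# Minimal 1-NN sketch for the 1-D height classification problem.
# Training heights and labels below are made-up illustration values.
x_train = np.array([0.9, 1.1, 1.2, 1.8, 1.9, 2.1])   # heights in meters
labels = np.array(['hobit', 'hobit', 'hobit', 'elf', 'elf', 'elf'])

def predict_1nn(x_new):
    # Return the label of the training sample closest to x_new
    return labels[np.argmin(np.abs(x_train - x_new))]

print(predict_1nn(1.0))   # 'hobit'
print(predict_1nn(2.0))   # 'elf'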

In [3]:
import matplotlib.pyplot as plt
import numpy as np

# Generate random points for training
np.random.seed(13) # to always get the same points
N = 5
x_h = np.random.normal(1.1,0.3,N)
x_e = np.random.normal(1.9,0.4,N)
plt.plot(x_h,np.zeros([N,1]),’co’, label=”hobit”)
plt.plot(x_e,np.zeros([N,1]),’mo’, label=”elf”)
plt.title(‘Training samples from two categories’)
plt.legend()
plt.xlabel(‘height [m]’)
plt.axis([0.5,2.5,-1.1,+1.1])
plt.show()



In [4]:
from numpy.linalg import inv

#
# 2. Add y values that represent the two classes and plot
y_h = np.zeros(N)
y_h[:] = -1.0
y_e = np.zeros(N)
y_e[:] = +1.0
plt.plot(x_h,y_h,'co', label="hobit")
plt.plot(x_e,y_e,'mo', label="elf")
plt.title('Training samples from two classes (c1 = -1, c2 = +1)')
plt.legend()
plt.xlabel('height [m]')
plt.axis([0.5,2.5,-1.1,+1.1])
plt.show()

#
# 3. Fit and plot line

# Form the train input and output vectors (1: hobit, 2: elf)
x_tr = np.concatenate((x_h,x_e))
y_tr = np.concatenate((y_h,y_e))

# Matrix operations
X = np.concatenate((np.transpose([np.ones(2*N)]),np.transpose([x_tr])), axis=1)
w_foo = np.matmul(np.transpose(X),X)
w_foo = inv(w_foo)
w = np.matmul(np.matmul(y_tr,X),w_foo)

a = w[1]
b = w[0]
y_hat_tr = a*x_tr+b
MSE = np.sum((y_tr-y_hat_tr)**2)/(2*N)

# Coordinate system and plot
plt.plot(x_h,y_h,'co', label="hobit")
plt.plot(x_e,y_e,'mo', label="elf")
plt.legend()
plt.plot(x_tr,a*x_tr+b,'b-')
plt.title(f"Fitted line (a={a:.1f}, b={b:.1f}, MSE={MSE:.1f})")
plt.xlabel('height [m]')
plt.axis([0.5,2.5,-1.1,+1.1])
plt.show()
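To see concretely how strongly the plain least-squares line reacts to a point that is far on the correct side, the following sketch (my own illustration reusing the same synthetic heights as above plus one "giant elf"; not a lecture cell) compares the decision thresholds with and without the outlier:

import numpy as np
from numpy.linalg import inv

# Minimal sketch: one far-away but correctly labeled point (a "giant elf")
# drags the least-squares line, even though its distance from the boundary
# should not matter for classification.
np.random.seed(13)
x_h = np.random.normal(1.1, 0.3, 5)   # hobit heights
x_e = np.random.normal(1.9, 0.4, 5)   # elf heights

def fit_line(x, y):
    X = np.column_stack((np.ones(len(x)), x))
    return inv(X.T @ X) @ X.T @ y     # [b, a] via the normal equations

x1 = np.concatenate((x_h, x_e)); y1 = np.concatenate((-np.ones(5), np.ones(5)))
x2 = np.append(x1, 4.9);         y2 = np.append(y1, 1.0)   # add the giant elf

w1, w2 = fit_line(x1, y1), fit_line(x2, y2)
print(-w1[0]/w1[1], -w2[0]/w2[1])   # decision thresholds (y=0 crossings) without / with the outlier

The threshold shifts even though the extra elf is on the correct side, so elves only slightly taller than the original threshold can end up on the hobit side; this is the weakness discussed next.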


However, this model has problems. Classification error is discrete and should not depend on how far on the correct or wrong side a training point is. Instead of the linear value $y = w_0 + w_1 x$ we want to have a Heaviside step function

$$f(x) = H(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x < 0 \end{cases}. \tag{9}$$

However, that function is mathematically difficult, and thus we use its smooth approximation, called the "sigmoid function",

$$f(x) = \mathrm{logsig}(x) = \frac{1}{1 + e^{-x}} \,. \tag{10}$$

Now, our linear model goes inside the sigmoid function as

$$f(x) = \mathrm{logsig}(w_0 + w_1 x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}} \,. \tag{11}$$

In [5]:
# Coordinate system
plt.xlabel('x')
plt.ylabel('y')
#plt.axis([0.5,4.0,-1.1,+1.1])

# Plot logsig with different slopes and offsets
N = 101
x = np.linspace(-3.0,+3.0,N)
plt.plot(x, 1/(1+np.exp(-x)),'c-', label="logsig(x)")
plt.plot(x, 1/(1+np.exp(-2*x)),'m-', label="logsig(2x)")
plt.plot(x, 1/(1+np.exp(-5*x)),'y-', label="logsig(5x)")
plt.plot(x, 1/(1+np.exp(-5*x+5)),'k-', label="logsig(5x-5)")
plt.title('Logistic function')
plt.legend()
plt.show()

In [7]:
from scipy.special import expit

#
# 1. Generate and plot random points for training
np.random.seed(13) # to always get the same points
N = 5
x_h = np.random.normal(1.1,0.3,N)
x_e = np.random.normal(1.9,0.4,N)
x_e = np.append(x_e, [4.9]) # add giant elf
plt.plot(x_h,np.zeros([N,1]),'co', label="hobit")
plt.plot(x_e,np.zeros([N+1,1]),'mo', label="elf")
plt.title('Training samples from two classes (+outlier)')
plt.legend()
plt.xlabel('height [m]')
plt.axis([0.5,5.0,-1.1,+1.1])
plt.show()

#
# 2. Add y values that represent the two classes and plot
y_h = np.zeros(N)
y_h[:] = 0.0
y_e = np.zeros(N+1)
y_e[:] = +1.0
plt.plot(x_h,y_h,'co', label="hobit")
plt.plot(x_e,y_e,'mo', label="elf")
plt.title('Training samples from two classes (c1 = 0, c2 = +1)')
plt.legend()
plt.xlabel('height [m]')
plt.axis([0.5,5.0,-0.1,+1.1])
plt.show()

#
# 3. Fit and plot the sigmoid model after K epochs of gradient descent

# Form the train input and output vectors (1: hobit, 2: elf)
x_tr = np.concatenate((x_h,x_e))
y_tr = np.concatenate((y_h,y_e))

# Initialize gradient descent
a_t = 0
b_t = 0
num_of_epochs = 100
learning_rate = 0.5

# Plot before training
y_pred = expit(a_t*x_tr+b_t)
MSE = np.sum((y_tr-y_pred)**2)/len(x_tr)
plt.title(f'Epoch=0 a={a_t:.2f} b={b_t:.2f} MSE={MSE:.2f}')
plt.plot(x_h,y_h,'co', label="hobit")
plt.plot(x_e,y_e,'mo', label="elf")
x = np.linspace(0.0,+5.0,50)
plt.plot(x,expit(a_t*x+b_t),'b-')
plt.xlabel('height [m]')
plt.axis([0.5,5.0,-0.1,+1.1])
plt.show()

for e in range(1,num_of_epochs):
    # Gradient of the summed squared error w.r.t. a and b (chain rule through logsig)
    grad_a = np.sum(2*x_tr*(y_tr-expit(a_t*x_tr+b_t))*expit(a_t*x_tr+b_t)*(-1+expit(a_t*x_tr+b_t)))
    grad_b = np.sum(2*(y_tr-expit(a_t*x_tr+b_t))*expit(a_t*x_tr+b_t)*(-1+expit(a_t*x_tr+b_t)))
    # Gradient descent update
    a_t = a_t-learning_rate*grad_a
    b_t = b_t-learning_rate*grad_b

    # Plot the current fit a few times during training
    if e % 25 == 0:
        plt.plot(x_h,y_h,'co', label="hobit")
        plt.plot(x_e,y_e,'mo', label="elf")
        x = np.linspace(0.0,+5.0,50)
        plt.plot(x,expit(a_t*x+b_t),'b-')
        plt.xlabel('height [m]')
        plt.axis([0.5,5.0,-0.1,+1.1])
        plt.show()

# Plot after training
y_pred = expit(a_t*x_tr+b_t)
MSE = np.sum((y_tr-y_pred)**2)/len(x_tr)
plt.title(f'Epoch={num_of_epochs} a={a_t:.2f} b={b_t:.2f} MSE={MSE:.2f}')
plt.plot(x_h,y_h,'co', label="hobit")
plt.plot(x_e,y_e,'mo', label="elf")
x = np.linspace(0.0,+5.0,50)
plt.plot(x,expit(a_t*x+b_t),'b-')
plt.xlabel('height [m]')
plt.axis([0.5,5.0,-0.1,+1.1])
plt.show()

a = a_t
b = b_t

Indeed, the logsig nonlinearity and gradient descent work! Actually, this very same idea is inside the modern deep neural networks!
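The gradient expressions in the training loop above follow from the chain rule and the identity $\mathrm{logsig}'(z) = \mathrm{logsig}(z)\,(1 - \mathrm{logsig}(z))$. As a final aside, a quick way to convince yourself they are correct is to compare them against a numerical finite-difference gradient; this is my own check with made-up data, not part of the lecture:

import numpy as np
from scipy.special import expit

# Minimal sketch: check the analytic gradient of sum((y - logsig(a*x+b))^2)
# w.r.t. a against a central finite difference. Data are made-up values.
x = np.array([1.0, 1.2, 1.8, 2.0, 4.9])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
a, b, h = 0.3, -0.5, 1e-6

def loss(a, b):
    return np.sum((y - expit(a*x + b))**2)

s = expit(a*x + b)
grad_a = np.sum(2*x*(y - s)*s*(s - 1))                   # analytic, as in the loop above
grad_a_num = (loss(a + h, b) - loss(a - h, b)) / (2*h)   # numerical
print(grad_a, grad_a_num)                                # should agree closely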
References

C.M. Bishop (2006): Pattern Recognition and Machine Learning, Chapter 4.