3 On Section 3
3.1 MATLAB and Python: singular decomposition theorem
MATLAB:svd() IfA=U·S·VT,weusethecodesvd(A)tofindU,SandV. Let A = � 1 0 1�. By the singular decomposition theorem, we can write
−2 1 0
where U ∈ O(n), V ∈ O(m) and S is a diagonal matrix.
A = USV We use the following code to find U, S and V .
>> A=[1 0 1; -2 1 0]
>> [U,S,V]=svd(A)
U= 0.4472 -0.8944
0.8944
0.4472
S=
2.4495
0 1.0000
0
0
-0.4082
-0.8165
0.4082
V= 0.9129 -0.3651 0.1826
-0.0000
0.4472
0.8944
0
Python: np.linalg.svd() IfA=U·S·VT,weusethecodesvd(A)tofindU,SandVT. Notice: it is VT not V!
# In [1]
A=np.array([[1, 0, 1],[-2,1,0]])
U,S,W=np.linalg.svd(A)
U
# Out [1]:
array([[-0.4472136 , 0.89442719],
[ 0.89442719, 0.4472136 ]])
32
Here “W” is the transpose of V . # In [2]:
U
# Out [2]:
array([[-0.4472136 , 0.89442719],
[ 0.89442719, 0.4472136 ]])
# In [3]: S
# Out [3]:
array([2.44948974, 1. ])
# In [4]: W
# Out [4]:
array([[-9.12870929e-01, 3.65148372e-01, -1.82574186e-01],
[-3.88578059e-16, 4.47213595e-01, 8.94427191e-01],
[-4.08248290e-01, -8.16496581e-01, 4.08248290e-01]])
Then V=W.transpose() is equal to
array([[−9.12870929e−01, −3.88578059e−16, −4.08248290e−01], [ 3.65148372e−01, 4.47213595e−01, −8.16496581e−01], [−1.82574186e−01, 8.94427191e−01, 4.08248290e−01]])
You may find that the above calculation are different. Why? Let us explain it. In fact, in the MATLAB code, the matrix V
0.9129 V = [v1 v2] = −0.3651
0.1826
which comes from AT A = V DV T where D = diag(6, 1) is the diagonal 3 × 3 matrix. By Python,
it also get from AT A = V DV T
−0.9129 −0.0000 −0.4082 V = [v1v2] = 0.3651 0.4472 −0.8165
−0.1826 0.8944 0.4082 33
−0.0000 0.4472 0.8944
−0.4082 −0.8165
0.4082
Since �
A T A = λ j v j v jT ,
j
any eigenvector vj can be replaced by −vj , this is why causes non-uniqueness. In other words, we
see that the matrices U, S and V in the singular decomposition theorem may not be unique.
3.2 Python: how to form training sets and testing sets randomly?
For a given data set D, we can divide it randomly into two parts: the training set Xtrain, and the test set Xtest. For example, suppose we generated a data set D as in §2.4:
# In [1]:
import sklearn
import pandas
import matplotlib
from sklearn.datasets.samples_generator import make_blobs
from pandas import DataFrame
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, random_state=4)
# In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split(X,y, train_size=0.7,
test_size=0.3, random_state=42)
We obtain the training set Xtrain, which is 70% of D and the test set Xtest, which is 30% of D. 3.3 MATLAB: on the codes in §3.5 of the lecture notes
We will use yellow color box to explain every commander code here:
34
n=50; x=linspace(-3,3,n)’; pix=pi*x;
y=sin(pix)./(pix) + 0.1*x + 0.2*randn(n,1);
x2=x.^2; xx=repmat(x2, 1, n)+repmat(x2’,n,1)-2*x*x’;
hhs=2*[0.03 0.3 3].^2; ls=[0.0001 0.1 100];
m=5; u=mod(randperm(n),m)+1;
for hk=1:length(hhs)
hh=hhs(hk); k=exp(-xx/hh);
for i=1:m
ki=k(u ~ =i, :), kc=k(u= =i,:); yi=y(u ~ =i); yc=y(u= =i);
for lk=1:length(ls)
t=(ki’*ki + ls(lk)*eye(n))\ (ki’*yi);
g(hk, lk, i)=mean((yc-kc*t).^2);
end, end, end
[gl, ggl]=min(mean(g,3), [], 2);
[ghl, gghl]=min(gl);
L=ls(ggl(gghl));
HH=hhs(gghl);
N=1000; X=linspace(-3,3,N)’;
K=exp(-(repmat(X.^2, 1, n)+repmat(x2’, N, 1)-2*X*x’)/HH)
k=exp(-xx/HH); t=(k^2+L*eye(n))\(k*y); F=K*t;
figure(1); clf; hold on; axis([-2.8 2.8 -0.7 1.7]);
plot(X, F, ’g-’); plot(x,y,’bo’);
• The data set D = {(xj,yj)}nj=1 is given where ={xj} are 50 points equally distributed in (−3,3), y = {yj} is given by a function and . Here pi means “π”.
• The Gaussian model design matrix is given by xx . We have explained “repmat” in §2.4.
• The parameter h in the matrix xx = exp(−�xi−xj�2 ) has three choises: 0.03,0.3 and 3. By “hh”,
2h2
it menas h2. By “hhs” , it may mean “h2 selection” which is a vector.
x
n = 50
Similarly, the parameter λ has three choices: 0.0001, 0.1 and 100. By ls , it may mean “lambda section” which is a vector. Here l is the first letter of “lambda.”
• Step 1. Dandomly divide D into 5 subsets Di To do “cross validation”, we randomly 35
divide the 50 elements data-set D = {(xj,yj)} into 5 subsets Di, where 1 ≤ i ≤ 5, i.e., we use m=5.
By identifying xj with j, it suffices to randomly divide the set {1, 2, …, 50} into 5 groups with equal size.
By using a permutation σ of {1,2,…,50}, it suffices to divide the set {σ(1),…,σ(50)} into 5 groups with equal size in a specific way. We can use “randperm” to give {1,2,…,50} a random permutation (see §3.4 below).
For each positive integer σ(j), it is associated with an index uj ∈ {1, 2, 3, 4, 5} given by uj = the remainder of σ(j) + 1.
5
Now we can divide {σ(1),…,σ(50)} into 5 groups: It = {σ(j) | uj = t} for t = 1,2,3,4,5. Here u = (u1, …, u50)T serves as a logical index. We can say that the “fold number” m = 5 . We see #Di =10,and#(D−Di)=40.
• Step 2. Take and fix λ,h and i We should use three “loop” code to take all 2h2,λ and i from all their choices:
for hk=1:length(hhs), for i=1:m and for lk=1:length(ls) with end, end and end .
Here length(X) is equivalent to max(size(X)) for nonempty arrays and 0 for empty arrays.
n = length(X) returns the size of the longest dimension of X. By 1 : m, it means from 1 to m. We have taken and fixed 2h2 ∈ {2 × 0.032, 2 × 0.32, 2 × 32}, λ ∈ {0.0001, 0.2, 100}, and
�n
K(t,xj)θj = K(t)θ, ∀t ∈ Rp, (3) and ��n λ�
θˆh,λ =argmin |yk −fθ,h(xk)|2 + 2�θ�2 (4) θ k=1
36
i ∈ {1,2,3,4,5}.
We have the design matrix k = exp(−�xl−xj�2 ).
2h2
• Step 3. Obtain the optimal parameter θˆ for the model
fθ,h(t) =
j=1
with respect to the data-set D−Di For each i ∈ {1, 2, 3, 4, 5}, we use logic index kc = k(u == i, 🙂 to denote the matrices with the sub-data-set Di = {(xj,yj) | uj = i}, and use logic index
to denote the matrices with the subset {yj | uj = i}. We also use logic in-
dex to denote the complement sub-data-set: D − Di = {(xj , yj ) | uj �= i}, and use logic index to denote the subset {yj | uj �= i}. For “logic” codes, see §3.5
below.
By solving the least square problem (3) and (4), with respect to the data-set D−Di, we obtain the optical parameter θˆ(i) by t = (ki� ∗ ki + ls(lk) ∗ eye(n))\(ki� ∗ yi);
yc = y(u == i)
ki = k(u ∼= i, 🙂
yi = y(u ∼= i)
• Step 4. Computer the validation error
1
For the commander “mean”, see §3.6 below. In fact, the total possibilities of λ, h and i is 3×3×5 = 45, and g consists of five 3 × 3 matrices:
respect to the data-set Di, by
J =
�
|y� −fˆ
j θˆi h λ,h,λ j
#(Di) we calculate the validation error by
(x�j ,yj� )∈Di
i,λ,h
(x�)|2, (5) g(hk, lk, i)=mean((yc-kc * t).ˆ2) .
21 a(i) 31
a(i) a(i) 32 33
Now for any fixed parameters λ, h and i, with
a(i) a(i) 12 13
indicates the three λ parameters in {0.0001, 0.2, 100}.
• Step 5. Find the optimal hˆ and θˆ By “mean (g, 3)” and (6), it means the average for i:
a a a 1�5a(i) 1 �5 a(i) 1�5a(i)
1112133�i11 3�i12 3�i13
a21 a22 a23 = 1 5 a(i) 1 5a(i) 1 5 a(i) 3�i 21 3�i 22 3�i 23 a31 a32 a33 1 5a(i) 1 5a(i) 1 5a(i)
3 i 31 3 i 32 3 i 33
Then by “min(mean(g, 3), [ ], 2 )”, we obtain (for the commander “min”, see §3.7 below)
a1 min{a11, a12, a13}
a2 = min{a21, a22, a23} (7)
a3 min{a31, a32, a33} 37
a(i) 11 a(i)
a(i) a(i), 22 23
i = 1,2,3,4,5. (6) The row indicates three 2h2 parameters in {2 × 0.032, 2 × 0.32, 2 × 32} in order, and the column
Here by “[gl , ggl]=min(mean(g, 3), [ ], 2)” , the “gl” means the matrix (7), and the “ggl” indi- cates, in each row, the row number at where the element is minimum. For example, “ggl” is
1 2 .
1
min{a1, a2, a3}.
Here by “[ghl, gghl]=min(gl)” , “ghl” means the number min{a1, a2, a3}, and the “gghl” indicate
the row number at which the element is minimum.
By “L=ls(ggl(gghl))” , we obtain the optimal λˆ = 0.1. By “HH=hhs(gghl)” , we obtain the
optimal 2hˆ2 = 0.18 = 2 × 0.32.
Finally we use “N=1000” and X to draw the graph of the predictive function fθˆ,λˆ,hˆ with the
optimal parameters, and use “x” and “y” to draw the data-set.
3.4 MATLAB: logical commanders
MATLAB: logical operators The relational operators “> ”, “ < ”, “ >= ”, “ <= ”, “ == ”, “ ∼ ” impose conditions on the array, and you can apply multiple conditions by connecting them with the logical operators “and”, “or”, and “not”, respectively denoted by the symbols “&”, “|”, and “∼”.
[Example] Let us start by creating a 5 × 5 matrix with random integer entries between 1 and 15. Reset the random number generator to the default state for reproducibility. We use the code:
A = randi (15 ,5) to obtain a matrix:
14 5 15 7 1 A = 2 9 15 14 13 .
1415 8 1215 10 15 13 15 11
Use the relational less than operator, <, to determine which elements of A are less than 9. Store the result in B.
38
Then by “min(gl)”, we obtain
13 2 3 3 10
B=A<9 to obtain
0 1 0 1 1 B = 1 0 0 0 0 .
00100 00000
Here “1” means true, and “0” means fails.
Sometimes it is useful to simultaneously change the values of several existing array elements. Use logical indexing with a simple assignment statement to replace the values in an array that meet a condition. We use the following code to replace all values in A that are greater than 10 with the number 10.
A(A>10) = 10 to obtain
10 5 10 7 1 A = 2 9 10 10 10 .
1010 8 1010 10 10 10 10 10
Next, replace all values in A that are not equal to 10 with a NaN value. Here “NaN” means “not a number”.
A(A ̃=10) = NaN
10 NaN NaN NaN 10
10 NaN 10 NaN NaN A=NaNNaN 10 10 10.
10 10 NaN 10 10 10 10 10 10 10
[Example] Let {(xj , yj )}50 be a data set. As we did in Subsection 4.3, for each (xj , yj ), we j=1
associated an extra index uj ∈ {1,2,3,4,5}, i.e., the set {(xj,yj,uj)}50 . Let y = (y1,y2,…,y50)T. j=1
39
0 1 1 1 0
10 2 3 3 10
Then for any fixed integer k ∈ {1, 2, 3, 4, 5}, the following code mean y(u==k)={yj |uj =k}, y(u∼=k)={yj |uj �=k}.
Therefore, by putting condition on the extra index uj, we can divide the original set into several subsets.
3.5 MATLAB: average or mean of array
MATLAB: mean(A)
a matrix
Create a matrix and compute the mean of each column. Let us consider 0 1 1
A = 2 3 2 , 132
422
the code M = mean(A) compute the mean of each column.
M = [1.75, 2.25, 1.75].
MATLAB: mean(A, 2) mean(A,2) creates a matrix and compute the mean of each row.
Consider a matrix
A=�0 1 1� 232
the code M = mean(A, 2) compute the mean of each row. 0.6667
2.3333
MATLAB: mean(A, 3) A matrix A has row dimension and column dimension. If we have several matrices, the family has ”page index”. For example, by the code
A(:,:,1) = [2 4; −2 1]; A(:,:,2) = [9 13; −5 7]; A(:,:,3) = [4 4; 8 −3];
we have three matrices
�2 4�, �9 13�, �4 4�. −2 1 −5 7 8 −3
The code “M1 = mean(A,[1 2])” means the mean over each page of data (rows and columns).
40
M1 = M1(:,:,1) =
1.2500
M1(:,:,2) = 6
M1(:,:,3) = 3.2500
3.6 MATLAB: minimum elements of an array
MATLAB: min(A) For a vector or a matrix, we are able to find the minimum among all elements and to locate this minimum element.
Let A = (2342371552) be a vector. We find that the minimum value among all elements is 15 and find that the 4th element reaches this minimum. The corresponding codes are
A =[23 42 37 15 52]; [b, c]=min(A);
We obtain
>>b b= 15
>>c c=
4
MATLAB: min(A,[ ],k)
the element which reaches the minimum, we can first find and locate the minimum element in each row, which leads a new column vector. Then we find and locate the minimum element in the column vector.
Let A be a matrix
�1 9 −2�. 8 −5 4
41
If we want to find the minimum element in a matrix and to locate
Find the minimum value among all elements on each row give a matrix �−2� and on the first row, −5
the minimum is reached by the 3rd element; on the second row, the minimum is reached by the 2nd element.
Next, for the matrix
�−2� , −5
the minimum is reached on the 2nd row. Finally we find the minimum amomg all elements is −5 and the (2,2) element reaches this minimum.
The corresponding codes are
A =[1 9 −2; 8 −5 4]; [b, c]=min(A,[] , 2); [ d , e ]=min ( b ) ;
Here “2” in min(A, [ ], 2) means finding the minimum along the 2nd dimension (i.e., column dimension) and we obtain
>> b b= −2 −5
>> d d= −5
>> e e=2
3.7 Python: Size of a dataset
Let us obtain a dataset “titanic” from internet. We can view first two rows of this dataset.
# In [1]:
url=’https://tinyurl.com/titanic-csv’
import pandas as pd
42
dataframe=pd.read_csv(url)
dataframe.head(2)
# Out [1]:
Name PClass Age Sex Survived SexCode
0 Allen, Miss Elisabeth Walton 1st 29.0 female 1 1
1 Allison, Miss Helen Loraine 1st 2.0 female 0 1
To find out the size of this dataset, we can do the following
# In [2]:
dataframe.shape
# Out [2]:
(1313, 6) # 1313 rows and 6 columns
As another example, let us obtain a dataset “diabetes” from sklearn, Python:
# In [1]:
import sklearn
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict
diabetes = datasets.load_diabetes()
print(“X”, diabetes.data.shape)
# Out [1]:
X (442, 10)
# In [2]:
print(“y”, diabetes.target.shape)
# Out [2]:
y (442,)
43