4 On Section 4
4.1 MATLAB: Uniformly distributed random numbers
“rand” returns a single uniformly distributed random number in the interval (0, 1). Called with size arguments, it returns a matrix of such numbers.
[Example] We can generate a 5 × 5 matrix with random numbers in (0, 1).
r=rand(5)
r =
0.8147 0.0975 0.1576 0.1419 0.6557
0.9058 0.2785 0.9706 0.4218 0.0357
0.1270 0.5469 0.9572 0.9157 0.8491
0.9134 0.9575 0.4854 0.7922 0.9340
0.6324 0.9649 0.8003 0.9595 0.6787
[Example] We can generate a 5 × 2 matrix with random numbers in (0, 1).
r=rand(5, 2)
r =
0.7577 0.7060
0.7431 0.0318
0.3922 0.2769
0.6555 0.0462
0.1712 0.0971
[Example] We can randomly generate 10000 points in the interval (50, 100).
a=50; b=100;
r=(b-a).*rand(10000,1)+a;
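For readers following the Python parts of this section, the same scaling idea can be sketched with NumPy. This is only an illustrative counterpart to the MATLAB expression (b-a).*rand(n,1)+a; the generator `rng` and the seed value are assumptions introduced here, not part of the original example.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)

a, b = 50, 100
n = 10000

# rng.random(n) draws n values uniformly from [0, 1);
# scaling by (b - a) and shifting by a maps them into [a, b).
r = (b - a) * rng.random(n) + a

# Smallest value, largest value, and sample mean of the draws.
print(r.min(), r.max(), r.mean())
```

The sample mean should lie close to the midpoint (a + b)/2 of the interval, which gives a quick sanity check on the scaling.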
4.2 MATLAB: Normal random numbers
In MATLAB, “r = normrnd(mu,sigma)” generates a random number from the normal distribution
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} $$
with mean parameter μ and standard deviation parameter σ.
[Example] Generate a single random value from the standard normal distribution.
r = normrnd(0,1)
r = 1.4247
[Example] Create a 5×2 matrix of normal random numbers from the normal distribution with mean 3 and standard deviation 10.
r = normrnd(3,10,[5,2])
r=
-3.3142 20.0559
14.7968 16.8086
10.7641 -0.4305
9.8952 2.5978
16.7239 -2.0681
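A NumPy counterpart of the two examples above might look as follows. It is a sketch under the assumption that NumPy's `Generator.normal` is an acceptable stand-in for `normrnd`; the seed and the variable names `single`, `matrix` and `large` are introduced here for illustration only.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)

# One draw from the standard normal distribution (mean 0, standard deviation 1).
single = rng.normal(loc=0.0, scale=1.0)

# A 5 x 2 matrix of draws with mean 3 and standard deviation 10.
matrix = rng.normal(loc=3.0, scale=10.0, size=(5, 2))

# With many draws, the sample mean and standard deviation approach mu and sigma.
large = rng.normal(loc=3.0, scale=10.0, size=100_000)

print(single)
print(matrix.shape)
print(large.mean(), large.std())
```

Note that `scale` is the standard deviation σ, not the variance, which matches the meaning of the second argument of `normrnd`.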
4.3 Digits
In Python, scikit-learn comes with a few standard datasets, for instance the iris and digits datasets for classification and the Boston house prices dataset for regression (the latter was removed in scikit-learn 1.2). We can use them for study and practice.
The MNIST database (Modified National Institute of Standards and Technology database) of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits were sampled from documents written by employees of the US Census Bureau and American high school students. The images are grayscale and 28 × 28 pixels in size. MNIST is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
Figure 10: MNIST database
“Digits” is a simplified counterpart of the MNIST dataset: a data set of around 1800 samples, each an 8 × 8 image, derived from the UCI ML handwritten digits dataset with some preprocessing and modification.
Load the digits data set
#In [1]:
import sklearn
from sklearn import datasets
digits=datasets.load_digits()
#In [2]:
print(digits.keys())
#Out [2]:
dict_keys(['data', 'target', 'target_names', 'images', 'DESCR'])
We can separate the data part, and the target part.
#In [3]:
X, y = digits['data'], digits['target']
Know the size of the data. The following code prints the sizes of the data part and the target part:
#In [4]:
print("Image Data Shape", digits.data.shape)
print("Label Data Shape", digits.target.shape)
#Out [4]:
Image Data Shape (1797, 64)
Label Data Shape (1797,)
This tells us that the digits data set has 1797 samples, each an 8 × 8 image stored as a row of 8 × 8 = 64 values, with one label per sample. We can print all samples:
#In [5]:
print(X)
#Out [5]:
[[ 0. 0. 5. ... 0. 0. 0.]
 [ 0. 0. 0. ... 10. 0. 0.]
 [ 0. 0. 0. ... 16. 9. 0.]
 ...
 [ 0. 0. 1. ... 6. 0. 0.]
 [ 0. 0. 2. ... 12. 0. 0.]
 [ 0. 0. 10. ... 12. 1. 0.]]
#In [6]:
print(y)
#Out [6]:
[0 1 2 ... 8 9 8]
The printed output shows only the first few and the last few entries. We can view any single sample. For example, the first one and the second one:
#In [7]:
print(X[0])
print(y[0])
#Out [7]:
[ 0. 0. 5. 13. 9. 1. 0. 0. 0. 0. 13. 15. 10. 15. 5. 0. 0. 3. 15. 2. 0. 11. 8. 0. 0. 4. 12. 0. 0. 8. 8. 0. 0. 5. 8. 0. 0. 9. 8. 0. 0. 4. 11. 0. 1. 12. 7. 0. 0. 2. 14. 5. 10. 12. 0. 0. 0. 0. 6. 13. 10. 0. 0. 0.]
0
#In [8]:
print(X[1])
print(y[1])
#Out[8]:
[ 0. 0. 0. 12. 13. 5. 0. 0. 0. 0. 0. 11. 16. 9. 0. 0. 0. 0. 3. 15. 16. 6. 0. 0. 0. 7. 15. 16. 16. 2. 0. 0. 0. 0. 1. 16. 16. 3. 0. 0. 0. 0. 1. 16. 16. 6. 0. 0. 0. 0. 1. 16. 16. 6. 0. 0. 0. 0. 0. 11. 16. 10. 0. 0.]
1
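The relationship between the data part, the target part and the 'images' key can be checked directly. The following is a small sketch, not part of the original notebook; the variable name `same` is introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
X, y = digits.data, digits.target

print(X.shape)               # one row of 64 pixel values per sample
print(y.shape)               # one label per sample
print(digits.images.shape)   # the same pixels stored as 8 x 8 arrays

# Check that each row of X is the corresponding image flattened row by row.
same = all(
    np.array_equal(X[i].reshape(8, 8), digits.images[i]) for i in range(len(X))
)
print(same)

# Number of samples available for each digit class 0..9.
print(np.bincount(y))
```

Working with the flat rows of X is convenient for most classifiers, while the 8 × 8 form is the natural one for plotting.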
Visualize the image. Each sample can be displayed as a picture. For example, let us consider the first, second and third samples in the data part.
#In [9]:
import matplotlib.pyplot as plt
digit = X[0]
digit_pixels = digit.reshape(8, 8)
plt.subplot(131)
plt.imshow(digit_pixels)
digit = X[1]
digit_pixels = digit.reshape(8, 8)
plt.subplot(132)
plt.imshow(digit_pixels)
digit = X[2]
digit_pixels = digit.reshape(8, 8)
plt.subplot(133)
plt.imshow(digit_pixels)
We reshaped each sample from an array of 64 values into an 8 × 8 matrix before displaying it.
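The three repeated reshape-and-plot steps can also be written as a loop. This is a sketch only: the grayscale colormap, the titles, the non-interactive backend and the output file name `digits_first_three.png` are assumptions made here, not choices taken from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
X, y = digits.data, digits.target

# One row of three panels, one panel per sample.
fig, axes = plt.subplots(1, 3, figsize=(6, 2.5))
for i, ax in enumerate(axes):
    # Reshape the 64 values of sample i into an 8 x 8 image and draw it.
    ax.imshow(X[i].reshape(8, 8), cmap="gray_r")
    ax.set_title(f"label {y[i]}")
    ax.axis("off")

# Hypothetical file name, chosen only for this illustration.
fig.savefig("digits_first_three.png")
```

Putting the label in each panel title makes it easy to compare the picture with the corresponding entry of y.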
Set up training set and testing set. We can divide the data set X into a training set (about 90% of X) and a testing set (about 10% of X), and split the target set y in the same way.
Figure 11: The images of the first, second and third samples
#In [10]:
X_train, X_test, y_train, y_test = X[:1620], X[1620:], y[:1620], y[1620:]
print('Train Data: ', X_train, '\n', 'Test Data:', X_test, '\n',
      'Train label: ', y_train, '\n', 'Test Label: ', y_test)
#Out [10]:
Train Data: [[ 0. 0. 5. ... 0. 0. 0.]
 [ 0. 0. 0. ... 10. 0. 0.]
 [ 0. 0. 0. ... 16. 9. 0.]
 ...
 [ 0. 0. 5. ... 1. 0. 0.]
 [ 0. 0. 6. ... 9. 6. 2.]
 [ 0. 0. 0. ... 6. 0. 0.]]
Test Data: [[ 0. 0. 4. ... 10. 1. 0.]
 [ 0. 0. 0. ... 0. 0. 0.]
 [ 0. 0. 7. ... 0. 0. 0.]
 ...
 [ 0. 0. 1. ... 6. 0. 0.]
 [ 0. 0. 2. ... 12. 0. 0.]
 [ 0. 0. 10. ... 12. 1. 0.]]
Train label: [0 1 2 ... 5 2 8]
Test Label: [0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2 2 5
 7 9 5 4 4 9 0 8 9 8 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
 7 8 9 0 9 5 5 6 5 0 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3
 4 6 6 6 4 9 1 5 0 9 5 2 8 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1 4
 0 5 3 6 9 6 1 7 5 4 4 7 2 8 2 2 5 7 9 5 4 8 8 4 9 0 8 9 8]
We now have about 90% of the data set in the training set and about 10% in the testing set.
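Slicing at index 1620 keeps the samples in their stored order. As an alternative sketch, scikit-learn's `train_test_split` can shuffle the samples before holding out 10% of them; the choices `random_state=0` and `stratify=y` are assumptions made here for illustration and are not part of the original notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Hold out 10% of the samples. The data are shuffled first, and stratify=y
# keeps the class proportions similar in both parts. random_state is an
# arbitrary seed that makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
print(len(X_test) / len(X))
```

A shuffled, stratified split guards against any ordering in the stored samples, whereas the plain slice is simpler and fully deterministic.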