Model Recognition Orientated at Small Sample Data
Jun-ling Yang, Yan-ling Luo and Ying-ying Su
Abstract System modeling is a prerequisite for understanding object properties, yet an industrial site may impose restricted conditions, leading to less experimental data being acquired about the objects. In this case, applying traditional statistical methods that rely on the law of large numbers for modeling will certainly reduce identification precision. Aiming at the low precision of recognition on small samples, this paper proposes to introduce a bootstrap-based re-sampling technique, by which the original small-sample data are expanded in order to meet the sample-size requirements of statistical recognition methods, and thereby the requirements of precision. The simulation results show that model recognition accuracy on the extended sample is substantially higher than on the original small sample. This illustrates the validity of the bootstrap-based re-sampling technique as an effective way of processing small-sample data.
Keywords Small sample · Bootstrap method · Expansion · Data processing

1 Introduction
With the rapid progress of information technology, data modeling is becoming increasingly important for researching complex objects. Conditions at an industrial site, however, may restrict experimentation, so that fewer experimental data about the object can be acquired. In researching such system objects, a small number of process samples may not allow accurate judgment and analysis of object models, whereas typical statistical recognition methods rely on the law of large numbers and therefore require sufficiently many samples.
J. Yang (B) · Y. Luo · Y. Su
School of Electric and Information Engineering, Chongqing University of Science and Technology, Chongqing 401331, China
e-mail: junlingyang@126.com
Expanding a small sample so that these methods become applicable contributes to improved recognition accuracy. For this reason, research on data processing methods for extending small-sample data is an important issue that deserves special attention.
Many methods are commonly applied to the small-sample problem in industry, such as Bayes, Bootstrap and Bayes Bootstrap, BP neural networks, Monte Carlo, as well as grey system theory, support vector machines, fuzzy analysis and physical simulation [1–5]. The Bayes approach combines a priori knowledge of the system with existing knowledge to conduct statistical inference on the future behavior of the system. It has proven effective for estimating parameters under both normal and non-normal distributions, and is therefore regarded as a reasonable statistical analysis method for small samples or a single simulation run. As one of the commonly used statistical inference methods, Bootstrap requires no prior and uses only the actual observation data in the calculation; it is quite convenient in practical data processing and can also be used to validate the approximate convergence of a data model. Unlike other methods rooted in the weighting concept, this non-parametric statistical method makes no assumptions about the distribution of the unknown population. Instead, a computer re-samples the original data, transforming the small-sample problem into a large-sample one that simulates the unknown distribution. The Bootstrap-based re-sampling technique uses computer simulation to replace the complex and not-so-accurate approximate analysis of deviation, variance, and other statistics. Thus, the Bootstrap method for statistical inference on small samples is mainly used for probabilistic models whose unknown parameters have derivations too complex to be theoretically feasible. It can also improve inference when statistical models are inaccurate or statistical information is insufficient.
In summary, this paper presents a bootstrap re-sampling technique for the expansion of sample data, aiming to build a relatively large number of virtual samples. On this basis, traditional model recognition methods can be applied with improved object recognition accuracy.
2 Bootstrap-Based Re-sampling Technique for Small Sample Expansion
2.1 Principle
Let an unknown population be observed through a small sample $x(n) = (x_1, x_2, \ldots, x_n)$, where $n$ is small, and suppose we want to estimate an unknown parameter of this population, such as its expectation (also known as the mean). Traditionally, the mean of the $n$ samples, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, is used as the estimate. Yet with a very small $n$, this estimate of the expected value (mean) of the unknown distribution performs poorly.
Table 1 Distribution law of the discrete sample

Value          x1     x2     …     xn
Probability    1/n    1/n    …     1/n
Here, the Bootstrap method provides an effective solution, which is completed in the following steps:
Step 1: Collect the small sample $(x_1, x_2, \ldots, x_n)$, where $n$ is small.
Step 2: Form the empirical distribution function of this sample, which is discrete and can be written as the distribution law shown in Table 1.
Step 3: Draw $n$ samples from this empirical distribution. According to the above distribution law, each $x_i$, $i = 1, 2, \ldots, n$, is drawn with equal probability $1/n$. The drawn set may thus be $(x_1, x_1, \ldots, x_1)$, i.e., all $x_1$; or $(x_n, x_n, \ldots, x_n)$, all $x_n$; or $(x_1, x_1, \ldots, x_n)$, containing individual repeats, with the total number of drawn samples always $n$. In short, the drawn samples can be any combination (with repetition) of the original samples, because the distribution law assigns every value probability $1/n$. We denote the $n$ drawn samples by $(x_1^*, x_2^*, \ldots, x_n^*)$.
Step 4: Calculate $\mu^* = \frac{1}{n}\sum_{i=1}^{n} x_i^*$, the average value of the samples drawn at Step 3.
Step 5: Repeat Steps 3 and 4 $K$ times ($K$ can be a very large figure, usually $K = 100{,}000$), obtaining an average each time, denoted $\mu_i^*$, $i = 1, 2, \ldots, K$.
Step 6: Calculate $\mu' = \frac{1}{K}\sum_{i=1}^{K} \mu_i^*$, where $\mu_i^*$, $i = 1, 2, \ldots, K$, is the average of the samples drawn at the $i$-th repetition. In other words, the $K$ averages obtained over the $K$ repetitions are averaged once more. $\mu'$ is the desired estimate of the expectation of the unknown population.
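As a small illustration of Steps 1–6, the following Python sketch estimates the population mean by bootstrap. NumPy, the function name bootstrap_mean, the seed, and the example sample values are all assumptions made for the sketch, not part of the original method description.

```python
import numpy as np

def bootstrap_mean(x, K=100_000, seed=0):
    """Estimate the expectation of an unknown population from a small
    sample x by the bootstrap principle (Steps 1-6 above)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Steps 3 and 5: draw n values with replacement (each value has
    # probability 1/n), repeated K times; each row is one resample.
    resamples = rng.choice(x, size=(K, n), replace=True)
    mu_star = resamples.mean(axis=1)   # Step 4: the K resample means mu*_i
    return mu_star.mean()              # Step 6: mu' = (1/K) * sum_i mu*_i

# Example with an arbitrary small sample (n = 8); values are illustrative.
print(bootstrap_mean([2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.5, 1.8]))
```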
2.2 Re-sampling Technique Based on the Bootstrap Method
According to the principle of the bootstrap method indicated in Sect. 2.1, the following re-sampling technique was designed to obtain expanded samples for the research need.
Steps 1–2: Same as Steps 1 and 2 in the principle of the bootstrap method.
Step 3: Using the distribution law obtained at Step 2 (Table 1), extraction is repeated $B$ times from the discrete distribution (usually $B \gg n$, chosen to meet the needs of the required sample size), and the $B$ samples thus obtained constitute the final expanded sample. Since $B$ is large, it is generally believed that the expanded $B$ samples contain the original $n$ small samples. A code sketch of this expansion step is given below.
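A minimal sketch of the expansion step might look as follows, again in Python with NumPy. The function name bootstrap_expand, the placeholder data, and the choice B = 300 (matching the simulation in Sect. 3) are illustrative assumptions.

```python
import numpy as np

def bootstrap_expand(x, B, seed=0):
    """Expand a small sample x (rows are observations) to B >> n virtual
    samples by drawing B rows with replacement from the empirical
    distribution, each row having probability 1/n."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    idx = rng.integers(0, len(x), size=B)   # B independent draws of row indices
    return x[idx]

# Expand 32 two-dimensional observations to 300, as in the simulation below.
small = np.random.default_rng(1).normal(size=(32, 2))   # placeholder data
expanded = bootstrap_expand(small, B=300)
print(expanded.shape)   # (300, 2)
```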
3 Simulation of Model Recognition on the Small Sample and the Expanded Sample
For the recognition problem on the small-sample model, a traditional BP neural network was used to construct a classifier for the original small sample and for the sample expanded with the bootstrap-based re-sampling technique, respectively, in order to verify the technique's applicability.
3.1 Simulation Model Constructed for Small-Sample Model Recognition
The simulation model for small-sample model recognition was constructed as a Gaussian mixture model that generates a "Swiss Roll", with covariance matrix

$$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
(1) Original small sample model
Based on the constructed Gaussian mixture model featuring the Swiss Roll, 32 samples were randomly generated (8 samples per category, a total of 32 samples over four categories); the original small sample model is shown in Fig. 1.
(2) Sample model after the expansion using the bootstrap-based re-sampling technique
With the help of the bootstrap-based re-sampling technique, the original 32 small samples were expanded to 300 samples, as shown in Fig. 2.
Comparing Figs. 1 and 2, the increase in the number of samples after expansion is visually apparent.
Fig. 1 Original small sample model (sample data classification)

Fig. 2 Sample model after the expansion using the bootstrap-based re-sampling technique (axes z1, z2)
3.2 Classifier Designed for Small-Sample Model Recognition
Targeting the original small sample model, and taking into account the characteristics of the Gaussian mixture model, a BP neural network structure was designed to perform the model recognition experiments in line with the pertinent guidelines, as shown in Table 2.
Table 2 Design of model classifier

Small sample                    Guidelines                      BP neural network structure (input-hidden-output layer nodes)    Function of the output layer
Original sample                 Relative error less than 5 %    2-5-1                                                             Tangent function
Small sample after expansion    Relative error less than 2 %    2-5-1                                                             Tangent function
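As a concrete illustration of the classifier in Table 2, the following is a minimal sketch of a 2-5-1 BP network with tangent (tanh) units in plain NumPy. The learning rate, epoch count, weight initialization, and the scaling of the class labels into the tanh output range are assumptions made for the sketch, not specifications from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_bp(X, y, hidden=5, lr=0.05, epochs=20_000):
    """Train a 2-5-1 BP (backpropagation) network with tanh units.

    y holds class labels 1-4 regressed as real values; they are scaled
    into (-1, 1) because a tanh output unit cannot exceed 1 (an
    assumption about how the tangent output layer was handled).
    """
    t = (y.reshape(-1, 1) - 2.5) / 2.5           # map labels 1..4 into (-1, 1)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # hidden layer (5 nodes)
        out = np.tanh(h @ W2 + b2)               # output layer (1 node)
        d2 = (out - t) * (1 - out**2)            # output delta under MSE loss
        d1 = (d2 @ W2.T) * (1 - h**2)            # backpropagated hidden delta
        W2 -= lr * h.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)
        W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)
    return W1, b1, W2, b2

def predict_bp(X, W1, b1, W2, b2):
    out = np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)
    return out.ravel() * 2.5 + 2.5               # undo the label scaling
```

Rounding the output of predict_bp to the nearest integer label would give the class decision used to count the correct rates reported below.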
3.3 Experimental Methods and Results on Small Sample Simulation
(1) Original small sample simulation:
The original model had eight samples per category, 32 small samples in total, of which the first 7 of each category, 28 samples in total, were selected as training samples. Of the remaining samples, one per category, four in total, were selected as test samples. The network training output and its relative error map are shown in Fig. 3, and the test output with its relative error map in Fig. 4.
(2) Expanded small sample based on the bootstrap re-sampling technique:
The 300 samples produced by the bootstrap re-sampling technique were split so that 270 randomly selected samples served as training samples and the remaining 30 as test samples. The resulting training output and verification output are shown in Figs. 5 and 6.
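Putting the pieces together, a hedged end-to-end sketch of this experiment might look as follows, reusing bootstrap_expand, train_bp, and predict_bp from the sketches above. The four class centers are hypothetical, since the paper specifies only the (identity) covariance matrix of the mixture components.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical class centers for the four-category Gaussian mixture.
centers = [(7.0, 7.0), (9.0, 13.0), (13.0, 9.0), (12.0, 13.0)]
X = np.vstack([rng.multivariate_normal(c, np.eye(2), size=8) for c in centers])
y = np.repeat([1.0, 2.0, 3.0, 4.0], 8)          # 8 samples per category

# Expand the 32 labelled points to 300 by resampling whole (point, label) rows.
data = bootstrap_expand(np.column_stack([X, y]), B=300)
Xe, ye = data[:, :2], data[:, 2]

# Random 270/30 train-test split, then train and evaluate the 2-5-1 network.
idx = rng.permutation(300)
tr, te = idx[:270], idx[270:]
net = train_bp(Xe[tr], ye[tr])
pred = np.rint(predict_bp(Xe[te], *net))
print("test recognition rate:", (pred == ye[te]).mean())
```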
Fig. 3 The network training output and the relative error (panels: classification of the network training; network training relative error)
Fig. 4 The relative test output and the error map (panels: classification of the network verified; network verified relative error)

Fig. 5 The obtained output of the network training for the expanded sample (panels: classification of the network training; network training relative error)

Fig. 6 The verified output of the network for the expanded sample (panels: classification of the network verified; network verified relative error)
4 Comparative Analysis
The figures above give a visual display of the classification effect of the BP neural network model classifier for the original small sample and for the expanded sample, respectively. To facilitate data comparison, the two BP neural networks were compared on model recognition accuracy, with the specific data, predictive values and (relative) errors given in Table 3.
As can be seen from Table 3, the correct rate for the expanded training and testing samples was greatly improved, indicating that after expansion with the Bootstrap re-sampling technique, the classifier performed better than on the original small sample.
The output values and error conditions of the BP neural network for the original small sample and the expanded sample are shown in Table 4 below.
As can be seen from Table 4, the output errors of the expanded training and test samples were significantly reduced, indicating again that after expansion with the Bootstrap re-sampling technique, the classifier was better than that for the original small sample.
Table 3 Classifier effect of the original small sample and expanded samples in BP neural network

                                           Original small sample                                            Expanded samples using bootstrap
Mean square error of training samples      0.006553567539420                                                2.834126992471895e-04
Recognition rate of training samples       75 %                                                             95.19 %
Number of training samples per category    Cat. 1: 7; Cat. 2: 7; Cat. 3: 7; Cat. 4: 7                       Cat. 1: 67; Cat. 2: 71; Cat. 3: 73; Cat. 4: 59
Correct number per category (training)     Cat. 1: 2; Cat. 2: 5; Cat. 3: 7; Cat. 4: 7                       Cat. 1: 60; Cat. 2: 65; Cat. 3: 73; Cat. 4: 59
Correct rate per category (training)       Cat. 1: 28.57 %; Cat. 2: 71.43 %; Cat. 3: 100 %; Cat. 4: 100 %   Cat. 1: 89.55 %; Cat. 2: 91.55 %; Cat. 3: 100 %; Cat. 4: 100 %
Mean square error of test samples          0.067451076534863                                                3.478027726935210e-04
Recognition rate of test samples           50 %                                                             93.33 %
Number of test samples per category        Cat. 1: 1; Cat. 2: 1; Cat. 3: 1; Cat. 4: 1                       Cat. 1: 9; Cat. 2: 5; Cat. 3: 8; Cat. 4: 8
Correct number per category (test)         Cat. 1: 1; Cat. 2: 0; Cat. 3: 0; Cat. 4: 1                       Cat. 1: 9; Cat. 2: 3; Cat. 3: 8; Cat. 4: 8
Correct rate per category (test)           Cat. 1: 100 %; Cat. 2: 0 %; Cat. 3: 0 %; Cat. 4: 100 %           Cat. 1: 100 %; Cat. 2: 60 %; Cat. 3: 100 %; Cat. 4: 100 %
Table 4 Output values and errors of the BP neural network for the original small sample and the expanded sample

The original small sample:
Sequence of samples    True value    Output value    Absolute error (10^-2)
Training sample
1                      1             0.91876         8.12393
2                      1             0.98089         1.91084
3                      1             0.98599         1.40055
4                      1             1.12220         12.21996
…                      …             …               …
25                     4             4.03473         3.47282
26                     4             4.11668         11.66820
27                     4             4.00009         0.00936
28                     4             3.97737         2.26296
Test sample
29                     1             0.95533         4.46721
30                     2             2.12924         12.92396
31                     3             2.50399         49.60053
32                     4             3.92869         7.13062

The expanded sample:
Sequence of samples    True value    Output value    Absolute error (10^-2)
1                      1             0.99221         0.77907
2                      2             1.98055         1.94477
3                      4             3.98072         1.92751
4                      1             1.02538         2.53836
…                      …             …               …
From the above tabular data comparison, the following conclusions can be drawn:
(1) The bootstrap re-sampling method can effectively expand the original small sample into a larger, expanded sample;
(2) Expanding the original sample with the bootstrap-based re-sampling method can greatly improve the recognition accuracy of the BP neural network model classifier.
5 Conclusion
As illustrated by the above analysis and simulation, this chapter presented a bootstrap-based re-sampling technique for the expansion of small samples, which effectively extracts more data from a small sample. The resulting samples meet the sample-size requirements that traditional recognition methods inherit from statistical theory. Moreover, the expanded samples showed higher recognition accuracy compared with the original small-sample mode. This demonstrates the validity of the bootstrap re-sampling method for expanding an original small sample.
Acknowledgments The work is supported by Chongqing Educational Committee Science and Technology Research Projects No. KJ091402 and No. KJ111417, and the Natural Science Foundation of Chongqing University of Science & Technology No. CK2011Z01.
References
1. Xu, L.: Neural Network Control. Publishing House of Electronics Industry, Beijing (2003)
2. Saridis, G.N.: Entropy formation of optimal and adaptive control. IEEE Trans. Autom. Control 33(8), 713–721 (1988)
3. Lin, J.-H., Isik, C.: Maximum entropy adaptive control of chaotic systems. In: Proceedings of IEEE ISIC/CIRA/ISAS Joint Conference, pp. 243–246. Gaithersburg (1998)
4. Xiaoqun, Y.: Intelligent control processes based on information entropy. South China University of Technology (2004)
5. Masory, O., Koren, Y.: Adaptive control system for turning. Ann. CIRP 29(1), 281–284 (1980)
6. Hyvarinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
7. Shuang, C.: Neural Network Theory Oriented at MATLAB Toolbox and its Applications (Version 2). China University of Science and Technology Press (2003)
8. Shi, F., Wang, X., Yu, L., Li, Y.: Studies on 30 Cases of MATLAB Neural Network. Beijing University of Aeronautics and Astronautics Press, Beijing (2010)