


Optimization under uncertainty with application to data clustering
by
Jumi Kim
A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Major: Industrial Engineering
Program of Study Committee:
Sigurdur Ólafsson, Major Professor
Irvin R. Hentzel
Doug Gemmill
Timothy Van Voorhis
Ananda Weerasinghe
Iowa State University
Ames, Iowa
2002
Copyright © Jumi Kim, 2002. All rights reserved.


Graduate College Iowa State University
This is to certify that the doctoral dissertation of
Jumi Kim
has met the dissertation requirements of Iowa State University
Signature was redacted for privacy.
Major Professor
Signature was redacted for privacy.
For the Major Program

DEDICATION
I would like to dedicate this thesis to my parents, my sister and my husband without whose support I would not have been able to complete this work.

TABLE OF CONTENTS

LIST OF TABLES vii
LIST OF FIGURES x
ABSTRACT xiv
CHAPTER 1 Introduction
1.1 Continuous Input Variables 2
1.2 Random Search Method 4
1.3 Statistical Selection Method 5
CHAPTER 2 The Nested Partitions Method with Inheritance
2.1 Nested Partitions (NP) Method 7
2.2 NP with Inheritance 8
CHAPTER 3 Statistical Selection in NP
3.1 Statistical Selection Method 13
3.2 Independent Sampling 17
3.2.1 Two-Stage Sampling with Subset Selection 17
3.2.2 Numerical Evaluation 24
3.3 Correlated Sampling 29
3.3.1 Two-Stage Sampling with Nelson-Matejcik 29
3.3.2 Numerical Evaluation 32
3.4 Best Probability of Correct Selection 35
3.5 Conclusions 40
CHAPTER 4 NP/GA
4.1 Genetic Algorithm 41
4.1.1 Conventional Genetic Algorithm 42
4.1.2 Parent Selection 43
4.1.3 Crossover and Mutation 44
4.2 Algorithm 45
4.3 Numerical Evaluation 51
4.3.1 Probability of Finding Exact Solution 51
4.3.2 Probability of Finding Solution Using Indifference Zone 52
4.3.3 Average of Makespan 53
4.4 Conclusions 53
CHAPTER 5 Data Clustering
5.1 Knowledge Discovery in Databases (KDD) and Data Mining 55
5.2 Data Clustering 57
5.2.1 Hierarchical Clustering 58
5.2.2 Partitional Clustering 59
5.3 Clustering using NP 61
5.4 Numerical Evaluation 73
5.4.1 Nelson-Matejcik Sampling 75
5.4.2 Rinott’s Sampling 90
5.5 Conclusions 104
CHAPTER 6 Evaluation of the New Methodology
6.1 Amount of Inheritance 106
6.2 Statistical Sampling 113
6.3 Comparison with K-medoids methods 130
6.4 Conclusions 133
CHAPTER 7 Conclusions and Future Work
7.1 NP with Inheritance 134
7.2 NP with Statistical Selection Method and Random Search Method 135
7.3 Application to Data Clustering 137

7.4 Future Research 138
APPENDIX 139
BIBLIOGRAPHY 149
ACKNOWLEDGEMENTS 161

LIST OF TABLES

Table 3.1 Summary of Statistical Methods 16
Table 3.2 Increase of Sample Points for Different P* and Algorithms 33
Table 5.1 Effect of Using Fraction of Instance Space for the Large Data Set without Inheritance when the Nelson-Matejcik Sampling Method is Used 79
Table 5.2 Effect of Using Fraction of Instance Space for the Large Data Set with Inheritance when the Nelson-Matejcik Sampling Method is Used 79
Table 5.3 Similarity Results of t-test with 95% Confidence Interval of the Large Data Set when the Nelson-Matejcik Sampling Method is Used 80
Table 5.4 Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set when the Nelson-Matejcik Sampling Method is Used 82
Table 5.5 Effect of Using Fraction of Instance Space for the Small Data Set without Inheritance when the Nelson-Matejcik Sampling Method is Used 85
Table 5.6 Effect of Using Fraction of Instance Space for the Small Data Set with Inheritance when the Nelson-Matejcik Sampling Method is Used 85
Table 5.7 Similarity Results of t-test with 95% Confidence Interval for the Small Data Set when the Nelson-Matejcik Sampling Method is Used 88
Table 5.8 Computation Time Results of t-test with 95% Confidence Interval for the Small Size Data Set when the Nelson-Matejcik Sampling Method is Used 88
Table 5.9 Effect of Using Fraction of Instance Space for the Large Data Set without Inheritance when the Rinott's Sampling Method is Used 92

Table 5.10 Effect of Using Fraction of Instance Space for the Large Data Set with Inheritance when the Rinott's Sampling Method is Used 92
Table 5.11 Similarity Results of t-test with 95% Confidence Interval for the Large Data Set when the Rinott's Sampling Method is Used 93
Table 5.12 Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set when the Rinott's Sampling Method is Used 95
Table 5.13 Effect of Using Fraction of Instance Space for the Small Data Set without Inheritance when the Rinott's Sampling Method is Used 99
Table 5.14 Effect of Using Fraction of Instance Space for the Small Data Set with Inheritance when the Rinott's Sampling Method is Used 99
Table 5.15 Similarity Results of t-test with 95% Confidence Interval for the Small Data Set when the Rinott's Sampling Method is Used 100
Table 5.16 Computation Time Results of t-test with 95% Confidence Interval for the Small Size Data Set when the Rinott's Sampling Method is Used 102
Table 5.17 Summary of All the Results (S: Similarity Value, C: Computation Time) 105
Table 6.1 Similarity Results of t-test with 95% Confidence Interval for the Large Data Set when the Nelson-Matejcik Sampling Method is Used 109
Table 6.2 Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set when the Nelson-Matejcik Sampling Method is Used 111
Table 6.3 Similarity Results of t-test with 95% Confidence Interval for the Small Data Set when the Nelson-Matejcik Sampling Method is Used 114

Table 6.4 Computation Time Results of t-test with 95% Confidence Interval of Small Data when the Nelson-Matejcik Sampling Method is Used 116
Table 6.5 Similarity Results of t-test with 95% Confidence Interval for the Large Data Set 119
Table 6.6 Similarity Results of t-test with 95% Confidence Interval for the Large Data Set 123
Table 6.7 Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set 123
Table 6.8 Comparison Results of Large Data Set when Inheritance is Not Used 131
Table 6.9 Comparison Results of Large Data Set when Inheritance is Used 131
Table 6.10 Comparison Results of Small Data Set when Inheritance is Not Used 132
Table 6.11 Comparison Results of Small Data Set when Inheritance is Used 132

LIST OF FIGURES

Figure 3.1 Total Number of Sample Points of Each Depth for NP/Rinott (above) and NP/Subset/Rinott (below) 26
Figure 3.2 Total Number of Sample Points of Each Depth for NP/DD (above) and NP/Subset/DD (below) 27
Figure 3.3 Computation Effort in the Surrounding Region for NP/Subset/Rinott and NP/Rinott (above) and NP/DD and NP/Subset/DD (below) 28
Figure 3.4 Average Performance of Final Solution for the Monte Carlo Problem (above) and the Queuing Problem (below) 34
Figure 3.5 Performance for the Monte Carlo Problem Using NP/Subset/Rinott Algorithm 37
Figure 3.6 Performance for the Queuing Problem Using NP/NM Algorithm 38
Figure 3.7 Performance for the Monte Carlo Problem Using NP/Subset/DD Algorithm 39
Figure 4.1 Probability of Finding Exact Optimal Solution 51
Figure 4.2 Probability of Finding Optimal Solution with Indifference Zone 52
Figure 4.3 Average of Makespan 53
Figure 5.1 Simple Example for Clustering Using NP Methodology 63
Figure 5.2 Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when the Nelson-Matejcik Sampling Method is Used 81

Figure 5.3 Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when the Nelson-Matejcik Sampling Method is Used 83
Figure 5.4 Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when the Nelson-Matejcik Sampling Method is Used 87
Figure 5.5 Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when the Nelson-Matejcik Sampling Method is Used 89
Figure 5.6 Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when the Rinott's Sampling Method is Used 94
Figure 5.7 Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when the Rinott's Sampling Method is Used 96
Figure 5.8 Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when the Rinott Sampling Method is Used 101
Figure 5.9 Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when the Rinott's Sampling Method is Used 103
Figure 6.1 Similarity Value of Each Algorithm when the Numbers of Instances are 3 (Left) and 200 (Right) of the Large Data Set when the Nelson-Matejcik Sampling Method is Used 110

Figure 6.2 Computation Time of Each Algorithm when the Numbers of Instances are 3 (Left) and 200 (Right) of the Large Data Set when the Nelson-Matejcik Sampling Method is Used 112
Figure 6.3 Similarity Value of Each Algorithm when the Numbers of Instances are 1 (Left) and 82 (Right) of the Small Data Set when the Nelson-Matejcik Sampling Method is Used 115
Figure 6.4 Computation Time of Each Algorithm when the Numbers of Instances are 1 (Left) and 82 (Right) of the Small Data Set when the Nelson-Matejcik Sampling Method is Used 117
Figure 6.5 Similarity Value of the Pure K-means and Combined K-means with the NP for the Large Data Set 120
Figure 6.6 Similarity Value of Pure Genetic and Combined Genetic with the NP for the Large Data Set 121
Figure 6.7 Similarity Value of Genetic/K-means and Combined Genetic/K-means with the NP for the Large Data Set 122
Figure 6.8 Similarity Value of K-means when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set 124
Figure 6.9 Similarity Value of Genetic when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set 125
Figure 6.10 Similarity Value of Genetic/K-means when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set 126
Figure 6.11 Computation Time of K-means when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set 127
Figure 6.12 Computation Time of Genetic when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set 128
Figure 6.13 Computation Time of Genetic/K-means when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set 129

ABSTRACT

A new simulation optimization technique that extends the pure nested partitions (NP) algorithm is presented in this thesis. This method is called the nested partitions method with inheritance. Furthermore, a statistical selection method and a random search method are introduced to overcome certain shortcomings of the pure NP algorithm. Finally, the suggested algorithms are applied to a data mining problem, namely data clustering.
The basic idea of the NP algorithm is very simple. At each iteration, the most promising region is partitioned and the performance of each partitioned region is evaluated using sampling. Based on this performance evaluation, the most promising region for the next iteration is chosen. This procedure is repeated until the termination condition is satisfied.
Even though the pure NP method guarantees the convergence to the optimal solution, it has two apparent problems. The first problem is related to the two sources of errors in the estimate of each region during the sampling phase of the NP algorithm: the sampling error due to the use of a limited number of sample points from the region and the estimation error due to the use of optimization under uncertainty. The other problem is that at each iteration, there is no guarantee of whether the correct move is made or not. To handle these shortcomings, two extensions to the pure NP are suggested. To rigorously determine the
required sample effort, statistical selection methods are implemented, including the Nelson-Matejcik procedure, the Rinott procedure, and the Dudewicz and Dalal procedure, as well as a subset selection procedure. In addition, Genetic Algorithms (GAs) are used to speed convergence and to overcome the difficulty in the backtracking stage of the NP algorithm. The resulting algorithms are evaluated on problems with two types of noisy performance: a Monte Carlo simulation problem and a discrete-event simulation (queuing) problem.
As an application of the new methodology, this work also applies the suggested methods to a data clustering problem. This is a very hard problem within data mining, with two of the main difficulties being lack of scalability with respect to the amount of data and problems with high dimensionality. The new algorithms are found to be effective for solving this problem: random sampling enhances scalability and the iterative partitioning addresses the dimensionality.

Chapter 1 Introduction
Optimization under uncertainty has been found to be useful in areas such as designing manufacturing systems, evaluating the requirements of computer systems, determining policies in inventory systems, designing and operating transportation facilities, evaluating designs for service organizations and analyzing financial systems. Complex and large systems cannot be solved by simple analytical or mathematical methods. For this reason, using simulation is often necessary. Moreover, optimization with uncertainty has become one of the most widely used tools in operations research and management science, especially
when a large finite feasible region is given.
Evaluating the performance of every feasible point using optimization under
uncertainty is very time consuming. Even though many optimization-under-uncertainty algorithms have been developed, there are some difficulties when applying these algorithms to real-world problems. One of the reasons is that there is no guarantee of convergence to the optimum; therefore, new algorithms are needed to overcome this problem.
Every algorithm discussed in this thesis is based on the Nested Partitions (NP) method, an optimization-under-uncertainty method that is guaranteed to converge to an optimal solution. In this thesis, three main contributions are made. First, a new method called the nested partitions method with inheritance is developed and evaluated as an optimization-under-uncertainty method. By introducing this new concept into the pure NP method, it will be shown that the new method improves the efficiency of the pure NP
method. A detailed description of this new method is presented. To further improve the

efficiency of the pure NP method, combined algorithms with statistical selection methods and a random search method are suggested and validated through numerical evaluation. Finally, data mining, which has recently become a prominent topic, will be discussed, with a focus on data clustering. Data clustering has numerous applications, including loan payment prediction and customer credit policy analysis, classification and clustering of customers for target marketing, and detection of money laundering and other financial crimes. By applying the suggested algorithms to the data clustering problem, it is shown that the NP methodology is well suited to solving the data clustering problem.
There are many methods for optimization under uncertainty, and deciding which method to use depends on the problem structure. For example, gradient estimation, stochastic approximation, and sample path optimization are applicable when the feasible input variables are continuous. On the other hand, random search methods and statistical selection methods are applicable for discrete input variables. A more detailed discussion of these methods follows.
1.1 Continuous Input Variables
Until recently, most techniques were developed for continuous input parameters. Several common methods for continuous input variables are categorized as gradient-based methods. The classical stochastic optimization methods are based on an iterative search in the direction of an estimate of the gradient. This was originally suggested by Robbins-Monro and Kiefer-Wolfowitz in the 1950s (Robbins and Monro, 1951). The Robbins-Monro algorithm is simply a root-finding procedure for functions whose values are not known but are observed with noise. The Robbins-Monro algorithm estimates the gradient directly, whereas the Kiefer-Wolfowitz algorithm uses finite differences to estimate the derivative. In both cases, the primary implementation problem is determining the step size.

The most straightforward approach to gradient estimation is the finite differences method (Glynn, 1989; Andradottir, 1989). This method is simple to implement and generally applicable, but it has several difficulties when applied to practical problems. One of the biggest difficulties is that too much time is spent calculating the gradient: one-sided (forward) differences require n+1 simulations and two-sided differences require 2n simulations for an n-dimensional function. The other difficulty is that the gradient estimates obtained using finite differences are generally biased, and trying to reduce bias leads to another difficulty: large variance in the estimates. There are several studies describing these problems (Glynn, 1989; L'Ecuyer and Perron, 1994). To reduce the large variance, methods such as common random numbers (CRN) have been suggested (Glasserman and Yao, 1992).
Unlike finite differences, Perturbation Analysis (PA) and Likelihood Ratios (LR) require only a single simulation run to obtain an estimate. Infinitesimal Perturbation Analysis (IPA) is the best-known variant of PA. The basic idea of IPA is to use the sample path derivative as an estimator of the gradient of the expected performance (L'Ecuyer, 1991). The basic idea of LR is that the gradient of the performance measure is expressed as an expectation with respect to the same distribution as the performance function itself (Glynn, 1989). However, both methods have some limitations: they are not always applicable, and they are more complicated to understand and implement than finite differences. A drawback of all gradient-based search techniques is that they find only local optima (Fu, 1994).

The effort to solve this problem by converting the original simulation into an approximate deterministic optimization problem was started by Rubinstein and Shapiro (1993). The sample path method involves converting the original simulation into an approximate deterministic optimization problem; this approach is also called the stochastic counterpart method. For discrete input variables, the main alternatives are random search methods and statistical selection methods, which are discussed next.
1.2 Random Search Method
In this section, random search methods for a discrete feasible region are briefly reviewed. These methods usually cannot guarantee a global optimum, and therefore they are often called heuristic methods. Three common random search methods are mentioned below.
Tabu Search was originally proposed by Glover (1977) for escaping local optima by using a list of prohibited solutions known as the tabu list. The commonly used diversification method is re-starting from the best solution obtained so far. A drawback of tabu search is that unless a long tabu list is maintained, it may return to a previously visited solution.
Simulated Annealing (SA), introduced by Kirkpatrick et al. (1983), is a random search method that is able to escape local optima using a probability function. Unlike tabu search, SA does not evaluate the entire neighborhood in every iteration. Instead, it randomly chooses only one solution from the current neighborhood and evaluates its cost, which means SA tends to need more iterations than tabu search to find the best solution. Another disadvantage is that it has no memory, and hence it may re-visit a recent solution. Combination methods of tabu search and SA also exist.

Genetic Algorithms (GAs) were originally developed by Holland (1975). The GA is the most widely known evolutionary method and is both powerful and broadly applicable to stochastic optimization. It mimics the mechanisms of natural selection and natural genetics, where stronger individuals are more likely to survive in a competing environment, so that the strongest individuals (those with the best performance) survive. Commonly used operators include selection, reproduction, crossover, and mutation.
Another class of methods that can be used when the input parameters are discrete are the statistical selection methods.
1.3 Statistical Selection Method
Simulations are often performed to compare two or more system designs. The most popular statistical methods are ranking and selection (R&S) and multiple comparison procedures (MCPs), which are applicable when the input parameters are discrete and the number of designs to be compared is small (say, 2 to 20). R&S procedures are statistical methods developed to select the best system, or a subset that contains the best system design, from a set of competing alternatives (Goldsman and Nelson, 1994). MCPs use pairwise comparisons to derive relationships among all designs. In other words, R&S yields the best system, while MCPs yield information about the relationships among the alternative solutions. Sanchez (1997) gives an overview of R&S with samples. Wen and Chen (1994) present single-stage sampling procedures for different MCPs. Goldsman and Nelson (1994, 1998) provide comprehensive reviews of R&S and MCPs. A more detailed
discussion is found in Chapter 3.

Optimization methods under uncertainty have now been briefly surveyed. The next chapter focuses on the NP method, a recently developed optimization-under-uncertainty method, and its extension.
The remainder of this thesis is organized as follows. In Chapter 2, the pure NP method is described and the concept of inheritance is introduced to improve the efficiency of the NP method. As further ways to improve the NP method, two different approaches are employed, statistical selection methods and random search methods; these are discussed in Chapters 3 and 4. In Chapter 3, NP methods combined with statistical selection are suggested and a numerical evaluation is presented. In Chapter 4, Genetic Algorithms (GAs) are combined with the methods introduced in Chapter 3, and numerical results are shown. In Chapters 5 and 6, data clustering is introduced as a specific application of the new algorithms developed throughout the thesis. Finally, in Chapter 7, conclusions and future research are presented.

Chapter 2
The Nested Partitions Method with Inheritance
In general, the optimization problem considered here is to minimize an objective function f : Θ → R over a feasible region Θ; that is,

    min_{θ ∈ Θ} f(θ),

where Θ is finite and f is unknown but can be estimated with an estimate f̂(θ) for each θ ∈ Θ.
2.1 Nested Partitions (NP) Method
The Nested Partitions (NP) method was originally developed by Shi and Ólafsson (2000a). This method solves both deterministic and stochastic finite optimization problems. The basic idea of the algorithm is to shift the focus from points in the feasible region to a sequence of subsets of the feasible region. The NP method is mainly composed of four procedures: partitioning, sampling, estimating the promising index, and backtracking. In each iteration of the algorithm, it is assumed there is a region, i.e., a subset of Θ, that is considered the most promising region. This most promising region is partitioned into M sub-regions and the entire surrounding region is aggregated into one region. Therefore, M+1 disjoint subsets of the feasible region Θ are considered at each iteration. Each of these M+1 regions is sampled using some random sampling scheme, and for each region a promising index is calculated. These promising indices are then compared to determine which region becomes the most promising region in the next iteration. If one of the sub-regions is found to be best, this sub-region becomes the most promising region. However, if the surrounding region is found to be the best, the algorithm backtracks, and a larger region containing the current most promising region becomes the new most promising region. The new most promising region is then partitioned and sampled in a similar fashion. This process is repeated until the termination criterion is satisfied, which is generally when the maximum depth is reached. The singleton regions, which cannot be partitioned further, are called regions of maximum depth.
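The partition-sample-backtrack loop described above can be summarized in a short sketch. This is only an illustration, not the implementation used in the thesis: the region representation and the partition, sample_in, sample_surrounding, and performance helpers are hypothetical placeholders supplied by the caller.

```python
def nested_partitions(theta, partition, sample_in, sample_surrounding,
                      performance, max_depth, n=20, max_iter=100):
    """Sketch of the NP loop: partition the most promising region, sample each
    subregion plus the surrounding region, move to the best subregion, and
    backtrack when the surrounding region looks best."""
    sigma = theta            # current most promising region
    stack = []               # coarser regions, used for backtracking
    for _ in range(max_iter):
        subs = partition(sigma)                       # M subregions of sigma
        # Promising index of each subregion: best (minimum) sampled performance.
        idx = [min(performance(p) for p in sample_in(r, n)) for r in subs]
        # Promising index of the surrounding region Theta \ sigma.
        idx.append(min(performance(p) for p in sample_surrounding(theta, sigma, n)))
        best = idx.index(min(idx))
        if best < len(subs):                          # move down to the best subregion
            stack.append(sigma)
            sigma = subs[best]
            if len(stack) >= max_depth:               # reached maximum depth
                break
        else:                                         # backtrack to a larger region
            sigma = stack.pop() if stack else theta
    return sigma
```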
2.2 NP with Inheritance
In the pure NP method, independent sampling is performed in each iteration. In other words, after the most promising region is partitioned, good solutions are thrown away and the information that has been learned is lost. This allows for nice convergence properties, as the algorithm generates a Markov chain. However, the basic idea of the new algorithm is to keep good solutions by inheriting them into the next iteration. Every algorithm considered in this thesis focuses on the use of sampling procedures in the pure NP method, and the idea of inheritance also focuses on the sampling procedure, especially on how to start sampling each sub-region. If there is a minimum probability of correct selection (P*), the sample point that is currently chosen satisfies P*, and every iteration will satisfy P*. So, by inheriting the current best solution into the next iteration, the probability of correct selection continues to increase. Intuitively, better solutions are found in the next iteration and the computation time is reduced.
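As a rough illustration of this idea, the sampling step can reuse the inherited points and then top up with fresh random samples. This is a minimal sketch under assumed interfaces: the region object is assumed to support membership testing, and random_sample is a hypothetical helper.

```python
def sample_with_inheritance(region, n_required, inherited, random_sample):
    """Reuse inherited sample points that fall inside this region, then fill
    the remaining sampling budget with new random samples (sketch only)."""
    points = [p for p in inherited if p in region][:n_required]
    while len(points) < n_required:
        points.append(random_sample(region))
    return points
```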
The notation used by these algorithms is summarized below.

Σ = {σ ⊆ Θ : σ is a valid region given a fixed partitioning}, Σ0 ⊂ Σ denotes the collection of singleton (maximum depth) regions, σ(k) the most promising region in the k-th iteration, M_{σ(k)} the number of sub-regions of σ(k), and INH_{k-1} the set of sample points inherited from iteration k-1.

During sampling, if |INH_{k-1}| > 0, the sample points in INH_{k-1} are reused, the remaining sample points are obtained by random sampling, and the corresponding sample performance values L(θ_i^j), i = 1, 2, ..., N_j, j = 1, 2, ..., M_{σ(k)}+1, are calculated. For each region σ_j, j = 1, 2, ..., M_{σ(k)}+1, a promising index function I(σ_j) is then defined; for example, I(σ_j) can be defined as the best sample performance observed in the region. The new most promising region is then selected as

    ĵ_k ∈ arg min_{j = 1, 2, ..., M_{σ(k)}+1} I(σ_j).

If more than one region is equally promising, the tie can be broken arbitrarily. If this index corresponds to a region that is a sub-region of σ(k), this sub-region becomes the new most promising region; otherwise, the algorithm backtracks to a larger region.

Chapter 3 Statistical Selection in NP

Let D_j(k) denote the set of sample points obtained from subregion σ_j(k), j = 1, 2, ..., M+1, and let N = |D_j(k)| denote the initial number of sample points, which is assumed to be constant. In addition, let θ ∈ D_j(k) denote a point in this set and let L(θ) be a simulation estimate of the performance of this point. Then in the k-th iteration, for every j,

    X_j(k) = min_{θ ∈ D_j(k)} L(θ)     (1)

is an estimate of the performance of the region σ_j(k), j = 1, 2, ..., M+1. The two-stage ranking and selection procedure first obtains n0 such system estimates, and then uses that information to determine the total number N_j of system estimates needed from the j-th system, that is, subregion σ_j(k). This number is selected to be sufficiently large so that the correct subregion is selected with probability at least P*, subject to an indifference zone of ε > 0.
3.1 Statistical Selection Methods
Discrete-event stochastic simulation is often used to choose the best system among a set of proposed systems, where "best" is defined by the maximum or minimum expected simulation output. Thus, considerable interest exists in Ranking and Selection (R&S) procedures. The fundamentals of R&S were first proposed by Bechhofer (1954). The original indifference zone R&S procedure proposed by Bechhofer (1954) is single-stage and assumes unknown means and known, common variances for all systems. However, indifference zone R&S procedures need not be single-stage: by defining a user-specified number of observations, they can be extended to multi-stage (sequential) procedures assuming common, known variances. Paulson (1964) and Bechhofer et al. (1968) present such methodologies. Koenig and Law (1985) extend the indifference zone approach for use as a screening procedure.

Unlike the articles discussed above, Dudewicz and Taneja (1978) present a multivariate procedure that does not require reduction to a univariate model. While the indifference zone procedures use a least-favorable configuration (LFC) to allocate additional replications, the optimal computing budget allocation (OCBA) (Chen et al., 1996) and Bayesian decision-theoretic methods (e.g., Berger, 1988; Gupta and Miescke, 1996; Chick and Inoue, 1998; Chick and Inoue, 1999) use an average-case analysis to allocate additional replications (Inoue et al., 1999). All three approaches assume that simulation output is independent and normally distributed with unknown mean and variance, and they are applicable to both two-stage and sequential settings. Inoue and Chick (2000) show empirically that the two-stage procedure of Rinott (1978) performs competitively with sequential OCBA and Bayesian decision-theoretic methods when the number of systems under consideration is small (k < 5). For a large number of systems (k > 5), or when the difference in the mean output of the best system and the other systems varies significantly, the Rinott procedure is less effective at identifying the best system. Among two-stage procedures, the Bayesian decision-theoretic procedures have the best overall performance characteristics.
Recently, many articles have tried to unify the fields of R&S and MCPs. Multiple comparisons with the best (MCB) is one of the most widely used MCPs. To apply MCB in a discrete-event simulation, the simulation runs must be independently seeded, and the simulation output must be normally distributed or averaged so that the estimators used are approximately normally distributed. There are four R&S-MCB procedures that assume normally distributed data but do not require known or equal variances: Rinott's Procedure (Procedure R), Dudewicz and Dalal's Procedure (Procedure DD), Clark and Yang's Procedure

(Procedure CY), and Nelson and Matejcik's Procedure (Procedure NM) (Nelson and Matejcik, 1995). Procedure R and Procedure DD are performed in the same manner, with the only difference being in the calculation of the sample means. Both algorithms require independence among all observations, and the total sample size depends on the sample variance of the systems: the larger the sample variance, the more replications are required. Unlike these algorithms, Procedure CY and Procedure NM require fewer total observations by employing CRN. Clark and Yang (1986) use the Bonferroni inequality to account for the dependence induced by CRN. However, Nelson and Matejcik (1995) observed that the benefit gained from using Procedure CY is diminished when the number of systems to be compared is large. To overcome this problem, they present Procedure NM. Procedure NM assumes that the unknown variance-covariance matrix exhibits a structure known as sphericity, which implies that the variances of all paired differences across systems are equal, even though the marginal variances and covariances may be unequal. The difference between Procedure CY and Procedure NM is the calculation of the sample variance, which affects the total sample size for second-stage sampling. Nelson and Matejcik (1995) reported that Procedure NM is superior to Procedures R, DD, and CY in terms of the total observations required to obtain the desired confidence level. The only potential drawback of Procedure NM is that the assumption of sphericity may not be satisfied. The following table summarizes these characteristics.

Table 3.1: Summary of Statistical Methods
(X_ij: the output of the j-th replication of system i)

Method                                          Single/Two/Sequential Stage
Procedure PB, Bechhofer (1954)                  Single (indifference zone)
Paulson (1964), Bechhofer et al. (1968)         Sequential
Procedure PDD, Dudewicz and Dalal (1975)        Two
Procedure PDD, Dudewicz and Zaino (1976)        Single (MCP)
Rinott (1978)                                   Two (indifference zone)
Procedure R, Nelson and Matejcik (1995)         Two (MCB, indifference zone)
Procedure DD, Nelson and Matejcik (1995)        Two (MCB, indifference zone)
Procedure CY, Nelson and Matejcik (1995)        Two (MCB, indifference zone)
Procedure NM, Nelson and Matejcik (1995)        Two (MCB, indifference zone)
Procedure OCBA, Chen et al. (1999)              Two/Sequential (optimal computing budget allocation)
Procedure 0-1(B), Chick and Inoue (2000)        Two/Sequential (Bayesian decision-theoretic methods)
3.2 Independent Sampling

3.2.1 Two-Stage Sampling with Subset Selection

Algorithm NP/Subset/Rinott

Step 1. Initialization
Specify the first-stage sample size n0 >= 2, the indifference zone ε, and the number of sub-regions M. Determine t from the t-distribution and h from Rinott's integral; t and h are constants determined by n0, the minimum probability P* of correct selection, and M (see the tables in Bechhofer et al., 1995).

Step 2. Partitioning
Given the current most promising region σ(k), partition σ(k) into M sub-regions σ_1(k), ..., σ_M(k), and aggregate the surrounding region Θ \ σ(k) into one region σ_{M+1}(k).

Step 3. First-Stage Sampling
Step 3-1. Let i = 1.
Step 3-2. Use uniform sampling to obtain a set D_j(k) of N sampling points from region σ_j(k), j = 1, 2, ..., M+1.
Step 3-3. Use discrete-event simulation of the system to obtain a sample performance L(θ) for every θ ∈ D_j(k) and estimate the performance of the region as
    X_j(k) = min_{θ ∈ D_j(k)} L(θ).
Step 3-4. If i = n0, continue to Step 4. Otherwise, let i = i + 1 and go back to Step 3-2.

Step 4. Estimating Mean and Variance of First-Stage Sampling
Calculate the first-stage sample means X̄_j^(1)(k) and sample variances
    S_j^2 = Σ_{i=1}^{n0} [X_{ji}(k) − X̄_j^(1)(k)]^2 / (n0 − 1),
for j = 1, 2, ..., M+1.

Step 5. Filtering
Calculate the quantity
    W_ij = t (S_i^2/n0 + S_j^2/n0)^{1/2}
for all i ≠ j. Include the i-th region in the selected subset I if
    X̄_i ≤ X̄_j + (W_ij − ε)^+ for all i ≠ j.

Step 6. Computing Total Sample Size
If I contains only a single region, this region has the best promising index, so update σ(k+1) accordingly and go to Step 10. Otherwise, compute the total sample size for all j ∈ I,
    N_j = max{ n0 + 1, ⌈ h^2 S_j^2 / ε^2 ⌉ },
where ε is the indifference zone and h is a constant determined by n0 and the minimum probability P* of correct selection.

Step 7. Second-Stage Sampling
Obtain N_j(k) − n0 more simulation estimates of the system performance for all j ∈ I as in Steps 3-1 through 3-4 above.

Step 8. Estimating Mean of Second-Stage Sampling
Let the overall sample mean be the promising index for all j ∈ I,
    X̄_j(k) = Σ_{i=1}^{N_j(k)} X_{ji}(k) / N_j(k).

Step 9. Estimating Promising Index
Select the index of the region with the best promising index,
    ĵ_k ∈ arg min_{j ∈ I} I(σ_j).

Step 10. Checking Stopping Rule
If the stopping criterion (for example, maximum depth) is not satisfied, let k = k + 1 and go back to Step 2.

Algorithm NP/Subset/DD differs from Algorithm NP/Subset/Rinott only in how the promising index is computed from the two sampling stages:

Step 10. Calculating Weighted Averages
Calculate weighted averages for all j ∈ I,
    X̃_j(k) = W_{j1} X̄_j^(1)(k) + W_{j2} X̄_j^(2)(k),
and let these weighted averages be the promising index for all j ∈ I,
    I(σ_j) = X̃_j(k).

Step 11. Estimating Promising Index
See Step 9 in Algorithm NP/Subset/Rinott.

Step 12. Checking Stopping Rule
See Step 10 in Algorithm NP/Subset/Rinott.
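The screening and second-stage budgeting of Steps 4-6 can be sketched in a few lines of code. This is a hedged illustration rather than the thesis implementation: the Rinott constant h is assumed to come from published tables, and the critical value t is approximated here with a simple (P*)^{1/(k-1)} quantile convention, which may differ from the exact convention in the cited tables.

```python
import math
from statistics import mean, variance
from scipy.stats import t as t_dist

def subset_and_second_stage(first_stage, n0, P_star, eps, h):
    """Sketch of Steps 4-6: screen regions with the subset-selection rule,
    then compute Rinott-style second-stage sample sizes for the survivors.
    first_stage[j] holds the n0 first-stage estimates of region j."""
    k = len(first_stage)
    means = [mean(x) for x in first_stage]
    vars_ = [variance(x) for x in first_stage]
    # Critical value for the pairwise screening (illustrative convention).
    t_val = t_dist.ppf(P_star ** (1.0 / (k - 1)), df=n0 - 1)
    selected = []
    for i in range(k):
        keep = True
        for j in range(k):
            if i == j:
                continue
            w_ij = t_val * math.sqrt(vars_[i] / n0 + vars_[j] / n0)
            if means[i] > means[j] + max(w_ij - eps, 0.0):  # i is clearly worse
                keep = False
                break
        if keep:
            selected.append(i)
    # Rinott-style total sample size for each surviving region.
    totals = {j: max(n0 + 1, math.ceil(h * h * vars_[j] / eps ** 2))
              for j in selected}
    return selected, totals
```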
3.2.2 Numerical Evaluation
To numerically evaluate the performance of Algorithms NP/Subset/Rinott and NP/Subset/DD, consider a production system with a given number M of workstations configured in parallel and jobs that are to be processed by exactly one of the stations. The objective is to find the resource allocation that minimizes the expected makespan, under the assumption that there are R resources that can be assigned to perform the necessary work within each station and that these resources can be moved to other workstations upon completion of a job. This is a Monte Carlo simulation in which the randomness derives from random processing times, and it is subsequently referred to in this thesis as the Monte Carlo problem.

The following parameter values are used in all of these experiments: M = 2, R = 5, and the number of first-stage sample points in each region is set to n0 = 20. Twenty replications are used for each experiment, performed with P* ∈ [0.55, 0.95].
One of the primary benefits of two-stage sampling is that more computational effort is allocated to regions where it is needed. To verify that the two-stage approach indeed makes a substantial difference, the total number of sample points used at each depth level is examined. The results are shown in Figures 3.1 and 3.2. These figures show that the computational

effort decreases as the depth increases, although there is a peak at depth two because this is the first depth where a surrounding region is considered. Intuitively, the reason may be that, as the depth increases, the subregions become more and more homogeneous, leading to lower variance and hence less effort required to evaluate each region. The opposite is true for the surrounding region, but for Algorithms NP/Subset/Rinott and NP/Subset/DD it may often be possible to filter this region out early, especially when substantial progress has been made and the quality of the subregion is high.

The potential benefit of two-stage methods without subset selection is illustrated in both figures. When the pure NP method is used, the number of sample points is constant. The first plot of Figure 3.1 shows that over 3,000 sample points are needed to guarantee a 95% success probability at depth two; in contrast, only 2,000 samples are needed with two-stage sampling at depth six. Thus, variable sampling reduces the total computational effort by one-third. In addition, subset selection creates even greater savings, and for many settings of P* the effort that would be required for NP without two-stage sampling is three times that required for the NP/Subset/Rinott Algorithm.

A comparison of the two graphs in Figure 3.1 shows that the NP/Subset/Rinott Algorithm requires fewer sample points. In particular, if P* is low, the total number of sample points for the NP/Subset/Rinott Algorithm is less than half of that used by the NP/Rinott Algorithm; however, the relative difference between these algorithms decreases as P* increases. Similar results are seen for the NP/Subset/DD Algorithm. In conclusion,

at least for this problem the NP/Subset/Rinott Algorithm is less computationally expensive than the NP/Rinott Algorithm, and these benefits are greater when P* is set to a low value.

Figure 3.1: Total Number of Sample Points of Each Depth for NP/Rinott (above) and NP/Subset/Rinott (below)

Figure 3.2: Total Number of Sample Points of Each Depth for NP/DD (above) and NP/Subset/DD (below)

Figure 3.3: Computation Effort in the Surrounding Region for NP/Subset/Rinott and NP/Rinott (above) and NP/DD and NP/Subset/DD (below)

The main benefit of subset selection is the improvement in computational efficiency obtained by eliminating inferior systems in the first stage. This is expected to be particularly effective as the depth increases and the surrounding region becomes larger, which usually increases the variance and in turn dictates more computational effort. However, if the search identifies a very good region, a thorough search of the surrounding region may become wasted effort, and it would be beneficial to filter this region out early. Figure 3.3 shows the results for P* = 0.90. As the depth increases, the NP/Subset/Rinott Algorithm filters out the surrounding region more and more frequently, resulting in lower average effort in that region. On the other hand, the NP/Rinott Algorithm uses more effort in the surrounding region, which is reasonable given its high variance. Thus, the NP/Subset/Rinott Algorithm can realize substantial benefits over the NP/Rinott Algorithm.
3.3 Correlated Sampling
3.3.1 Two Stage Sampling with Nelson-Matejcik
One assumption in the statistical selection procedure used by the NP/Subset/Rinott Algorithm is that each system is independent, which in simulation optimization implies that the simulation samples used for comparing the regions must also be independent. However, when comparing simulated systems, researchers prefer to use common random numbers (CRNs), which makes the systems dependent. Hence it is important to consider statistical selection methods that allow for correlated systems. One such method is proposed by Nelson and Matejcik (1995), and it will now be incorporated into the NP framework. Given a fixed first-stage sample size n0, first-stage samples are obtained from each region by using the same stream of random numbers for each region. Using these samples, the sample variance S of the difference of the sample means is determined and then used to compute the final sample size for a given indifference zone ε:

    N = max{ n0, ⌈ (gS/ε)^2 ⌉ }.    (3)
Note that this requires computing the constant g which depends on the initial sample size n0 and the number of regions M that are compared (Nelson and Matejcik, 1995). Furthermore,
note that unlike Rinott’s two-stage sampling, the sample size for each system is identical in the second stage.
Algorithm NP/NM

Step 1. Initialization
Set k = 0 and σ(k) = Θ, and specify the first-stage sample size n0.

3.3.2 Numerical Evaluation

For the numerical evaluation, the first-stage sample size is n0 = 10, and twenty replications are used for each experiment with P* ∈ [0.55, 0.95].
The performance of the three algorithms is compared along two dimensions: speed, as measured by the total number of simulation runs, and quality, as measured by the average performance of the final solution obtained. Table 3.2 shows the total number of sample points for the three algorithms and both problems. For both problems, notice that the NP/NM Algorithm requires substantially fewer sample points than the NP/Subset/Rinott and NP/Subset/DD Algorithms. With respect to solution quality, Figure 3.4 shows the performance results for both problems. For the Monte Carlo problem, the NP/NM Algorithm performs better; however, for the queuing problem, the NP/Subset/Rinott and NP/Subset/DD Algorithms show better performance, and there was little difference between them. Thus, which algorithm performs better depends on the problem structure, but these results indicate that the NP/NM Algorithm is faster and thus better if the simulation budget is very limited.
Table 3.2: Increase of Sample Points for Different P* and Algorithms

Problem                P*          NP/NM    NP/Subset/Rinott   NP/Subset/DD
Monte Carlo Problem    P* = 0.55   14000    59585              76638
Monte Carlo Problem    P* = 0.95   29785    249261             302855
Queuing Problem        P* = 0.55   6783     27623              27360
Queuing Problem        P* = 0.95   153865   201568             200858

Figure 3.4: Average Performance of Final Solution for the Monte Carlo Problem (above) and the Queuing Problem (below)

3.4 Best Probability of Correct Selection

One of the key parameters that must be carefully chosen for the two-stage NP method is the probability of correct selection (P*) in each iteration. If computation time is not an issue, it can be set using the equations of Ólafsson (1999) according to the desired probability of terminating correctly. However, with a limited computing budget it is of interest to determine its best value empirically.
This section does that for both the Monte Carlo problem and the queuing problem. For this section, the algorithm is terminated only after a fixed number of simulation evaluations has been conducted. For the Monte Carlo problem, the simulation was run for 6 different budgets of simulation estimates (20,000, 30,000, 40,000, 50,000, 65,000, or unlimited) for the NP/NM Algorithm; 9 different budgets (75,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, or unlimited) for the NP/Subset/Rinott Algorithm; and 10 different budgets (100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, or unlimited) for the NP/Subset/DD Algorithm. For the queuing problem, the simulation was run for 6 different budgets (50,000, 75,000, 100,000, 125,000, 150,000, or unlimited) for the NP/NM Algorithm; 9 different budgets (75,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, or unlimited) for the NP/Subset/Rinott Algorithm; and 10 different budgets (100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, or unlimited) for the NP/Subset/DD Algorithm. Figure 3.5 shows the result of the NP/NM Algorithm. For P* in {0.55, 0.60, 0.65, 0.70}, there is no significant difference in average performance or sample points even if the number of simulation estimates is increased. Similar results are obtained for the NP/Subset/Rinott Algorithm, as illustrated by Figure 3.6. These results show that it is possible to improve the performance by increasing the P* value to the range P* in [0.7, 0.8], but with limited computation budgets the performance degenerates very quickly for high P* values. Furthermore, relatively low P* values are recommended because the amount of computational effort increases exponentially with P*.

5–
*e
iu < 300000 200000 S 100000 50000 0.6 0.7 •—nom 0.8 0.9 1 0.6 0.7 Figure 3.5: Performance for the Monte Carlo problem using NP/Subset/Rinott Algorithm 37 0.8 0.9 I 30 - 15 - 0.5 200000 160000 a 120000 40000 - —M— 0.6 0.7 p« 0.8 0.9 1 0.6 0.7 0.8 0.9 1 38 Figure 3.6: Performance for the Queuing Problem using NP/NM Algorithm /' 4.6 4.2 "o • s . 3.4 - 0.5 300000 250000 200000-L- 150000 - - 0.6 0.7 0.8 0 .9 1 S 100000- 50000 • 0.6 0.7 0.8 0.9 1 39 Figure 3.7: Performance for the Monte Carlo problem using NP/Subset/DD Algorithm 3.5 Conclusions The above three methods take advantage of the statistical selection procedure, namely Rinott's with subset procedure, DD's with subset procedure, and Nelson Matejcik's procedure. The results show that the NP/NM Algorithm needs relatively less computational effort which is good for optimization with budget limits. When restricting computation time, the performance degenerates very quickly for high P* values. Also, relatively low P* values are advisable because the amount of computational effort increases exponentially with respect to P*. Using the subset selection procedure, inferior systems can be deleted in advance. As a results, the NP/Subset/Rinott Algorithm is preferred over the NP/Rinott Algorithm and the NP/Subset/DD Algorithm is preferred over the NP/DD Algorithm. It has been shown that which algorithm is best depends on the problem to solve. The NP/NM Algorithm is good for Monte Carlo problems. By using a two-stage sampling computation effort can be saved over pure NP. For the queuing problem, the NP/Subset/Rinott Algorithm and the NP/Subset/DD Algorithm give better results. And in both problems, the NP/Subset/Rinott Algorithm and the NP/Subset/DD Algorithm show nearly the same results. 41 Chapter 4 NP/GA In this chapter, one particular method that increases the effectiveness of the inheritance is presented. The potential drawbacks of the pure NP method are that it can be stuck in a local optimum by the uneven sampling error, and it has some difficulty in backtracking. In Chapter 3, the statistical selection method was used to address these problems. In this chapter, NP is combined with statistical selection, Genetic Algorithm (GA) and inheritance. Even though there is no guarantee of global convergence, GA is well known and effective for heuristic algorithms. Therefore, by applying a GA search to each sub-region, sample points that better represent the best performance in their region can be obtained. Based on these sample points, the next promising region can be more precisely determined. This is because GA is used for finding local optimums from whole region by finding the best solution of each region. As a result, a combined algorithm retains the benefits of both methods. GA is important in terms of the more work we put into obtaining good points, the more valuable it is to keep them using inheritance. In addition, by adding inheritance to this algorithm, the time to find optimums can be accelerated and good starting points for sampling in a GA procedure can be obtained. 4.1 Genetic Algorithm 42 Genetic Algorithm (GA), which was originally developed by Holland (1975), is a search algorithm based on the mechanisms of natural selection and genetics. This algorithm states that stronger individuals are most likely to survive in a competingenvironment. GA uses strings of characters (elements) to represent solutions called chromosomes. And each character in such a string is a gene representing a certain choice of gene. 
In biology, genes are passed to the next generations through reproduction, and the reproduction process converges toward the optimum much like an optimization paradigm. The GA uses a positive value, known as the fitness value, to reflect the degree of "goodness" of a chromosome, which coincides with the objective value for the problem (Man, 1999). Recently, the GA has received recognition as a tool for solving optimization problems in industrial engineering. It is the most widely known evolutionary computation method and is both powerful and broadly applicable to stochastic optimization. The basic premise behind the GA is that better solutions have better gene combinations (schemata); therefore, by carrying good schemata over many generations, a population containing optimal solutions can be obtained. Various hybrid genetic algorithms that outperform pure genetic algorithms have been proposed (Gong et al., 1997; Shi et al., 1999). Further discussion can be found in Man (1999).

4.1.1 Conventional Genetic Algorithm

The GA starts by generating an initial population. Based on a fitness evaluation of these population members, two parents are selected. After applying crossover and mutation to these parents, a new generation is formed. The fitness test is then applied to this new generation, and this evolutionary procedure is repeated as necessary.

Step 1. Initial population: Randomly generate an initial population.
Step 2. Fitness test: Evaluate the fitness of the population members.
Step 3. Selection: Choose two parents from the population based on the fitness test.
Step 4. Mutation and crossover: Apply crossover and mutation to the selected parents to obtain the new generation.
Step 5. Fitness test: Apply the fitness test to the new generation.
Step 6. Stopping criteria: Check the stopping criteria. If they are satisfied, stop; otherwise, go to Step 3.

4.1.2 Parent Selection

There are several methods for selecting parents. The most common types are roulette wheel, (μ + λ), tournament, and rank-ordered selection; a sketch of the roulette-wheel rule is given after this list.

• Roulette wheel selection (Holland, 1975): Roulette wheel is the best-known selection method. The sum of the fitness measures of all members is first calculated, and the fitness of each member is normalized against this sum. All members are placed on a roulette wheel, so the larger the fitness, the larger the space on the wheel for the member, and the larger the probability of being chosen for mating.
• (μ + λ)-selection (Back, 1994): Many researchers prefer this method when dealing with combinatorial optimization because it prohibits selection of duplicate chromosomes from the population.
• Tournament selection (Goldberg et al., 1989): This method contains both random and deterministic features. A set of chromosomes is first randomly chosen, then a chromosome is picked out of this set for reproduction. The tournament size is the number of chromosomes in the set, which is generally 2.
• Rank-ordered selection: A prescribed number of parents are taken for mating from the top of a list rank-ordered according to fitness value.
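The following is a minimal sketch of roulette-wheel selection for a minimization problem such as makespan. The conversion from cost to positive fitness (worst cost minus own cost) is one common convention and an assumption here, not the specific scheme used in the thesis.

```python
import random

def roulette_wheel_select(population, cost, k=2):
    """Roulette-wheel parent selection for a minimization problem: convert each
    cost into a positive fitness so smaller costs get larger wheel slices,
    then draw k parents with replacement."""
    costs = [cost(ind) for ind in population]
    worst = max(costs)
    weights = [worst - c + 1e-9 for c in costs]   # larger weight = better individual
    return random.choices(population, weights=weights, k=k)
```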
4.1.3 Crossover and Mutation

After parents have been chosen, they mate to produce two children. This is done by crossing (mixing) the genes of one parent with those of the other parent. Crossover is used to facilitate the mixing of good genes within a population. There are several methods for crossover.

• Point crossover: A point is randomly chosen in the chromosomes of the parents; the left side of parent 1 is combined with the right side of the other parent, and the right side of parent 1 is combined with the left side of the other parent, producing new combinations.
• Multiple-point crossover: Multiple points are used to make new combinations.
• Uniform crossover: This is analogous to flipping a coin; for example, if heads comes up, a gene of parent 1 is given to child 1.

Mutation is the random change of one or more gene values and generally occurs with a very small probability. Mutation is used to introduce genetic variation.

4.2 Algorithm

As in the development of Chapter 3, the main framework of Algorithm NP/GA is the NP method. GA procedures are incorporated into the NP method in the sampling step, and the inheritance record is made after calculating the promising index.

Algorithm NP/GA with Inheritance

Step 1. Initialization
Set k = 0 and σ(k) = Θ.

Step 2. Partitioning
If d(σ(k)) ≠ d*, that is, if σ(k) is not a region of maximum depth, partition the most promising region σ(k) into M sub-regions and aggregate the surrounding region into one region.
Step 3. Initial Population
If k = 0, generate the initial population of each region by random sampling; otherwise, include the inherited sample points in the initial population. The promising index of each region is estimated from the best fitness found by the GA search,
    I(σ_j) = min_{i ∈ {1, 2, ..., N_j}} L(θ_i^j),  j = 1, 2, ..., M_{σ(k)}+1.

Step 6. Backtracking
Calculate the index of the region with the best overall fitness (most promising region),
    ĵ_k ∈ arg min_{j ∈ {1, 2, ..., M_{σ(k)}+1}} I(σ_j).
If more than one region is equally promising, the tie can be broken arbitrarily. If this index corresponds to a region that is a sub-region of σ(k), this sub-region becomes the new most promising region; otherwise, the algorithm backtracks to a larger region.

Chapter 5 Data Clustering

5.3 Clustering Using NP

Using a sample of instances, the estimate
Using a sample of instances, the estimate
y = l , 2 , . . . , A / < r ( i ) + l . 63 L(z)=^^lx-Zy I, i=l,...,JV, y= +1 1=1 re/; isusedinsteadof J(z). Now there is an optimization problem for finding the cluster centers that minimizes the above objective function. At each depth, the NP method fixes the center of each cluster one at a time. Figure 5.1 shows simple example of clustering using NP methodology. This is nominal problem with two dimensions. The total number of cluster is 2. Number of Clusters = 2 Seeds Figure 5.1 Simple Example for Clustering using NP methodology ^Random Samples (uniform sampling, genetic algorithm) C,(l.l) CidO) 23.5 PromisingIndex C,(L2) C2(1J) 11 C,(l.l) C2(l.l) 21.2 > 1(0,(1)) = 17.05
Each cluster is represented by its center. We need to find the optimal coordinates that minimize the similarity. To simplify the problem, it is assumed that each dimension has only two values, that is, x_i ∈ {1, 2}, i = 1, 2. Therefore, there can be three subsets when partitioning at the first iteration. In the first subset, the 1st dimension of the two cluster centers is set to (1, 1); in the second subset, to (1, 2); and in the third subset, to (2, 2).
After partitioning into the three subsets, two-stage sampling is applied. For example, for the first subset, every sample point has a fixed first dimension: the first coordinate of both cluster centers is fixed at 1. For the remaining dimension, the centers can be randomly assigned values from {1, 2}. Using the samples from each subset, a similarity value is calculated, and based on these values the promising index is calculated for each subset. The most promising
region is the first subset because it has the smallest promising index. The partitioning of the second iteration starts from the first subset. Because the 2nd dimension can take 2 different values, three different subsets can be obtained as in the first iteration. Also, the 2nd iteration reaches the maximum depth because there are two dimensions in this problem. From the 2nd iteration on, there is one more subset, which is called the surrounding region. After sampling from all subsets, the most promising index is found in the second region, having the 1st cluster's coordinates (1, 1) and the 2nd cluster's coordinates (1, 2). These are the optimal coordinates that minimize the similarity for this problem.
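The depth-wise search of this toy example can be sketched as follows. The code assumes the sample-based similarity estimate from the previous sketch (inlined here as _similarity), nominal values {1, 2}, and a handful of random samples per sub-region; for simplicity it uses the best sampled value as the promising index and omits the surrounding region and backtracking, which the full algorithms below do include. The data in the usage line are hypothetical.

import itertools
import random

def _similarity(instances, centers):
    # Manhattan-distance version of the sample-based estimate L(z).
    return sum(min(sum(abs(a - b) for a, b in zip(x, z)) for z in centers)
               for x in instances)

def np_toy_clustering(instances, n_clusters=2, n_dims=2, values=(1, 2),
                      samples_per_region=5, rng=random):
    """Fix one coordinate of every center per depth, sample the rest uniformly,
    and keep the sub-region with the smallest estimated similarity."""
    fixed = [[] for _ in range(n_clusters)]            # coordinates fixed so far
    for depth in range(n_dims):
        best_choice, best_index = None, float("inf")
        # The three sub-regions of the example: (1,1), (1,2), (2,2).
        for choice in itertools.combinations_with_replacement(values, n_clusters):
            estimates = []
            for _ in range(samples_per_region):
                centers = [tuple(fixed[c] + [choice[c]] +
                                 [rng.choice(values) for _ in range(n_dims - depth - 1)])
                           for c in range(n_clusters)]
                estimates.append(_similarity(instances, centers))
            index = min(estimates)                      # promising index (simplified)
            if index < best_index:
                best_index, best_choice = index, choice
        for c in range(n_clusters):
            fixed[c].append(best_choice[c])
    return [tuple(f) for f in fixed]

# Hypothetical nominal data; the optimum reported in the text is centers (1,1) and (1,2).
data = [(1, 1), (1, 2), (1, 1), (1, 2), (2, 2)]
print(np_toy_clustering(data))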
The following algorithms describe the NP method combined with the Nelson-Matejcik procedure and the K-means algorithm.

Algorithm NP/NM/K-means with Inheritance
Step 1. Initialization
Set k = 0 and σ(k) = Θ. Specify the values of z_j^0, j = 1, 2, ..., M_σ(k).
Specify the constants ε, α, n_k, and n_0. Let g = T^(α)_{M_σ(k-1), (n_0 - 1)}, an equicoordinate critical point of the equicorrelated multivariate central t-distribution; this constant can be found in Hochberg and Tamhane (1987), Appendix 3, Table 4, in Bechhofer et al. (1995), or by using the FORTRAN program AS251 of Dunnett (1989).
Step 2. Partitioning
If d(o{k))*d’, that is, d(&(k)) for each cluster of
each subregion and back to Step 3-2-2.
Step3-3. If t=n0,continuetoStep4.Otherwise,let u=u+1 andgoback toStep
3-2-2. Stage II Sampling
Step 4. Estimating Mean and Variance of First-Stage Sampling
Compute the approximate sample variance for the difference of the sample means,
S^2 = 2 Σ_i Σ_j (X_ij - X̄_i· - X̄_·j + X̄_··)^2 / ((k - 1)(n_0 - 1)),
where X̄_·j is the sample mean of region j, X̄_i· is the mean over regions of replication i, and X̄_·· is the overall sample mean.
Step 5. Computing the Total Sample Size for Second-Stage Sampling
Compute the total sample size for all j,
N(k) = max{ n_0, ⌈(g S / ε)^2⌉ }, j = 1, 2, ..., M_σ(k) + 1
(a sketch of this computation is given after the algorithm).
Step 6. Second-Stage Sampling
Obtain N(k) - n_0 more sample estimates of the system performance for all j as in Step 3 above.
Step 7. Estimating the Mean of Second-Stage Sampling
Let the overall sample mean be the promising index for all j ∈ I:
I(σ_j(k)) = X̄_j(k) = Σ_{i=1}^{N_j(k)} X_ij(k) / N_j(k).
Step 8. Calculating the Promising Index
Calculate the index of the region with the smallest squared error criterion function (most promising region):
ĵ_k ∈ arg min_{j ∈ I} I(σ_j).
If more than one region is equally promising, the tie can be broken arbitrarily. If this index corresponds to a region that is a sub-region of σ(k), then let it be the most promising region in the next iteration; otherwise, backtrack to a larger region.
Table 5.3: Similarity Results of t-test with 95% Confidence Interval of the Large Data Set when the Nelson-Matejcik Sampling Method is Used
Figure 5.2: Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when the Nelson-Matejcik Sampling Method is Used
Table 5.4: Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set when the Nelson-Matejcik Sampling Method is Used
Figure 5.4: Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when the Nelson-Matejcik Sampling Method is Used
Table 5.8: Computation Time Results of t-test with 95% Confidence Interval for the Small Data Set when the Nelson-Matejcik Sampling Method is Used
Figure 5.5: Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when the Nelson-Matejcik Sampling Method is Used

5.4.2 Rinott's Sampling
Figure 5.6, Figure 5.7, Table 5.9, Table 5.10, Table 5.11, and Table 5.12 show the results of the large data set when Rinott's sampling method is used. As in the Nelson-Matejcik sampling method, μ(i) is defined as the expected similarity value for each system and r(i) is defined as the expected computation time when the number of instances is i ∈ {3, 30, 200, 350, 699}.
In the NP/NSR/K-means algorithm with no inheritance, there is no difference in the similarity values. Runs using between 30 and 200 instances required the least computation time. When using inheritance, the best similarity value is found with only 30 instances, with no difference above 30 instances (see the confidence interval of μ(30) - μ(200) in Table 5.12). Runs using 200 instances required the least computation time (see the confidence interval of r(30) - r(200) in Table 5.12).
The NP/NSR/Genetic algorithm results show that to get the minimum similarity value, at least 350 instances are required, but no difference in similarity value can be found even if more than 350 instances are used (see the confidence interval of μ(350) - μ(699) in Table 5.11) when inheritance is not used. When using inheritance, fewer sample instances, 200, are required to get a minimum similarity value than without inheritance, and more than 200 instances give no difference in similarity value (see the confidence interval of μ(200) - μ(350) in Table 5.11). Up to 200 instances there is no difference in computation time (see the confidence interval of r(200) - r(30) in Table 5.12), but above 200 instances the computation time increases.

The NP/NSR/Genetic/K-means algorithm results show that to get the minimum similarity value, at least 200 instances are required, and there is no difference in similarity value even if more than 200 instances are used (see the confidence interval of μ(200) - μ(350) in Table 5.11). Finally, there is no difference in computation time up to 350 instances (see the confidence interval of r(699) - r(350) in Table 5.12); however, more than 350 instances make the computation time increase.
Table 5.9 and Table 5.10 show the mean and variance of the accuracy, computation speed, and amount of backtracking for the large data set when Rinott's sampling method is used. When only half of the instances is used, the computation time and its variance are decreased by a quarter without affecting solution quality. Also, when using inheritance, relatively better accuracy, better speed, and less backtracking are obtained than without inheritance. This is noticeable for the NP/NM/K-means algorithm.
Figure 5.8, Figure 5.9, Table 5.15, and Table 5.16 show the results of the small data set when Rinott's sampling method is used. As in the Nelson-Matejcik sampling method, μ(i) is defined as the expected similarity value of each system and r(i) is defined as the expected computation time when the number of instances is i ∈ {1, 12, 82, 143, 286}.
The NP/NSR/K-means algorithm shows almost the same results as with the Nelson-Matejcik sampling method. The best similarity value is obtained when 143 instances are used, and there is no difference in similarity value even if more than 143 instances are used (see the confidence interval of μ(143) - μ(286) in Table 5.15) when inheritance is not used. Computation time is minimized between 12 and 143 instances. When using inheritance, fewer sample instances, 82 instances, are required to get a minimum similarity value than without inheritance, but there is no need to use more than 82 instances (see the confidence interval of μ(82) - μ(143) in Table 5.15). Computation time is minimized between 82 and 143 instances.

Table 5.9: Effect of Using Fraction of Instance Space for the Large Data Set without Inheritance when Rinott's Sampling Method is Used
Table 5.10: Effect of Using Fraction of Instance Space for the Large Data Set with Inheritance when Rinott's Sampling Method is Used
Table 5.11: Similarity Results of t-test with 95% Confidence Interval for the Large Data Set when Rinott's Sampling Method is Used
Figure 5.6: Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when Rinott's Sampling Method is Used
Table 5.12: Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set when Rinott's Sampling Method is Used
Figure 5.7: Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Large Data Set when Rinott's Sampling Method is Used
The NP/NSR/Genetic algorithm results show that 143 instances give the best similarity value, but more than 143 instances make no difference in similarity value (see the confidence interval of μ(143) - μ(286) in Table 5.15) when inheritance is not used. When inheritance is used, 82 instances give the best similarity value, and there is no difference in similarity value when more than 82 instances are used (see the confidence interval of μ(82) - μ(143) in Table 5.15). Computation time is minimized with 82 instances when inheritance is not used (see the confidence interval of r(286) - r(82) in Table 5.16). However, when using inheritance, fewer sample instances, 12 instances, are required to get a minimum similarity value.
The NP/NSR/Genetic/K-means algorithm results show that, whether inheritance is used or not, 143 instances give the smallest similarity value, but no difference in similarity can be found when more than 143 instances are used (see the confidence interval of μ(143) - μ(286) in Table 5.15). There is no difference in computation time.
Table 5.13 and Table 5.14 show the mean and variance of the accuracy, computation speed, and amount of backtracking for the small data set when Rinott's sampling method is used. The results are similar to those of the large data set.

Table 5.13: Effect of Using Fraction of Instance Space for the Small Data Set without Inheritance when Rinott's Sampling Method is Used
Table 5.14: Effect of Using Fraction of Instance Space for the Small Data Set with Inheritance when Rinott's Sampling Method is Used
Table 5.15: Similarity Results of t-test with 95% Confidence Interval for the Small Data Set when Rinott's Sampling Method is Used
Figure 5.8: Similarity Value of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when Rinott's Sampling Method is Used
Table 5.16: Computation Time Results of t-test with 95% Confidence Interval for the Small Data Set when Rinott's Sampling Method is Used
Figure 5.9: Computation Time of Each Algorithm with No Inheritance (Left) and Inheritance (Right) for the Small Data Set when Rinott's Sampling Method is Used

5.5 Conclusions
Table 5.17 summarizes the results of all the cases as x-y plots, where the x-axis and y-axis represent the number of replications and the similarity value, respectively. The computation time is reduced by sampling the instances rather than using all the instances. This is more noticeable for the large data problem. When using only half of the instances, the computation time is decreased without affecting solution quality. In addition, the variance is decreased, which means the computation time becomes more stable. With too few instances, the solution quality becomes significantly worse while the computation time goes up.
When sampling 4.5-50% of the instances, there is a trade-off between the similarity value and the computation time. Most of the K-means algorithm runs had concave-shaped plots and needed relatively less computation time than the other algorithms. This is most noticeable when the large data set is considered.

Table 5.17: Summary of all the Results (S: Similarity Value, C: Computation Time)

Chapter 6 Evaluation of New Methodology
In Chapter 2, the new concept of inheritance in the NP method is introduced and evaluated using two problems of different scale. In this chapter, the importance of inheritance is evaluated further by numerically verifying that inheritance is beneficial, and guidelines are suggested to establish the best level of inheritance. Also, the different statistical selection methods proposed in Chapter 3 to overcome certain shortcomings of the pure NP method, namely the Nelson-Matejcik and Rinott two-stage sampling procedures, are evaluated. Lastly, comparison results with several algorithms such as PAM, CLARA, and CLARANS are reported. For the numerical testing, the problem addressed in Chapter 5 is used.

6.1 Amount of Inheritance
One parameter that can affect the quality and computation time of the algorithm is the number of sample points inherited by the next iteration. The amount of inheritance is varied: 0, 3, 6, 10, and 20 sample points for the large data set, and 0, 1, 2, 4, and 8 sample points for the small data set (0, 0.5, 0.85, 1.5, and 3% of the data size, respectively). The t-test is used to compare the different systems based on the difference in performance. Figure 6.1, Figure 6.2, Table 6.1, and Table 6.2 show the results of the large data set when the Nelson-Matejcik sampling method is used. μ(i) is defined as the expected similarity value of each system and r(i) is defined as the expected computation time when the amount of inheritance is i ∈ {0, 3, 6, 10, 20}.
For the NP/NM/K-means algorithm, numerical results show that by inheriting sample points to the next iterations, similarity values are decreased (see the confidence intervals of μ(0) - μ(3) and μ(3) - μ(6) in Table 6.1). However, when considering more than 10 sample points, there is not much difference in the similarity value (see the confidence intervals of μ(6) - μ(10) and μ(10) - μ(20) in Table 6.1). The computation time shows that 3 inherited sample points require the smallest computation time in both cases (see the confidence interval of r(0) - r(3) in Table 6.2). This can be explained by NP partitioning with inheritance and backtracking. As more sample points are inherited to the next iteration, the subset size increases, which thereby causes an increase in computation time. Between different inheritance levels, there could be a trade-off between the amount of backtracking caused by sampling bias and an increase in computation time caused by a larger subset. Also, the NP method can have sampling bias which can lead to an increase in backtracking when inheritance is not considered.
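The inheritance mechanism itself can be sketched in a few lines. The Python code below only illustrates the idea under simple assumptions: sample points are (point, similarity) pairs, smaller similarity is better, and a fixed number of the best points from the previous iteration is carried into the new sample before fresh points are drawn; the actual bookkeeping in the algorithms above, which records inherited points per sub-region after the promising index is computed, is more involved.

def sample_with_inheritance(draw_sample, inherited, n_samples, n_inherit):
    """Reuse the best few (point, value) pairs from the previous iteration and
    draw only the remaining sample points fresh via draw_sample()."""
    carried = sorted(inherited, key=lambda pv: pv[1])[:n_inherit]
    fresh = [draw_sample() for _ in range(n_samples - len(carried))]
    return carried + fresh

In the GA variant, the top two carried points would additionally seed the parents of the new population, which is what the discussion below attributes the reduced backtracking to.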
As in the NP/NM/K-means algorithm, inheriting sample points to the next iteration makes the similarity value of the NP/NM/Genetic algorithm decrease, but with more than 3 sample points there is no difference in similarity value (see the confidence interval of μ(0) - μ(3) in Table 6.1). However, the pattern looks a little different compared to the NP/NM/K-means algorithm. In the NP/NM/K-means algorithm, the gap (difference) between no inheritance and inheritance (except with 3 inherited points) is not clear, even though the confidence intervals show there is a difference. In the NP/NM/Genetic algorithm, however, this gap looks very clear, and there is not much gap between different inheritance levels. This can be explained by the GA search and the backtracking of the NP method. If the algorithm starts with randomly selected parents (without using inheritance) at any iteration, the amount of backtracking is going to increase, thereby increasing the computation time. The reason is the same as with the NP/NM/K-means algorithm. The GA search forms the population from the best two parents that are acquired from the previous iteration within the GA search. Through crossover and mutation, the quality of these populations improves. So inheriting the initial two best parent samples in the next iteration avoids an increase in the amount of backtracking. The NP/NM/Genetic algorithm is implemented to choose parents randomly if more than 2 samples are inherited. If not, the top 2 inherited points are considered as parents for making the population. This could degrade the quality of the solution and increase the computation time by invoking more backtracking.
Figure 6.3, Figure 6.4, Table 6.3, and Table 6.4 show the results of the small data set when the Nelson-Matejcik sampling method is used. μ(i) is defined as the expected similarity value of each system and r(i) is defined as the expected computation time when the amount of inheritance is i ∈ {0, 1, 2, 4, 8}.
Similar results are obtained for the small data set from the NP/NM/K-means algorithm. Similarity values are decreased by using inheritance, but there is no difference in similarity value when inheriting more than 2 sample points (see the confidence interval of μ(1) - μ(2) in Table 6.3), and 1 sample point inheritance requires the smallest computation time (see the confidence interval of r(0) - r(1) in Table 6.4). In the NP/NM/Genetic algorithm, the similarity value is decreased, but there is no need for more than 1 inherited point, and inheriting 1 sample point requires the smallest computation time (see the confidence interval of r(0) - r(1) in Table 6.4).
Table 6.1: Similarity Results of t-test with 95% Confidence Interval for the Large Data Set when the Nelson-Matejcik Sampling Method is Used
Figure 6.1: Similarity Value of Each Algorithm when the Numbers of Instances are 3 (Left) and 200 (Right) of the Large Data Set when the Nelson-Matejcik Sampling Method is Used
Table 6.2: Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set when the Nelson-Matejcik Sampling Method is Used
Figure 6.2: Computation Time of Each Algorithm when the Numbers of Instances are 3 (Left) and 200 (Right) of the Large Data Set when the Nelson-Matejcik Sampling Method is Used
The similarity value decreases by inheriting more sample points (see the confidence interval of μ(0) - μ(1) in Table 6.1), but there is no need for more than 1 sample point. Moreover, 1 sample point inheritance requires the smallest computation time (see the confidence interval of r(0) - r(1) in Table 6.4).
In conclusion, sample points inherited from the previous iteration decrease the similarity value in the next iteration, but no more than 0.5% inheritance is needed for the small data set and 0.5-0.85% for the large data set. Most of the computation time is minimized when the inheritance level is 0.5%, and the computation time is stabilized by using inheritance. Computation time shows different patterns that depend on which algorithm is used, not on the size of the data set, as already explained above. Rinott's sampling method shows similar results.
6.2 Statistical Sampling
As already discussed in the previous section, there are two shortcomings of the pure NP method: the probability of success in each iteration is not guaranteed, and there may be considerable waste involved in the allocation of sample points. These can be handled by combining the pure NP method with a statistical sampling method (Nelson-Matejcik or Rinott's sampling). Figure 6.5, Figure 6.6, Figure 6.7, and Table 6.5 show the results for the large data set when the numbers of instances are 3, 30, and 200. μ(P), μ(NM), and μ(NSR) are defined as the expected similarity value when the pure, NP/NM, and NP/NSR algorithms are used. Algorithms combined with the NP method are definitely better than the pure algorithms in terms of similarity value (none of the confidence intervals in Table 6.5 contains 0).

Figure 6.3: Similarity Value of Each Algorithm when the Numbers of Instances are 1 (Left) and 82 (Right) of the Small Data Set when the Nelson-Matejcik Sampling Method is Used
Table 6.4: Computation Time Results of t-test with 95% Confidence Interval for the Small Data Set when the Nelson-Matejcik Sampling Method is Used
Figure 6.4: Computation Time of Each Algorithm when the Numbers of Instances are 1 (Left) and 82 (Right) of the Small Data Set when the Nelson-Matejcik Sampling Method is Used

However, the NP method required more computation time. Similar results are obtained from the small data set. It has been observed that the combined NP algorithms are better than the pure algorithms in terms of the similarity value. Next, comparisons between the different sampling methods are made. Figure 6.8, Figure 6.9, Figure 6.10, and Table 6.6 show the similarity-value results for the large data set when the numbers of instances are 30 and 200. μ(NM_30) and μ(NSR_30) are defined as the expected similarity value, and r(NM_30) and r(NSR_30) as the expected computation time, for the NP/NM and NP/NSR algorithms when 30 instances are used. μ(NM_200) and μ(NSR_200) are defined as the expected similarity value, and r(NM_200) and r(NSR_200) as the expected computation time, for the NP/NM and NP/NSR algorithms when 200 instances are used. Table 6.6 shows that all the confidence intervals contain 0, which means there is no significant difference in similarity value between the Nelson-Matejcik and Rinott sampling methods. The computation results show the same (see Figure 6.8, Figure 6.9, Figure 6.10, and Table 6.6).
Table 6.5: Similarity Results of t-test with 95% Confidence Interval for the Large Data Set
Figure 6.5: Similarity Value of the Pure K-means and Combined K-means with the NP for the Large Data Set
Figure 6.6: Similarity Value of Pure Genetic and Combined Genetic with the NP for the Large Data Set
Figure 6.7: Similarity Value of Genetic/K-means and Combined Genetic/K-means with the NP for the Large Data Set
Table 6.6: Similarity Results of t-test with 95% Confidence Interval for the Large Data Set
Table 6.7: Computation Time Results of t-test with 95% Confidence Interval for the Large Data Set
Figure 6.8: Similarity Value of K-means when the Numbers of Instances are 30 (Left) and 200 (Right) for the Large Data Set
If L_h(σ_j(k)) < L_{h-1}(σ_j(k)), then let X_hj(k) = L_h(σ_j(k)).
Step 3-2-4. If h = n_k, continue to Step 3-3. Otherwise, let h = h + 1 and go back to Step 3-2-2.
Step 3-2-4. Change the Center of each Subregion
Change the centers of the features with index greater than d(σ(k)) for each cluster of each subregion and go back to Step 3-2-2.
Step 3-3. If t = n_0, continue to Step 4. Otherwise, let u = u + 1 and go back to Step 3-2-2.
Stage II Sampling
Step 4. Estimating Mean and Variance of First-Stage Sampling
Compute the approximate sample variance of the difference of the sample means,
S^2 = 2 Σ_i Σ_j (X_ij - X̄_i· - X̄_·j + X̄_··)^2 / ((k - 1)(n_0 - 1)),
where X̄_·j, X̄_i·, and X̄_·· are the region, replication, and overall sample means.
Step 5. Computing Total Sample Size for Second-Stage Sampling
Compute the total sample size N(k) = max{ n_0, ⌈(g S / ε)^2⌉ } for j = 1, 2, ..., M_σ(k) + 1.
Step 6. Second-Stage Sampling
Obtain N(k) - n_0 more sample estimates of the system performance for all j as in Step 3 above.
Step 7. Estimating Mean of Second-Stage Sampling
Let the overall sample mean be the promising index for all σ_j, j ∈ I.

Algorithm NP/NSR/K-means with Inheritance
Step 1. Initialization
Specify the first-stage sample size n_0 ≥ 2 and the number of sub-regions M. Determine h_2 for Rinott's integral; h_2 is a constant determined by n_0, the minimum probability P* of correct selection, and M (see the tables in Bechhofer et al., 1995).
Step 2. Partitioning
See Step 2 in Algorithm NP/NM/K-means.
Stage I Sampling
Step 3. First-Stage Sampling
See Step 3 in Algorithm NP/NM/K-means
Stage II Sampling
Step 4. Estimating Mean and Variance of First-Stage Sampling
Calculate first-stage sample means and variances

Step 5. Computing Total Sample Size for Second-Stage Sampling
Compute the total sample size for all j ∈ I,
N_j(k) = max{ n_0 + 1, ⌈(h_2 S_j / ε)^2⌉ },
where h_2 is the constant determined by n_0 and the minimum probability P* of correct selection (a sketch of this rule is given after the algorithm).
Step 6. Second-Stage Sampling
See Step 6 in Algorithm NP/NM/K-means
Step 7. Estimating Mean of Second-Stage Sampling
See Step 7 in Algorithm NP/NM/K-means
Step 8. Calculating the Promising Index
See Step 8 in Algorithm NP/NM/K-means
Step 9. Checking the Stopping Rule
See Step 9 in Algorithm NP/NM/K-means
where ε is the indifference zone.
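A minimal Python sketch of the sample-size rule in Step 5, with the formula reconstructed as N_j = max{n_0 + 1, ceil((h_2 S_j / epsilon)^2)} from the recovered fragments, is given below; the function name and the numbers in the usage line are hypothetical.

import math

def rinott_sample_size(first_stage_samples, h2, epsilon):
    """Total sample size for one region from its first-stage observations."""
    n0 = len(first_stage_samples)
    mean = sum(first_stage_samples) / n0
    s2 = sum((x - mean) ** 2 for x in first_stage_samples) / (n0 - 1)
    return max(n0 + 1, math.ceil((h2 / epsilon) ** 2 * s2))

# Hypothetical first-stage similarity estimates for one region.
print(rinott_sample_size([4320.0, 4280.0, 4355.0, 4301.0, 4290.0], h2=2.8, epsilon=25.0))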

BIBLIOGRAPHY
Aarts, E. and Korst, J. 1989. “Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing,” Wiley-Interscience series in discrete mathematics and optimization. John Wiley and Sons, Inc., New York, NY.
Agrawal, R. and Srikant, R. 1994. “Fast algorithms for mining association rules,” in Proceedings of the 20th Int'l Conf. on Very Large Databases, Santiago, Chile.
Al-Sultan, K.S. 1995. “A tabu search approach to the clustering problem,” Pattern Recognition, 28(9), 1443-1451.
Al-Sultan, K.S. and Khan, M.M. 1996. “Computational experience on four algorithms for the hard clustering problem,” Pattern Recognition Letters, 17, 295-308.
Andradôttir, S. 1989. “A review of Simulation Optimization Techniques,” in Proceedings of the Winter Simulation Conference, 151-158.
Andradôttir, S. 1990. “A new algorithm for stochastic optimization,” in Proceedings of the Winter Simulation Conference, 364-366.
Andradôttir, S. 1995. “A method for discrete stochastic optimization,” Management Science, 41, 1946-1961.
Andradôttir, S. 1996. “A global search method for discrete stochastic optimization,” SIAM J. Optimization, 6, 513-530.

Andradôttir, S. 1998. “Simulation Optimization Techniques,” in J. Banks (ed.), Handbook of Simulation, 307-333.
Azadivar, F., Shu, J., and Ahmad, M. 1996. “Simulation optimization in strategic location of semi-finished products in a pull-type production system,” in Proceedings of the Winter Simulation Conference, 1123-1128.
Back, T. 1994. “Selective pressure in evolutionary algorithms: a characterization of selection mechanisms,” in Proceedings of the 1st IEEE Conference on Evolutionary Computation, 57-62, IEEE Press, Piscataway, NJ.
Babu, G.P. and Murty, M.N. 1993. “A near optimal initial seed value selection in K-means algorithm using a genetic algorithm,” Pattern Recognition Letters, 14(10), 763-769.
Ball, G.H. and Hall, D.J. 1964. “Some fundamental concepts and synthesis procedure for pattern recognition preprocessors,” in International Conference on Microwaves, Circuit Theory, and Information Theory.
Banks, J. 1998. Handbook of Simulation, Wiley-Interscience, New York.
Banks, J. 1999. “Introduction to simulation,” in Proceedings of the Winter Simulation Conference, 7-13.
Bechhofer, R.E. 1954. “A single-sample multiple decision procedure for ranking means of normal populations with known variances,” Annals of Mathematical Statistics, 25, 16-39.
Bechhofer, R.E., Santner, T., and Goldsman, D. 1995. Design and analysis of experiments for statistical selection, screening, and multiple comparisons, Wiley-Interscience, New York.

Bechhofer, R.E., Kiefer, J., and Sobel, M. 1968. Sequential Identification and Ranking Procedures, The University of Chicago Press, Chicago, Illinois.
Bennett, K.P. 1994. “Global Tree Optimization: A Non-greedy Decision Tree Algorithm,” Computing Science and Statistics, 26, 156-160.
Berger, J.O. 1988. “Bayesian approach to ranking and selection of related means with alternatives to analysis-of-variance methodology,” Journal of the American Statistical Association, 83(402), 364-373.
Blake, C.L. and Merz, C.J. 1998. UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Department of Information and Computer Science, University of California, Irvine, CA.
Boley, D., Gini, M., Gross, R., Han, E.H., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., and Moore, J. 1999. “Document categorization and query generation on the world wide web using WebACE,” AI Review.
Bos, S. 1996. “A realizable learning task which exhibits overfitting,” Advances in Neural Information Processing Systems, 8, 218-224, MIT Press, Cambridge, MA.
Bradley, P., Fayyad, U., and Mangasarian, O. 1998. “Data Mining: Overview and Optimization Opportunities,” INFORMS: Journal of Computing.
Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. 1984. Classification and Regression Trees, Monterey, CA: Wadsworth and Brooks/Cole.
Byers, S. and Raftery, A.E. 1998. “Nearest neighbor clutter removal for estimating features in spatial point processes,” Journal of the American Statistical Association, 93, 577-584.

Carson, Y. and Maria, A. 1997. “Simulation Optimization: Methods and Applications,” in Proceedings of the Winter Simulation Conference, 118-126.
Chen, C.-H., Yucesan, E., Yuan, Y., Chen, H.-C., and Dai, L. 1998. “Computing budget allocation for simulation experiments with different system structures,” in Proceedings of the Winter Simulation Conference, 735-741.
Chen, C.-H., Chen, H.-C., and Dai, L. 1996. “A gradient approach for smartly allocating computing budget for discrete event simulation,” in Proceedings of the Winter Simulation Conference, 398-405.
Chick, S.E. and Inoue, K. 1998. “Sequential allocation procedures that reduce risk for multiple comparisons,” in Proceedings of the Winter Simulation Conference, 669-676.
Chick, S.E. and Inoue, K. 1999. “New two-stage and sequential procedures for selecting the best simulated systems,” Technical report, University of Michigan, Ann Arbor, Dept. of Industrial & Operations Engineering.
Clark, G.M. and Yang, W.N. 1986. “A Bonferroni selection procedure when using common random numbers with unknown variances,” in Proceedings of the Winter Simulation Conference, 372-375.
Davis, L. 1991. Handbook of genetic algorithms, Van Nostrand Reinhold, New York.
Deogun, J.S., Raghavan, V.V., Sarkar, A., and Sever, H. 1997. “Data mining: Research trends, challenges, and applications,” in Rough Sets and Data Mining: Analysis of Imprecise Data (Boston, MA), T.Y. Lin and N. Cercone, Eds., Kluwer Academic Publishers, 9-45.

Donohue, J.M., Houck, E.C., and Myers, R.H. 1990. “Some optimal simulation designs for estimating quadratic response surface functions,” in Proceedings of the Winter Simulation Conference, 337-343.
Duda, R. and Hart, P. 1973. Pattern Classification and Scene Analysis, John Wiley and Sons.
Dudewicz, E.J. and Taneja, V.S. 1978. “Multivariate ranking and selection without reduction to a univariate problem,” in Proceedings of the Winter Simulation Conference, 207-210.
Dudewicz, E.J. and Dalal, S.R. 1975. “Allocation of Observations in Ranking and Selection with Unequal Variances,” Sankhya, 37B, 28-78.
Dudewicz, E.J. and Zaino, N.A. 1976. “Allowance for correlation in setting simulation run length via ranking and selection procedures,” Technical Report, Department of Statistics, Stanford University.
Dunnett, C.W. 1989. “Multivariate normal probability integrals with product correlation structure,” Applied Statistics, 38, 564-579. Correction: 42, 709.
Faccenda, J.F. and Tenga, R.F. 1992. “A combined simulation/optimization approach to process plant design,” in Proceedings of the Winter Simulation Conference, 1256-1261.
Fleischer, M. 1995. “Simulated annealing: past, present, and future,” in Proceedings of the Winter Simulation Conference, 155-161.
Fraley, C., and Raftery, A. 1998. “How many clusters? Which clustering method? Answers via Model-Based Cluster Analysis,” Technical Report No. 329, Dept. of Statistics, University of Washington.

Frawley, W.J., Piatetsky-Shapiro, G., and Matheus, C.J. 1991. “Knowledge discovery in databases: an overview,” in Knowledge Discovery in Databases, AAAI/MIT Press, 1-27.
Fu, M.C. and Healy, K. 1992. “Simulation Optimization of (s,S) Inventory Systems,” in Proceedings of the Winter Simulation Conference, 506-514.
Fu, M.C. 1994. “Stochastic Optimization using Simulation: A review,” Annals of Operations Research, 53, 506-514.
Gen, M. and Cheng, R. 2000. Genetic algorithms and engineering optimization, Wiley, New York.
Glasserman, P. and Yao, D.D. 1992. “Some Guidelines and Guarantees for Common Random Numbers,” Management Science, 38, 884-908.
Glover, F. 1977. “Heuristics for integer programming using surrogate constraints,” Decision Sciences, 8, 156-166.
Glynn, P.W. 1989. “Optimization of Stochastic Systems Via Simulation,” in Proceedings of the Winter Simulation Conference, 90-105.
Goldberg, D., Korb, B., and Deb, K. 1989. “Messy genetic algorithms: motivation, analysis, and first results,” Complex Systems, 3, 493-530.
Goldsman, D. and Nelson, B.L. 1994. “Ranking, selection, and multiple comparisons in computer simulation,” in Proceedings of the Winter Simulation Conference, 192-199.

Goldsman, D., and Nelson, B.L. 1998. “Statistical screening, selection, and multiple comparison procedures in computer simulation,” in Proceedings of the Winter Simulation Conference, 159-166.
Goldsman, D., Nelson, B.L., and Schmeiser, B. 1997. “Methods for selecting the best system,” in Proceedings of the Winter Simulation Conference, 177-186.
Gong, D., Gen, M., Yamazaki, G., and Xu, W. 1997. “Hybrid evolutionary method for capacitated location-allocation problem,” Engineering design and Optimization, 3, 179-190.
Gupta, S.S. 1965. “On some multiple decision rules,” Technometrics, 6, 225-245.
Gupta, S.S. and Miescke, K.J. 1996. “Bayesian look ahead one-stage sampling allocations for selecting the best population,” Journal of Statistical Planning and Inference, 54, 229-244.
Gurkan, G., Ozge, A.Y., and Robinson, S.M. 1994. “Sample-path optimization in simulation,” in Proceedings of the Winter Simulation Conference, 247-254.
Han, J. and Kamber, M. 2001. Data Mining: Concepts and Techniques, Morgan Kaufmann.
Harris, N., Hunter, L., and States, D. 1992. “Mega-classification: Discovering motifs in massive datastreams,” in Proceedings of the Tenth International Conference on Artificial Intelligence (AAAI).
Hertz, J., Krogh, A., and Palmer, R.G. 1991. Introduction to the theory of neural computation, Reading, MA: Addison-Wesley.

Hochberg, Y. and Tamhane, A.C. 1987. Multiple Comparison Procedures, New York: John Wiley.
Hofmann, T. and Buhmann, J. 1997. “Pairwise data clustering by deterministic annealing,” IEEE Trans. Pattern Anal. Mach. Intell., 19(1), 1-14.
Holland, J.H. 1975. Adaptation in natural and artificial systems, MIT Press.
Hill, R.R. 1999. “A Monte Carlo study of genetic algorithm initial population generation methods,” in Proceedings of the Winter Simulation Conference, 543-547.
Inoue, K., Chick, S.E., and Chen, C.-H. 1999. “An Empirical Evaluation of Several Methods to Select the Best System,” ACM Transactions on Modeling and Computer Simulation, 9, 381-407.
Jacobson, S.H. and Schruben, L.W. 1989. “A review of techniques for simulation optimization,” Operations Research Letters, 8, 1-9.
Jain, A.K. and Dubes, R.C. 1988. Algorithms for clustering data, Prentice Hall.
Jones, D.R. and Beltramo, M.A. 1991. “Solving partitioning problems with genetic algorithms,” in Proceedings of the Fourth International Conference on Genetic Algorithms, 442-449.
Karypis, G. and Kumar, V. 1998. “METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system,” Technical Report, Department of Computer Science, University of Minnesota.

Karypis, G. Han, E. H., and Kumar, V. 1999. “CHAMELEON: A hierarchical clustering algorithm using dynamic modeling,” COMPUTER, 32, 68-75.
Kaufman, L. and Rousseeuw, P.J. 1990. Finding Groups in Data: an Introduction to Cluster analysis, Academic Press.
Kirkpatrick, S., Gelatt, CD. Jr., and Vecchi. MP. 1983. “Optimization by Simulated Annealing,” Science, 220,4598,671-680.
Koenig, L.W. and Law, A.M. 1985. “A procedure for selecting a subset of size m containing the l best of k independent normal populations, with applications to simulation,” Communications in Statistics, B14(3), 719-734.
Law, A.M. and Kelton, W.D. 1991. Simulation Modeling and Analysis (2nd ed.), McGraw-Hill, Inc., New York.
L’Ecuyer, P. 1991. “An Overview of Derivative Estimation,” in Proceedings of the Winter Simulation Conference, 207-217.
L’Ecuyer, P. and Perron, G. 1994. “On the Convergence Rates of IPA and FDC Derivative Estimators for Finite-Horizon Stochastic Simulations,” Operations Research, 42(4), 643-656.
Lee, Y.H., Park, K.J., and Kim, T.K. 1999. “An approach for finding discrete variable design alternatives using a simulation optimization method,” in Proceedings of the Winter Simulation Conference, 678-685.
Leung, Y.T., and Suri, R. 1990. “Finite-time behavior of two simulation optimization algorithms,” in Proceedings of the Winter Simulation Conference, 372-376.

Lu, S.Y. and Fu, K.S. 1978. “A sentence-to-sentence clustering procedure for pattern analysis,” IEEE Trans. Syst. Man Cybern., 8, 381-389.
Man, K.F. 1999. Genetic Algorithms: Concepts and Designs, Springer, New York.
Mitchell, T.M. 1997. Machine Learning, McGraw-Hill, Inc.
Neal, R.M. and Hinton, G.E. 1999. “A view of the EM algorithm that justifies incremental, sparse, and other variants,” in Learning in Graphical Models, 355-368, Cambridge, MA: MIT Press.
Nelson, B.L., and Matejcik, F.J. 1995. “Using common random numbers for indifference-zone selection and multiple comparisons in simulation,” Management Science, 41, 1935-1945.
Nelson, B.L. 1993. “Robust multiple comparisons under common random numbers,” ACM Transactions on Modeling and Computer Simulation, 3, 225-243.
Ng, R. and Han, J. 1994. “Efficient and effective clustering methods for spatial data mining,” in Proceedings of the VLDB Conference, 144-155.
Ôlafsson, S. 1999. “Iterative ranking-and-selection for large-scale optimization,” in Proceedings of the Winter Simulation Conference, 479-485.
Ôlafsson, S., and Gopinath, N. 2000. “Optimal selection probability in the two-stage nested partitions method for simulation-based optimization,” in Proceedings of the Winter Simulation Conference, 736-742.

Paulson, E. 1964. “A sequential procedure for selecting the population with the largest mean from k normal populations,” Annals of Mathematical Statistics, 35, 174-180.
Piatetsky-Shapiro, G. and Frawley, W.J. 1991. Knowledge Discovery in Databases, AAAI/MIT Press.
Raghavan, V. V. and Birchand, K. 1979. “A clustering strategy based on a formalism of the reproductive process in a natural system,” in Proceedings of the Second International Conference on Information Storage and Retrieval, 10-22.
Rinott, Y. 1978. “On two-stage selection procedures and related probability-inequalities,” Communications in Statistics, A7, 799-811.
Robbins, H., and Monro, S. 1951. “A stochastic approximation method,” Annals of Mathematical Statistics, 22, 400-407.
Rose, K., Gurewitz, E., and Fox, G.C. 1993. “Deterministic annealing approach to constrained clustering,” IEEE Trans. Pattern Anal. Mach. Intell., 15, 785-794.
Rubinstein, R.Y. and Shapiro, A. 1993. Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization, John Wiley & Sons, Inc.
Sanchez, S.M. 1997. “It is a far, far better mean I find…,” in Proceedings of the Winter Simulation Conference, 31-38.
Shapiro, A. 1996. “Simulation based optimization,” in Proceedings of the Winter Simulation Conference, 332-336.
Shavlik, J.W. and Dietterich, T.G. 1990. Readings in Machine Learning, Morgan Kaufmann.

Shi, L., Ôlafsson, S., and Chen, Q. 1999. “A new hybrid optimization algorithm,” Computers & Industrial Engineering, 36, 409-426.
Shi, L., and Ôlafsson, S. 2000a. “Nested Partitions Method for Global Optimization,” Operations Research, 48, 390-407.
Shi, L., and Ôlafsson, S. 2000b. “Nested Partitions Method for Stochastic Optimization,” Methodology and Computing in Applied Probability, 2, 271-291.
Sullivan, D.W., and Wilson, J.R. 1989. “Restricted subset selection procedures for simulation,” Operations Research, 37, 52-71.
Guha, S., Rastogi, R., and Shim, K. 1998. “CURE: An efficient clustering algorithm for large databases,” in Proceedings of the 1998 ACM-SIGMOD International Conference on Management of Data.
Guha, S., Rastogi, R., and Shim, K. 1999. “ROCK: A robust clustering algorithm for categorical attributes,” in Proceedings of the 15th International Conference on Data Engineering, 512-521.
Swisher, J.R., and Jacobson, S.H. 1999. “A survey of ranking, selection, and multiple comparison procedures for discrete-event simulation,” in Proceedings of the Winter Simulation Conference, 492-501.
Wen, M.J. and Chen, H.J. 1994. “Single-stage multiple comparison procedures under heteroscedasticity,” American Journal of Mathematical and Management Sciences, 14(1,2), 1-48.

ACKNOWLEDGEMENTS
I would like to especially thank Dr. Ôlafsson, my major professor, and Sameh, my friend, for their loving guidance and financial assistance during the writing of this work.