FACULTY OF SCIENCE
AND TECHNOLOGY
MSc. Applied Data Analytics
June 2016
Learning Deep Structured Network for Identification of
Mixed Patterns in Semiconductor Wafer Maps
by
Van Hoa Trinh
DISSERTATION DECLARATION
This Dissertation/Project Report is submitted in partial fulfilment of the requirements
for a Masters degree at Bournemouth University. I declare that this Dissertation/
Project Report is my own work and that it does not contravene any academic offence
as specified in the University’s regulations.
Retention
I agree that, should the University wish to retain it for reference purposes, a copy of
my Dissertation/Project Report may be held by Bournemouth University normally for
a period of 3 academic years. I understand that my Dissertation/Project Report may
be destroyed once the retention period has expired. I am also aware that the University
does not guarantee to retain this Dissertation/Project Report for any length of time (if
at all) and that I have been advised to retain a copy for my future reference.
Confidentiality
I confirm that this Dissertation/Project Report does not contain information of a
commercial or confidential nature or include personal information other than that which
would normally be in the public domain unless the relevant permissions have been
obtained. In particular any information which identifies a particular individual's religious
or political beliefs, information relating to their health, ethnicity, criminal history or
personal life has been anonymised unless permission for its publication has been granted
from the person to whom it relates.
Copyright
The copyright for this dissertation remains with me.
Requests for Information
I agree that this Dissertation/Project Report may be made available as the result of a
request for information under the Freedom of Information Act.
Signed:
Name: Van Hoa Trinh
Date: 30/06/2016
Programme: MSc. Applied Data Analytics
Abstract
Wafer defect detection has been a focal research topic in the wafer manufacturing industry.
A large gap in research on the identification of mixed defect patterns on semiconductor
wafers is the main motivation for this thesis. This dissertation presents the design and
implementation of wafer map defect detection based on deep convolutional neural networks.
It is the first study to test the performance of a deep learning model on mixed defect
pattern recognition. The thesis starts with a literature review of defect detection
processes, covering the machine learning methods recently used by researchers and the
shortcomings of these methods. It then gives a detailed review of convolutional neural
networks and proposes appropriate parameters for wafer defect detection, the main part of
which is implemented. The experimental results are discussed and justified on the basis
of a comprehensive model selection. All experiments were run on Windows 7 Professional
64-bit (6.1, Build 7601) with an Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz (4 CPUs) and
16GB of installed memory. A defect detection test accuracy of 79.6% was achieved.
Acknowledgements
I would like to express my deep gratitude to my supervisor, Dr. Paul Yoo. His great
guidance and very insightful suggestions and comments have enabled me to write up
this study. Last but not least, I extend my gratitude to my beloved family and
my darling for their moral support and for everything they have done for me. I could
never be what I am today without them.
Contents
Abstract ii
Acknowledgements iii
List of Figures vii
List of Tables ix
Abbreviations x
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Proposal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Structure of The Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Identification of Mixed Defect Pattern Classification in Semiconductor
Wafer Maps 8
2.1 Defect Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Generated Wafer Map . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1.1 Statistical Models . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1.2 Probabilistic Models . . . . . . . . . . . . . . . . . . . . . 13
2.1.2 Real Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 De-noising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Spatial Randomness Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Classification Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6.1 Traditional Machine Learning Techniques . . . . . . . . . . . . . . 20
2.6.2 Deep Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . 21
2.7 Root Cause Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 Machine Learning In Identification On Defect Pattern In Semiconductor
Wafers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Learning Deep Structured Convolutional Neural Network 28
3.1 Wafer Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Tools And Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Experimental Environment . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Convolutional Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.1 Local Receptive Fields . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.2 Feature Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Weight Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.4 Estimation of the Feature Map Volume . . . . . . . . . . . . . . . 37
3.3.5 Experimental Design of Convolutional Layers . . . . . . . . . . . . 39
3.4 Activation Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.1 Background of Activation Functions . . . . . . . . . . . . . . . . . 40
3.4.2 Experimental Design of Activation Functions . . . . . . . . . . . . 41
3.5 Pooling Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.1 Background of Pooling Layers . . . . . . . . . . . . . . . . . . . . . 41
3.5.2 Experimental Design Of Pooling Layer . . . . . . . . . . . . . . . . 42
3.6 Fully-Connected Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.7 Gradient Descent Optimisation Algorithms . . . . . . . . . . . . . . . . . 43
3.7.1 Cost Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.7.2 Stochastic Gradient Descent . . . . . . . . . . . . . . . . . . . . . . 45
3.7.3 Momentum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7.4 Adagrad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.7.5 Adadelta And RMSprop . . . . . . . . . . . . . . . . . . . . . . . . 47
3.7.6 Adam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.7.7 Experimental Design of Gradient Descent Optimisation Algorithms 50
3.8 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.9 Performance Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . 51
3.9.1 Mean Squared Error And Model Accuracy . . . . . . . . . . . . . . 51
3.9.2 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.9.3 Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . 52
3.9.4 ROC Curve (Receiver Operating Characteristics) . . . . . . . . . . 52
3.10 Stratified K-fold Cross Validation . . . . . . . . . . . . . . . . . . . . . . . 55
3.11 Experiment and results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.11.1 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.11.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.11.2.1 Convolutional Layer . . . . . . . . . . . . . . . . . . . . . 60
3.11.2.2 Fully-Connected Layer . . . . . . . . . . . . . . . . . . . 63
3.11.2.3 Gradient Descent Optimisation Algorithms . . . . . . . . 64
3.11.2.4 De-Noising Effect . . . . . . . . . . . . . . . . . . . . . . 65
3.11.2.5 Performance Comparison of CNNs Against Other Shal-
low Learning Networks . . . . . . . . . . . . . . . . . . . 68
3.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4 Conclusion and Future Work 72
4.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2 Review of Chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
A CNN’s Python Code 75
B Matlab’s De-Noising Code 80
Bibliography 82
List of Figures
1.1 Wafer manufacturing process (Kang et al., 2015) . . . . . . . . . . . . . . 1
1.2 A finished wafer contains hundreds of squares, each representing a chip (Geng,
2005) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Packaged microprocessors (Geng, 2005) . . . . . . . . . . . . . . . . . . . 2
1.4 Root cause determination (Imai et al., 2010) . . . . . . . . . . . . . . . . . 4
1.5 Scope of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Process of defect pattern analysis . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Example of Chao et al.’s generated defect patterns: Cluster pattern with
intensity of 90% and 50-100 defective chips . . . . . . . . . . . . . . . . . 11
2.3 Choi’s generated defect patterns . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Multivariate normal distribution . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Jeong et al. (2008)’s simulated data . . . . . . . . . . . . . . . . . . . . . 14
2.6 Example of using the spatial filter with size of 3×3 . . . . . . . . . . . . . 15
2.7 Median-filtering technique . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.8 (a) Example of wafer maps. (b) Radon transform output. (c) Radon-
based attributes Rϕ. (d) Radon-based attributes Rσ (Wu et al., 2015) . . 18
2.9 Rotation moment invariant . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.10 Summary of some algorithms in deep and shallow nets (Ranzato, 2014) . 21
2.11 Deep and shallow learning network . . . . . . . . . . . . . . . . . . . . . . 22
3.1 Circle and spot mixed patterns . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Circle and scratch mixed patterns . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Cluster and scratch mixed patterns . . . . . . . . . . . . . . . . . . . . . . 30
3.4 Cluster and circle mixed patterns . . . . . . . . . . . . . . . . . . . . . . . 30
3.5 A typical structure of CNNs (Lecun et al., 1998) . . . . . . . . . . . . . . 32
3.6 Local receptive field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7 Output image when using different weights in local receptive fields . . . . 34
3.8 Convolution operation in CNNs . . . . . . . . . . . . . . . . . . . . . . . . 35
3.9 Local receptive field’s movement at stride = 2 . . . . . . . . . . . . . . . . 36
3.10 Feature map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.11 Weight sharing (Le, 2015) . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.12 Convolution with 3×3 Filter (Karpathy, 2015) . . . . . . . . . . . . . . . . 38
3.13 Sigmoid, tanh and RELU function . . . . . . . . . . . . . . . . . . . . . . 40
3.14 Max pooling and average pooling . . . . . . . . . . . . . . . . . . . . . . . 42
3.15 Stochastic gradient descent algorithm (Ian Goodfellow and Courville, 2016) 46
3.16 Stochastic gradient descent algorithm with momentum (Ian Goodfellow
and Courville, 2016) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.17 Adagrad algorithm (Ian Goodfellow and Courville, 2016) . . . . . . . . . . 48
3.18 RMSprop and Adadelta algorithm (Ian Goodfellow and Courville, 2016) . 48
3.19 Adam algorithm (Ian Goodfellow and Courville, 2016) . . . . . . . . . . . 49
3.20 Example of dropout neural network . . . . . . . . . . . . . . . . . . . . . . 51
3.21 Receiver Operating Characteristics . . . . . . . . . . . . . . . . . . . . . . 54
3.22 Relationship between AUC and diagnostic accuracy . . . . . . . . . . . . 55
3.23 Performance comparison in terms of pooling layers . . . . . . . . . . . . . 63
3.24 Performance comparison in terms of stride . . . . . . . . . . . . . . . . . . 63
3.25 Performance comparison corresponding to the value of dropout . . . . . . 64
3.26 Performance comparison corresponding to gradient descent optimisation
algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.27 Effect of de-noising techniques on Circle and spot mixed patterns: (A)
Initial image input, (B) Effect of Median Filter, (C) Effect of Averaging
Filter, and (D) Effect of Adaptive filtering . . . . . . . . . . . . . . . . . . 66
3.28 Example of inefficiency of de-noising techniques on complex Circle and
spot mixed patterns: (A) Initial image input, (B) Effect of Median Filter,
(C) Effect of Averaging Filter, and (D) Effect of Adaptive filtering . . . . 67
List of Tables
1.1 Examples of root causes of defect patterns on wafer . . . . . . . . . . . . . 5
2.1 Process summary of various methods. . . . . . . . . . . . . . . . . . . . . 23
3.1 Example of the ROC curve . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Experimental design in terms of Convolutional layers . . . . . . . . . . . . 56
3.3 Experimental design in terms of fully-connected layers . . . . . . . . . . . 60
3.4 Sample architecture of 3C-1P-0.25D-1F-0.5D-1F-1111S-3332K . . . . . . 61
3.5 Experimental results for various convolutional layer’s architectures . . . . 62
3.6 Model performance corresponding to stride value . . . . . . . . . . . . . . 62
3.7 Experimental results for various architectures of fully-connected layer . . 64
3.8 Experimental results corresponding to gradient descent optimisation al-
gorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.9 Experimental results of de-noising techniques . . . . . . . . . . . . . . . . 67
3.10 The proposed CNN architecture of 2C-1MP-1F-0.25D-1F-111S-334K . . . 68
3.11 Models’ parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.12 Performance comparison between CNNs and other traditional machine
learning networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Abbreviations
ANN Artificial Neural Network
AUC Area Under Curve
CNN Convolutional Neural Network
CP Circuit Probe
DBN Deep Belief Network
GRNN General Regression Neural Network
HPP Homogeneous Poisson Process
IC Integrated Circuit
kNN K Nearest Neighbour
LNPP Local and Nonlocal Preserving Projection
LOR Log Odds Ratio
MLP MultiLayer Perceptron
MRF Markov Random Field
MSE Mean Squared Error
MVN Multivariate Normal
NMF Non-negative Matrix Factorization
PCA Principal Component Analysis
PFT Polar Fourier Transform
PNN Probabilistic Neural Network
RBF Radial Basis Function
ReLU Rectified Linear Unit
RMI Rotation Moment Invariant
ROC Receiver Operating Characteristics
SGD Stochastic Gradient Descent
SHBP Spatially Homogeneous Bernoulli Process
SRT Spatial Randomness Test
SVD Singular Value Decomposition
SVM Support Vector Machine
TBM Time to Build Model
WBM Wafer Bin Map
Chapter 1
Introduction
1.1 Introduction
A manufacturing process can be defined as a procedure for converting unprocessed
material into finished goods. In the semiconductor manufacturing industry, semiconductor
materials such as silicon, zinc oxide, dopants, and insulators (Prasad et al., 2013) are used
to produce finished products including integrated circuits or integrated circuit packages.
The production of semiconductor devices is a lengthy and complex process that takes
weeks to complete (Chou et al., 1997). Overall, the process can be divided into the
following four main steps (Peleg, 2004; Uzsoy et al., 1994) (Figure 1.1):
1. Fabrication: Identical IC chips or dies are fabricated in batches on a thin slice
of crystalline silicon called a wafer. Each wafer may contain many chips
(Figure 1.2). The recent trend in wafer manufacturing is to create smaller
chips (e.g. 65nm) on larger wafers (e.g. 450mm) (Sonderman, 2011).
Figure 1.1: Wafer manufacturing process (Kang et al., 2015)
Figure 1.2: A finished wafer contains hundreds of squares, each representing a chip (Geng, 2005)
Figure 1.3: Packaged microprocessors (Geng, 2005)
2. Wafer test or Circuit probe (CP) test: Not all IC dies work properly after
fabrication, so each IC die is tested before the wafer is cut by a diamond saw
into individual chips. A defective IC die is marked with a spot of ink and a
wafer bin map (WBM) is created (Geng, 2005). A wafer that contains more
defective chips than a given threshold is then discarded.
3. Assembly: Only non-defective ICs proceed to the next step. They are
inserted into a package that protects the chips from high humidity (Geng,
2005; Rabaey and Nikolic, 2002) and protects their fragile wire bonds (Geng, 2005).
The type of package is determined by how the chips will be used and by the
type of microprocessor (Figure 1.3).
4. Final test: Finally, the IC chips undergo a functional test to catch any
faults introduced in the packaging process. The final test evaluates whether
the ICs perform well in hot and cold environments (Kang et al., 2015).
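The discard rule in step 2 can be illustrated with a minimal sketch. It assumes a wafer bin map stored as a 0/1 array (1 for a defective die) and a hypothetical threshold expressed as a fraction of defective dies; neither the encoding nor the threshold value comes from the cited sources.

```python
import numpy as np


def should_discard(wbm: np.ndarray, max_defect_fraction: float) -> bool:
    """Return True when the fraction of defective dies exceeds the threshold.

    The map uses 1 for a defective die and 0 for a working die (assumed encoding).
    """
    defect_fraction = wbm.mean()
    return bool(defect_fraction > max_defect_fraction)


# Hypothetical 4x4 wafer bin map with 3 defective dies out of 16.
wbm = np.array([
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])

# 3/16 = 0.1875, which is below the assumed threshold of 0.25.
print(should_discard(wbm, max_defect_fraction=0.25))
```

A real wafer map is roughly circular, so a production version would also need a mask for positions that lie outside the wafer.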
In semiconductor manufacturing industries, the three fundamental targets of
manufacturing are to produce wafers that have the following characteristics (Gary S. May, 2006):
• Low cost. Yield and throughput are the key factors in terms of cost reduction.
The yield rate can be defined as the proportion of products on the wafer found to work
properly, while throughput is the number of wafers passing through a machine per hour.
A high yield rate and high throughput result in lower cost.
• High quality. High quality must come from a stable and reliable manufacturing
process. Products should be produced uniformly and efficiently in large quantities.
• High reliability. Manufacturing faults should be reduced to increase the degree
of reliability.
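The two cost drivers defined above reduce to simple ratios. The sketch below only restates those definitions; the function names and the figures are hypothetical and chosen for illustration.

```python
def yield_rate(working_dies: int, total_dies: int) -> float:
    """Proportion of dies on a wafer that work properly."""
    if total_dies <= 0:
        raise ValueError("total_dies must be positive")
    return working_dies / total_dies


def throughput(wafers_processed: int, hours: float) -> float:
    """Number of wafers passing through a machine per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return wafers_processed / hours


# Hypothetical figures for one wafer and one machine shift.
print(yield_rate(working_dies=460, total_dies=500))   # 0.92
print(throughput(wafers_processed=240, hours=8.0))    # 30.0
```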
It is clear that the most important factor in semiconductor manufacturing is the yield rate.
As mentioned above, the wafer is cut into many individual chips. Hence, the
manufacturers' top priority is to obtain the highest possible number of chips from a
single wafer (Kenneth A. Jackson, 2008). For that reason, almost all semiconductor
manufacturing companies pay serious attention to defect patterns on wafers in order to
find the root cause of the defects and then improve the manufacturing process. In the
past, defect patterns were identified by manual inspection. However, this method is time
consuming, and human experts have a limited capability to identify defect patterns
accurately among a huge number of wafer maps. As a result, many semiconductor
manufacturing companies have developed automatic detection methods in an attempt
to reduce cost and increase the yield rate.
1.2 Motivation
There are many reasons why a defect might occur. Yuan et al. (2010) pointed out
that spatial defect patterns fall into two main groups: global/random defects,
which are caused by random causes, and local/systematic defects, which are caused by
assignable causes.
Local defect patterns can be divided into single patterns and mixed patterns (Figure 1.5).
A single pattern can take many shapes, e.g. centre, doughnut, ring or spot. A
mixed pattern is a combination of two single patterns, such as ring+spot or cluster+ring.
Local defects are normally generated by failing equipment, for example, chemical stains,
micron-scale particles from manufacturing equipment, and human mistakes. Local
defects create distinguishable patterns at specific positions on the wafer surface, and they
can be detected easily by applying detection and classification processes to wafer maps
(Tan and Lau, 2011).
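To make the notion of a mixed pattern concrete, the following sketch builds two single patterns (a ring and a spot) as boolean masks on a square grid and combines them with a logical OR. The grid size, the radii and the helper names are illustrative assumptions and are not taken from the thesis dataset.

```python
import numpy as np


def wafer_grid(size: int):
    """Return die coordinates, distance from the wafer centre, and a circular wafer mask."""
    centre = (size - 1) / 2.0
    rows, cols = np.mgrid[0:size, 0:size]
    dist = np.hypot(rows - centre, cols - centre)
    return rows, cols, dist, dist <= size / 2.0


def ring_pattern(size: int, inner: float, outer: float) -> np.ndarray:
    """Single pattern: dies whose distance from the centre lies in [inner, outer]."""
    _, _, dist, on_wafer = wafer_grid(size)
    return (dist >= inner) & (dist <= outer) & on_wafer


def spot_pattern(size: int, row: float, col: float, radius: float) -> np.ndarray:
    """Single pattern: dies within `radius` of the point (row, col)."""
    rows, cols, _, on_wafer = wafer_grid(size)
    return (np.hypot(rows - row, cols - col) <= radius) & on_wafer


size = 40                                   # hypothetical 40x40 wafer map
ring = ring_pattern(size, inner=14, outer=17)
spot = spot_pattern(size, row=20, col=20, radius=3)
mixed = ring | spot                         # ring+spot mixed pattern

print(ring.sum(), spot.sum(), mixed.sum())
```

Because the two masks do not overlap in this configuration, the number of defective dies in the mixed map is the sum of the two single-pattern counts.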
In contrast to local defects, global defects are generated by normally functioning
equipment (Imai et al., 2010) and by factors such as the air quality in the manufacturing
room and variation in heating or deposition (Yuan et al., 2010). They create randomly
positioned defects spread over any area of the wafer. Mitigating global defects can take
a long time, as it involves improving clean room operation protocols or fixing equipment.
Figure 1.4: Root cause determination (Imai et al., 2010)
Figure 1.5: Scope of the thesis
In addition, the various defect patterns in the wafer map provide crucial information
that could help manufacturing companies determine the root causes of fabrication
problems. To be more specific, Figure 1.4 shows a defect occurrence model (Imai et al.,
2010) that illustrates how a local defect is created. The defect patterns in the upper-right
area and the lower-left corner are generated by the failure of Equipment 1 of Process X and
Equipment 2 of Process Z respectively. The other, random defect patterns are created
by normal equipment. Based on this defect pattern, a quality engineer can tell which
equipment (i.e. Equipment 1 and 2) and which processes (i.e. Process X and Z) are failing.
Because local defect patterns can be visualised on a wafer map, the root cause can be
determined easily from this information. Table 1.1 illustrates some examples
of root causes of defect patterns. Identifying the root causes could then increase the yield
rate and reduce the cost per die (Chen and Liu, 2000).
Defect pattern    Assignable cause
[image]           Machine handling problem (Chen and Liu, 2000; Wang et al., 2006)
[image]           Scrape error (Liu and Chien, 2013)
[image]           Thin film deposition process (Wang et al., 2006)
[image]           Mask error (Liu and Chien, 2013)
[image]           Etching process problem (Wang et al., 2006; Yuan et al., 2011)
[image]           Probe-pin error (Liu and Chien, 2013)
[image]           Stepper malfunctions and sawing imperfections (Kim et al., 2016)
[image]           Probe-card error (Liu and Chien, 2013)
[image]           Test-spec. error (Liu and Chien, 2013)
[image]           Process error (Liu and Chien, 2013)
Table 1.1: Examples of root causes of defect patterns on wafer
The identification of defect patterns in semiconductor wafers has recently led many machine
learning researchers to search for the most appropriate network. As mentioned above,
there are two main types of defect pattern, single and mixed. The current approaches
involve many different machine learning methods and algorithms. However, almost all of
the current research focuses on building a classification model to detect single defect
patterns, for which a very high accuracy rate has been achieved. In December 2015,
Adly et al. (2015a) proposed the Simplified Subspaced Regression Network, which outperforms
other current methods with a very high classification accuracy of 99.884%. However,
there is still a research gap in the identification of defect patterns in semiconductor wafer
maps: the identification of mixed defect patterns has not been researched thoroughly because
of the complexity of these patterns. In view of this, the thesis focuses on finding the
most appropriate machine learning technique to address this research gap.
1.3 Proposal
Traditional machine learning approaches cause some significant problems. First,
using every pixel of the input image as a network input increases the computational cost
and the time to build a model. For example, for an input image of size 40×40, each neuron
in the first hidden layer has 40×40=1600 weights. Clearly, such a large number of parameters
could lead to overfitting and high computational costs. Second, traditional machine
learning approaches are not invariant to shifts in the input image. That means that
when a defect pattern is shifted by several pixels in a certain direction, a traditional
machine learning technique may give a different result.
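Both problems can be checked with a few lines of arithmetic. The sketch below is illustrative only: the hidden layer width of 100 neurons and the fixed weight values are assumptions, chosen so that the effect of shifting a pattern is easy to follow.

```python
import numpy as np

height, width = 40, 40

# Problem 1: parameter count of a fully-connected first hidden layer.
inputs_per_neuron = height * width            # 1600 weights per hidden neuron
hidden_neurons = 100                          # hypothetical layer width
dense_weights = inputs_per_neuron * hidden_neurons
print(inputs_per_neuron, dense_weights)       # 1600 160000

# Problem 2: a single fully-connected neuron is not shift invariant.
# Hypothetical fixed weights: one distinct weight per pixel position.
weights = np.arange(height * width, dtype=float).reshape(height, width)

image = np.zeros((height, width))
image[10:15, 10:15] = 1.0                     # 5x5 defect cluster
shifted = np.roll(image, shift=3, axis=1)     # same cluster moved 3 pixels right

score = float((weights * image).sum())
score_shifted = float((weights * shifted).sum())
print(score, score_shifted)                   # 12300.0 12375.0
```

The cluster is identical in both images, yet the neuron's weighted sum changes because every pixel position is tied to its own weight.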
Recent research on deep learning models, especially convolutional neural networks (CNNs),
has contributed substantially to the field of computer vision. The idea of CNNs is that
each hidden neuron is connected only to a small region of the previous layer rather
than to all of it. These neurons extract the important features of the image. After
that, an ordinary neural network processes these features to classify the input into
predefined categories. However, there is currently no research on learning deep
structured networks for the identification of mixed patterns. Therefore, in an attempt to
address the lack of mixed defect pattern research, this thesis focuses on building a
deep learning structured model to classify the mixed patterns of wafer maps (Figure 1.5).
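The local connectivity described above can be sketched with a plain NumPy convolution. This is a minimal illustration, not the network used in the thesis: the 3×3 averaging kernel, the stride of 1 and the absence of padding are assumptions made for the example.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide one shared kernel over the image (stride 1, no padding).

    Each output value depends only on a small kernel-sized region of the
    input, which plays the role of the local receptive field.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.zeros((40, 40))
image[10:15, 10:15] = 1.0                    # 5x5 defect cluster
shifted = np.roll(image, shift=3, axis=1)    # same cluster moved 3 pixels right

kernel = np.ones((3, 3)) / 9.0               # hypothetical 3x3 averaging filter
fmap = conv2d_valid(image, kernel)
fmap_shifted = conv2d_valid(shifted, kernel)

print(kernel.size)                           # 9 shared weights instead of 1600
print(fmap.shape)                            # (38, 38)
# Shifting the input shifts the feature map by the same amount.
print(np.allclose(fmap_shifted[:, 3:], fmap[:, :-3]))   # True
```

Because the same nine weights are reused at every position, a shifted defect produces the same responses at shifted locations, which is what later pooling and classification layers build on.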
In summary, the purpose of the thesis is threefold: (1) to conduct a thorough
literature review of wafer defect pattern recognition and point out the research gap
in the identification of defect patterns in semiconductor wafer maps, (2) to design a deep
learning structured network that can effectively identify mixed defect patterns in
semiconductor wafers, with CNNs as the main deep network employed, and (3)
to demonstrate the superiority of the deep learning model over traditional machine learning
approaches in terms of mixed defect pattern classification.
Many experiments were carried out to validate the CNNs, and the results show that the CNNs
are superior to other traditional machine learning networks in terms of classification accuracy
and coefficient of determination, achieving 79.626% and 74.224% respectively.
1.4 Structure of The Thesis
• Chapter 2: Identification of Mixed Defect Pattern Classification in
Semiconductor Wafer Maps Chapter 2 starts with an overview of each step in
the process of defect pattern classification on the wafer. It then briefly summarises
the focus of recent research.
• Chapter 3: Learning Deep Structured Convolutional Neural Network
Chapter 3 provides a detailed discussion of CNNs, including a thorough explanation of
how to find the best parameters for CNNs. It then focuses on the
performance of CNNs and compares it with that of other shallow networks.
• Chapter 4: Conclusion and Future Work The final chapter concludes the
thesis with a summary of the results, followed by suggestions for further
development and recommendations.
Chapter 2
Identification of Mixed Defect
Pattern Classification in
Semiconductor Wafer Maps
Recent years have witnessed a rapid growth in research on wafer
defect pattern classification. Chapter 2 introduces the overall strategy that researchers
normally use, followed by a brief summary of the machine learning techniques they
have studied and what they have achieved in the last few years.
The process of defect pattern classification is summarised in Figure 2.1, following
the research of Lee and Kim (2015) and Yum et al. (2012).
• Step 1: Data collection/generation The wafer dataset comes from two
main sources: real data and self-generated data. Because real data from
manufacturing companies are expensive to acquire, self-generated data
based on known wafer defect patterns are widely used in computer
science research.
• Step 2: De-noising As mentioned in the first chapter, a wafer normally contains
both random (global) defect patterns and local defect patterns. The global defect pattern
affects the accuracy of the classification and makes the computation more complex.
Therefore, before training the network, separating the random defect pattern from the
local one is a necessary pre-processing step to increase the classification rate.
Figure 2.1: Process of defect pattern analysis
• Step 3: Feature generation Step 3 has two main parts: feature extraction
and feature/attribute selection. Critical features are extracted to support the
subsequent steps. These feature vectors act as inputs to defect pattern
detection and defect pattern recognition.
• Step 4: Defect detection Automatic defect detection includes a spatial
randomness test, which tests the dependence between data points. The output
of this stage is whether the defect pattern is normal or not. If the defect
pattern is abnormal or contains local defective chips, the wafer proceeds to step 5.
• Step 5: Defect classification Using the feature vectors from step 3, the abnormal
defect pattern is classified into predefined categories, e.g. ring, spot or curvilinear
pattern, using various machine learning techniques and algorithms.
• Step 6: Root cause analysis Once the defect pattern is known, other machine
learning techniques are used for root cause analysis to identify which processes or
machines failed.
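Steps 1 and 2 of the process above can be sketched as follows. This is a minimal illustration under stated assumptions: the wafer map is a hand-built 0/1 array containing one local cluster and three isolated random defects, and de-noising is done with a 3×3 median filter (one of the filters reviewed later in this chapter). The function name and all values are hypothetical.

```python
import numpy as np


def median_filter_3x3(wafer_map: np.ndarray) -> np.ndarray:
    """Replace each die by the median of its 3x3 neighbourhood.

    The border is zero-padded, i.e. positions outside the map are treated
    as non-defective (an assumption of this sketch).
    """
    padded = np.pad(wafer_map, 1, mode="constant", constant_values=0)
    out = np.zeros_like(wafer_map)
    for i in range(wafer_map.shape[0]):
        for j in range(wafer_map.shape[1]):
            out[i, j] = np.median(padded[i:i + 3, j:j + 3])
    return out


# Step 1 (self-generated data): a local cluster plus isolated random defects.
wafer_map = np.zeros((20, 20), dtype=int)
wafer_map[5:11, 5:11] = 1      # local (systematic) 6x6 cluster
wafer_map[2, 15] = 1           # isolated random defects
wafer_map[16, 3] = 1
wafer_map[14, 17] = 1

# Step 2 (de-noising): isolated defects vanish, the bulk of the cluster survives.
denoised = median_filter_3x3(wafer_map)
print(wafer_map.sum(), denoised.sum())
```

The filter also erodes the four corner dies of the cluster, which hints at why de-noising can damage more complex local patterns.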
In this section, we first describe defect pattern generation, which is the first step of the
classification process. After explaining how data is collected, we discuss in more detail
the main methodologies used in each stage of defect pattern detection, such as the spatial
randomness test, the de-noising process and feature extraction.
2.1 Defect Pattern
Many authors used different type of wafer maps