Metamodeling: Sampling & Surrogate Modeling
ME 564/SYS 564
Wed. Sep. 19th 2018
Brian Chell
Goal of Week 4: To learn different strategies to sample your design space and
build metamodels, and to explore some MATLAB tools that can help
Introduction
• Brian Chell
• bchell@stevens.edu
• Ph.D. student advised by Prof. Hoffenson
• Research focuses on Multidisciplinary Analysis and Optimization (MDAO)
• Sampling & surrogate modeling are an important part of MDAO
Recap: How to optimize
1. Formulate the problem
a) Define system boundaries
b) Develop analytical models
c) Explore/reduce the problem space
d) Formalize optimization problem
2. Solve the problem
a) Choose the right approach/algorithm
b) Solve (by hand, code, or software)
c) Interpret the results
d) Iterate if needed
$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}(\mathbf{x}_k)^{-1}\,\nabla f(\mathbf{x}_k)$
Formulating the problem: Weeks 1-2, 4, 9-12 (TODAY); solving the problem: Weeks 3, 5-8, 13
Recap: Optimization formulation
[Figure: the standard optimization formulation in "negative null" form, labeling the variables, parameters, objective function, and constraints]
Today we will discuss
strategies for dealing
with difficult functions
f, g, and h
What is surrogate modeling?
Fitting a function to some data
The data points come from the original model (or from experiments)
The function we fit is a meta- or “surrogate” model—
a model of a model!
[Figure: data points (input vs. output) with a fitted curve passing near them]
Why do we do it?
• When we have only physical
experiments or observations to
collect data (no computational
models exist)
• When computational models are
expensive to evaluate
(usually time-intensive)
Surrogate models are generally much faster than simulations, and many optimization algorithms require very large numbers of simulation runs
(However, we often trade accuracy for speed)
Example: Quadcopter Motor Mass
Example: Gather data
Example: Choose and fit function
[Figure: "Motor Volume vs. Mass" scatter plot of motor volume (mm^3) vs. mass (g) with a linear fit]
Fitted model: mass = 0.0032·Vol + 0.2874
Resulting constraint: g1(r, h): n·(0.0032·(π r² h) + 0.2874) ≤ 150
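A fit like the one above can be reproduced in MATLAB with polyfit; the sketch below assumes the gathered data already sit in vectors vol and mass (hypothetical variable names):

% Least-squares linear fit of motor mass (g) vs. volume (mm^3).
% Assumes column vectors vol and mass hold the gathered data points.
p = polyfit(vol, mass, 1);      % p(1) = slope, p(2) = intercept
massHat = polyval(p, vol);      % surrogate predictions at the data points

% Quick visual check of the surrogate against the data
plot(vol, mass, 'o', vol, massHat, '-');
xlabel('Volume (mm^3)'); ylabel('Mass (g)');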
Surrogate modeling steps
1. Gather data
Sample the design space by efficiently choosing which
“experiments” to run
2. Choose a function structure
E.g., linear regression, terms in a regression model,
kriging, neural networks, etc.
3. Fit a function to data
Find best-fit parameters for function structure
4. Assess fitness
Measure how well the surrogate model fits the data
Surrogate modeling steps (Step 1: Gather data)
Monte Carlo approach
Choose random experiments/points to sample
[Figure: randomly sampled points scattered over the 2-D design space bounded by x1,min ≤ x1 ≤ x1,max and x2,min ≤ x2 ≤ x2,max]
There are more
efficient ways
to do this!
Gathering data
Design of Experiments (DOE): Collect a useful set of
data that efficiently spans the design/input space
DOE: Full factorial
For d dimensions/variables and n levels for each variable, this requires n^d sample points
This samples the space uniformly, but can get very expensive as d and n grow
DOE: Latin hypercube
Divide each variable into n levels, then randomly distribute points so that each row and column contains exactly one point
There are many possible designs that satisfy this sampling rule
DOE: Optimal Latin hypercube
Maximize the minimum distance between any two points (the "maximin" criterion)
DOE: MATLAB codes
Design of Experiments (DOE): Collect a useful set of
data that efficiently spans the design/input space
Full Factorial:
design = fullfact([4,7])
for 2 variables: 4 levels on the first and 7 on the second

Latin Hypercube / Optimal Latin Hypercube:
design = lhsdesign(n,p,'iterations',20,'criterion','maximin')
for n samples and p variables
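Both functions live in the Statistics and Machine Learning Toolbox. A minimal sketch of mapping these designs onto real variable bounds (the bounds lb and ub below are hypothetical):

% Full factorial: 4 levels of x1 and 7 levels of x2 (returns level indices 1..n)
ff = fullfact([4, 7]);

% Latin hypercube: 16 samples of 2 variables in [0,1]^2, spread out via maximin
lh = lhsdesign(16, 2, 'iterations', 20, 'criterion', 'maximin');

% Scale the unit-hypercube samples to the actual design-variable bounds
lb = [0, 10];                % hypothetical lower bounds of x1, x2
ub = [5, 50];                % hypothetical upper bounds of x1, x2
X  = lb + lh .* (ub - lb);   % each row is one sample point to evaluate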
Adaptive Sampling
Instead of deciding all the sampling points at the beginning, you can choose them along the way.
• Assume that you want to find the minimum of y = f(x)
• Start with an initial set of points {x(1), x(2), …, x(k)}
• Fit a kriging model to those points
• Find the point that maximizes the expected improvement, using a merit function
• Sample at this point and repeat …
This is the idea behind Efficient Global Optimization (EGO); a rough sketch of one iteration follows. We will discuss this again in Week 12.
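A hedged sketch of one EGO-style iteration, using MATLAB's Gaussian-process regression fitrgp as the kriging surrogate (Statistics and Machine Learning Toolbox); the objective handle f, current samples X and y, and candidate grid Xcand are assumed to already exist:

% One adaptive-sampling (EGO-style) iteration, as a sketch.
% Assumes: f     - handle to the expensive objective
%          X, y  - k-by-d sample points and k-by-1 outputs so far
%          Xcand - m-by-d candidate points (e.g., a dense Latin hypercube)
gpr = fitrgp(X, y);                      % kriging-like surrogate of f
[mu, sd] = predict(gpr, Xcand);          % predicted mean and standard deviation

fmin = min(y);                           % best observed value so far
z  = (fmin - mu) ./ max(sd, eps);
EI = (fmin - mu) .* normcdf(z) + sd .* normpdf(z);   % expected improvement

[~, idx] = max(EI);                      % most promising candidate point
xNew = Xcand(idx, :);
X = [X; xNew];                           % evaluate the objective there and
y = [y; f(xNew)];                        % augment the data set, then refit and repeat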
Surrogate modeling steps (Steps 2 & 3: Choose a function structure and fit it to the data)
Some common surrogate model structures
• Interpolation
• Regression
• Artificial neural networks (ANNs)
• Kriging
Interpolation
Use data points, and estimate in between them:
yi = interp1(x,y,xi,'spline') - supply input and output data (x,y), choose among methods like 'linear' or 'spline', and supply new inputs to be estimated (xi)
[Figure: the same data interpolated two ways, linear vs. cubic spline]
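A minimal, self-contained sketch of the difference between the two methods (the sine data here is purely illustrative):

% Coarse samples of an underlying function (illustrative data only)
x = 0:1:10;
y = sin(x);

% Dense query points and two interpolation methods
xi      = 0:0.05:10;
yLinear = interp1(x, y, xi, 'linear');   % piecewise-linear between samples
ySpline = interp1(x, y, xi, 'spline');   % smooth cubic spline through samples

plot(x, y, 'o', xi, yLinear, '--', xi, ySpline, '-');
legend('data', 'linear', 'spline');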
Linear regression
• Finding parameters (𝜷) to make a function of 𝐱 best
match the data points of 𝐲
• Linear model: $f(\mathbf{x}) = \boldsymbol{\beta}^T \mathbf{x}$
• Nonlinear model: $f(\mathbf{x}) = \boldsymbol{\beta}^T \varphi(\mathbf{x})$
Design space: x(1), x(2), …, x(k)  →  Output space: y(1), y(2), …, y(k)
A good model should closely match f(x) to the data y
Linear Regression
[Figure: data points (x(1), y(1)), …, (x(k), y(k)) with a fitted surface over x1 … xn, and the prediction error shown]
Minimize the square error:
$E(\boldsymbol{\beta}) = \sum_{i=1}^{k} \frac{1}{2}\left(f(\mathbf{x}^{(i)},\boldsymbol{\beta}) - y^{(i)}\right)^2$
Linear Regression
Modeling becomes an optimization problem:
$E(\boldsymbol{\beta}) = \sum_{i=1}^{k} \frac{1}{2}\left(f(\mathbf{x}^{(i)},\boldsymbol{\beta}) - y^{(i)}\right)^2 = \sum_{i=1}^{k} \frac{1}{2}\left(\boldsymbol{\beta}^T\mathbf{x}^{(i)} - y^{(i)}\right)^2$
The coefficient vector β is the design variable of this optimization problem.
Setting the derivative to zero (the first-order necessary condition, which we will discuss in Week 5):
$\frac{\partial E(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \left(\mathbf{X}\boldsymbol{\beta} - \mathbf{y}\right)^T \mathbf{X} = \mathbf{0}^T$
Linear Regression
Modeling becomes an optimization problem with a closed-form solution:
$\frac{\partial E(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{0}^T \;\Rightarrow\; \boldsymbol{\beta}^* = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}$
where
$\mathbf{X} = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ \vdots & & \ddots & \vdots \\ 1 & x_1^{(k)} & \cdots & x_n^{(k)} \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(k)} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_n \end{bmatrix}$
These are the linear coefficients that minimize the square error.
Example
Find a linear model to predict the exam grade given HW grade.
Student | HW Grade (x(i)) | Exam Grade (y(i))
   1    |       95        |       85
   2    |       85        |       95
   3    |       80        |       70
   4    |       70        |       65
   5    |       60        |       70

A linear model of one variable has two coefficients, β0 and β1:
$f(x) = \beta_0 + \beta_1 x$
Recall: $\boldsymbol{\beta}^* = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}$
Example
Using the same table of HW grades (x) and exam grades (y):
$\boldsymbol{\beta}^* = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}, \quad \mathbf{X} = \begin{bmatrix} 1 & 95 \\ 1 & 85 \\ 1 & 80 \\ 1 & 70 \\ 1 & 60 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 85 \\ 95 \\ 70 \\ 65 \\ 70 \end{bmatrix}$
$\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 26.78 \\ 0.64 \end{bmatrix}$
𝐸𝑥𝑎𝑚 𝐺𝑟𝑎𝑑𝑒 = 26.78 + 0.64(𝐻𝑊 𝐺𝑟𝑎𝑑𝑒)
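As a sanity check, the same coefficients can be obtained in MATLAB with the backslash operator (this reproduces the numbers above):

% HW grades (inputs) and exam grades (outputs) from the example table
hw   = [95; 85; 80; 70; 60];
exam = [85; 95; 70; 65; 70];

% Build X with a column of ones for the intercept term beta_0
X = [ones(size(hw)), hw];

beta = X \ exam;   % least-squares solution, equivalent to (X'*X)\(X'*exam)
% beta is approximately [26.78; 0.64], i.e., Exam ≈ 26.78 + 0.64*(HW)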
Nonlinear models
You can do linear regression with more complex models: just build your x vector (and hence the columns of X) from the appropriate basis terms:
$\mathbf{x} = \begin{bmatrix} x_1^2 & x_1 & x_1 x_2 & x_2^2 & x_2 & \sin(x_1) \end{bmatrix}$
$y = \beta_1 x_1^2 + \beta_2 x_1 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \beta_5 x_2 + \beta_6 \sin(x_1)$
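A sketch of that idea in MATLAB, assuming column vectors x1, x2, and y of observed data:

% Build the design matrix whose columns are the chosen basis terms
% (assumes x1, x2, y are column vectors of the same length)
Xb = [x1.^2, x1, x1.*x2, x2.^2, x2, sin(x1)];

% The model is still linear in beta, so ordinary least squares applies
beta = Xb \ y;

% Surrogate predictions at the training points
yHat = Xb * beta;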
Regression in MATLAB
Use the backslash operator \ for linear regression:
With input data matrix x and output vector y, the command
beta = x\y
returns the vector of beta values such that
$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots$
(Include a column of ones in x if you want an intercept term.)
You can also use cftool to open a graphical user interface for the Curve Fitting Toolbox
Artificial Neural Networks (ANNs)
When you don’t know the structure of your function (e.g.,
which terms to include), ANNs offer an automated way
[Figure: a single neuron with inputs x1, x2, x3 and output y; and a small network with input x1, a first-layer neuron y1,1, second-layer neurons y2,1 … y2,4, and output y_out]
Each neuron has a transfer function $f_{i,j}(\mathbf{x}) = y_{i,j}$
A chain of these functions gives the final model
Neural Network Model
[Figure: network with input x1, hidden layer (y1,1), second layer (y2,1 … y2,4), and output layer (y_out); inset: the logistic transfer-function curve, parameterized by a bias b and weight w]
Hidden-layer functions (logistic transfer function):
$y_{1,1} = \left(1 + e^{\,b_{1,1} - w_{1,1} x}\right)^{-1}$
$y_{2,j} = \left(1 + e^{\,b_{2,j} - w_{2,j}\, y_{1,1}}\right)^{-1}, \quad j = 1, \ldots, 4$
Each logistic output is bounded: $0 \le y_{2,j} \le 1$
Neural Network Model
Final output (output layer):
$y_{out} = b_{3,1} - \sum_{i=1}^{4} w_{2,i}\, y_{2,i}$
The output can take any desired value, depending on the bias and weight values.
Training: find the set of $b_{i,j}$ and $w_{i,j}$ that minimizes the square error
$\mathrm{Error}(b_{i,j}, w_{i,j}) = \sum_{i=1}^{k} \left(y^{(i)} - y_{out}(\mathbf{x}^{(i)})\right)^2$
Neural Networks with MATLAB
MATLAB has an easy-to-use neural network toolbox: nftool
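nftool is the GUI front end; programmatically, something like the following minimal sketch works if the Deep Learning (formerly Neural Network) Toolbox is installed. The input matrix X (one column per sample) and output row vector Y are assumed:

net = fitnet(10);         % feedforward network with 10 hidden neurons
net = train(net, X, Y);   % trains the weights/biases, splitting data internally

Yhat = net(Xnew);         % evaluate the trained surrogate at new inputs Xnew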
Example: Linear regression vs. ANN
Least-squares linear regression w/ second-order terms vs. artificial neural network w/ radial basis functions (exact fit)
[Figures: the two fitted surfaces, built from a 64-point, 2-variable full factorial sample (?)]
Kriging
Typical surrogate models have a prediction error, ε, which is assumed to be independent:
$y(x) = f(x) + \varepsilon$
where f(x) is a non-exact-fitting function (e.g., linear, polynomial) and ε is independent, identically distributed (i.i.d.) random error.
In kriging, the error term is correlated with x:
$y(x) = f(x) + Z(x)$
where f(x) is again a non-exact-fitting function and Z(x) is an exact-fit error term that is a function of x.
Kriging
• There are software packages (e.g., ModelCenter) and public MATLAB codes that implement kriging (a related sketch follows below)
https://www.mathworks.com/matlabcentral/fileexchange/?term=kriging
• Kriging goes through data points exactly
• Good for data from deterministic simulation models
• Bad for data with measurement error or noise
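Kriging is closely related to Gaussian process regression; as a hedged sketch (not the only way to do it), MATLAB's fitrgp from the Statistics and Machine Learning Toolbox gives similar exact-interpolating behavior on noise-free data. X, y, and Xnew are assumed to exist:

% X: k-by-d matrix of sample points, y: k-by-1 simulation outputs (assumed)
krig = fitrgp(X, y, 'KernelFunction', 'squaredexponential');

% Predict at new points; on noise-free data the model passes (nearly)
% exactly through the training points
[yPred, ySd] = predict(krig, Xnew);   % mean prediction and its standard deviation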
Surrogate modeling steps (Step 4: Assess fitness)
Visualizing error
• With 3 or more variables, visualization is difficult
• Can project to two dimensions
• Can use eigenvectors to reduce dimensionality
• Fitness can be mathematically calculated
• Standard error
• Mean square error
• Cross-validation
[Figures: example comparisons of surrogate-model predictions against data]
Left image: https://www.datadvance.net/product/macros/manual/6.5sp1/_images/initial_model_approximation.png
Right image: Samad, A., Choi, J. H., & Kim, K. Y. (2008). Blade Optimization of a Transonic Compressor Using a Multiple Surrogate Model. Transactions of the Korean Society of Mechanical Engineers B, 32(4), 317-326.
Fitness: R²
Coefficient of determination:
$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$
This measures how well the model predictions $\hat{y}_i$ fit the data compared to simply using the average value $\bar{y}$.
Adjusted R² accounts for model complexity:
$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n-1}{n-p-1}$
where n is the number of data points and p is the number of parameters in the model.
This is a common evaluation metric for regression models
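A short sketch of computing both metrics from a fitted surrogate; the vectors y (observed outputs) and yHat (surrogate predictions) and the parameter count p are assumed:

% y: observed outputs, yHat: surrogate predictions, p: number of model parameters
n     = numel(y);
SSres = sum((y - yHat).^2);          % residual sum of squares
SStot = sum((y - mean(y)).^2);       % total sum of squares about the mean

R2    = 1 - SSres/SStot;
R2adj = 1 - (1 - R2)*(n - 1)/(n - p - 1);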
Fitness: Cross-validation
1. Divide the data set into n subsets
2. Fit a model using all but the ith subset
3. Use the ith subset to test fitness
4. Repeat n times, average the results
Fitness is often measured in terms of mean square error:
$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$
This is a common evaluation metric with exact-fit models (a minimal MATLAB sketch follows)
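A hedged sketch of k-fold cross-validation for a simple polynomial surrogate, using cvpartition from the Statistics and Machine Learning Toolbox; x and y are assumed data vectors and the cubic fit is just an illustrative choice:

% k-fold cross-validation of a cubic polynomial surrogate (illustrative)
k  = 5;
cv = cvpartition(numel(y), 'KFold', k);
mse = zeros(k, 1);

for i = 1:k
    tr = training(cv, i);               % logical index of the training points
    te = test(cv, i);                   % logical index of the held-out points
    p  = polyfit(x(tr), y(tr), 3);      % fit using all but the i-th subset
    mse(i) = mean((y(te) - polyval(p, x(te))).^2);   % error on the i-th subset
end

cvMSE = mean(mse);                      % cross-validated mean square error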
Some metamodeling considerations
• Outliers
• Underfitting
• Overfitting
• Training vs validation data
• Regularization
Outliers
[Figure: f(x) vs. x showing two fitted models, one with the outliers included and one with the outliers removed]
• Outliers might have a big
impact on the model
• Depending on the problem
you might want to remove
them (if possible)
Underfitting
[Figure: an underlying function, a badly sampled data set (very few data points), and the minimum-error fit through it, which misses the true behavior]
The sample size should be sufficient to capture the function's behavior
Overfitting
[Figure: an underlying function, data with noise (measurement error), and a fitted function with zero error that chases the noise]
• Overfitting captures the noise instead of the underlying function
• Unnecessary model complexity might also cause overfitting
Training vs Validation Data
• To overcome overfitting, the entire data set is separated into training and validation data
• The model is fit using only the training data
[Figure: error vs. model complexity; training error keeps decreasing while validation error eventually rises, so stop training where the validation error is lowest]
Regularization
Can we find an optimal complexity for our model?
Consider the following linear regression problem, where $\mathbf{x} = \left[1, x, x^2, x^3, x^4, \ldots, x^n\right]^T$:
$\mathrm{Error}(\boldsymbol{\beta}) = \sum_{i=1}^{k}\left(\boldsymbol{\beta}^T\mathbf{x}^{(i)} - y^{(i)}\right)^2 + \lambda\left\|\boldsymbol{\beta}\right\|$
The term $\lambda\|\boldsymbol{\beta}\|$ is a penalty for complexity: increasing λ forces the fit toward simpler models
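One common concrete choice (an assumption here, not the only option) is a squared ridge/Tikhonov penalty, which preserves the closed-form solution. A minimal sketch, assuming a design matrix X with a leading column of ones and an output vector y:

% Ridge (L2-regularized) least squares: one common form of regularization
lambda = 0.1;                              % penalty weight (a tuning choice)
I = eye(size(X, 2));
I(1, 1) = 0;                               % typically do not penalize the intercept
betaRidge = (X'*X + lambda*I) \ (X'*y);    % minimizes ||X*beta - y||^2 + lambda*||beta||^2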
Metamodeling Steps (for ANNs)
1) If you do not have data already, sample the space
using any sampling method, e.g., Latin hypercube
2) Plot data to see if outliers exist; remove if you can
3) Normalize your data
4) Separate your data into training and validation/test
(90% training, 10% validation is a good start)
5) Train your model using a metamodeling method, e.g., neural networks
6) If the error is not good enough, either add more data or increase model complexity
(A minimal end-to-end sketch of these steps follows.)
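A hedged sketch of steps 3-6 with a neural-network surrogate (Deep Learning Toolbox assumed); X is a d-by-k input matrix (one column per sample) and Y is a 1-by-k output vector:

% Steps 3-6 as a sketch. Assumes X (d-by-k inputs) and Y (1-by-k outputs).
Xn = normalize(X, 2);                 % 3) normalize each input variable (row)

net = fitnet(10);                     % 5) neural-net surrogate, 10 hidden neurons
net.divideParam.trainRatio = 0.9;     % 4) 90% training
net.divideParam.valRatio   = 0.1;     %    10% validation
net.divideParam.testRatio  = 0.0;
[net, tr] = train(net, Xn, Y);

Yhat   = net(Xn(:, tr.valInd));       % 6) check error on the validation set
valMSE = mean((Y(tr.valInd) - Yhat).^2);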
Additional resources
Model fitting and regression in MATLAB (9-min video;
includes importing Excel data, plotting 2-d, examining
goodness of fit, using the “basic fitting” plot tool):
MATLAB Neural Network Toolbox:
http://www.mathworks.se/videos/getting-started-with-neural-network-toolbox-68794.html
Kriging:
http://www2.imm.dtu.dk/~hbni/dace/
Summary
• Metamodeling is fitting a mathematical function to
your data to speed up evaluations and optimization
• This involves four general steps:
• Gather data (e.g., using a DOE)
• Choose a function structure (e.g., linear, polynomial,
kriging, ANN)
• Fit a function to the data
• Assess fitness
• Watch out for outliers, underfitting, and overfitting
Workshop task
Use the DeJong function from the Yang 2010 paper:
http://arxiv.org/pdf/1008.0549.pdf (if n-dimensional, choose n = 2)
1. Sample the function
a. Using full factorial sampling (e.g., 4 x 4)
b. Using an optimal Latin hypercube (e.g., 16 points)
2. Fit regression metamodels to each sample (try different function
structures and sample sizes) and measure the R2 value and MSE
3. Plot the surfaces to show which sampling method and which
metamodeling technique seems best
4. If you have time, also fit metamodels using neural nets or kriging
and measure the cross-validated MSE
5. Summarize your findings – at the end we will meet as a class and
hear from each of you: (1) which test functions you tried, (2)
which sampling methods and (3) metamodels you used, and (4)
what seemed to work
Example: Buried explosive tests
(Dynamic Effects Laboratory)
Example: Gather data
Example: Choose and fit function
• Strain gauges output a pressure-time plot
• The first spike is the peak pressure
• Plotted depth of burial (DoB) vs. peak pressure
[Figures: three "Peak Pressure vs. DoB" plots of average peak pressure (psi) vs. unscaled depth of burial (in.); legend entries include Prior Tests (.25″ d), Test Peak Pressure, and Avg Peak Pressure]
Acknowledgements
• These slides are modified from Steven Hoffenson’s
originally compiled slides for this course in 2017
• Much of this material came from Chapter 2 of the
textbook, Principles of Optimal Design
• Some of the slides and examples came from Dr.
Emrah Bayrak, Dr. Alex Burnap, and Dr. Namwoo
Kang while they were at the University of Michigan
• Some of the slides and examples came from Dr.
Michael Kokkolaras at McGill University