
Metamodeling

Sampling & Surrogate Modeling

ME 564/SYS 564
Wed. Sep. 19th 2018
Brian Chell

Goal of Week 4: To learn different strategies to sample your design space and
build metamodels, and to explore some MATLAB tools that can help

Introduction

• Brian Chell
• bchell@stevens.edu

• Ph.D. Student of Prof. Hoffenson

• Research based on Multidisciplinary Analysis and
Optimization (MDAO)

• Sampling & surrogate modeling are an important part of MDAO

Recap: How to optimize

1. Formulate the problem
a) Define system boundaries

b) Develop analytical models

c) Explore/reduce the problem space

d) Formalize optimization problem

2. Solve the problem
a) Choose the right approach/algorithm

b) Solve (by hand, code, or software)

c) Interpret the results

d) Iterate if needed

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}(\mathbf{x}_k)^{-1}\,\nabla f(\mathbf{x}_k)$$

(Step 1: Weeks 1-2, 4, 9-12. Step 2: Weeks 3, 5-8, 13. TODAY: surrogate modeling, part of exploring/reducing the problem space in Step 1.)

Recap: Optimization formulation

[Figure: general optimization formulation in "negative null" form, labeling the variables, objective function, parameters, and constraints]

Today we will discuss
strategies for dealing
with difficult functions
f, g, and h

What is surrogate modeling?

Fitting a function to some data.

The data points come from the original model (or experiment); the function we fit is a meta- or "surrogate" model: a model of a model!

[Figure: data points and a fitted curve, output vs. input]

Why do we do it?

• When we have only physical
experiments or observations to
collect data (no computational
models exist)

• When computational models are
expensive to evaluate
(usually time-intensive)


Surrogate models are generally much faster than simulations, and many optimization algorithms require very large numbers of simulation runs.
(However, we often trade accuracy for speed.)

Example: Quadcopter Motor Mass


Example: Gather data

Example: Choose and fit function

[Figure: "Motor Volume vs. Mass" scatter plot of Mass (g) against Volume (mm^3) with a linear fit]

$$\text{mass} = 0.0032 \cdot \text{Vol} + 0.2874$$

$$g_1(r, h):\; n\left(0.0032\,(\pi r^2 h) + 0.2874\right) \le 150$$

Surrogate modeling steps

1. Gather data
Sample the design space by efficiently choosing which
“experiments” to run

2. Choose a function structure
E.g., linear regression, terms in a regression model,
kriging, neural networks, etc.

3. Fit a function to data
Find best-fit parameters for function structure

4. Assess fitness
Measure how well the surrogate model fits the data



Monte Carlo approach

Choose random experiments/points to sample

[Figure: randomly sampled points over the $(x_1, x_2)$ design space, bounded by $x_{1,min}$, $x_{1,max}$, $x_{2,min}$, and $x_{2,max}$]

There are more
efficient ways
to do this!
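As a minimal sketch, Monte Carlo sampling of a box-bounded design space takes one line with rand; the bounds and sample size below are hypothetical placeholders (implicit expansion assumes MATLAB R2016b or later):

n  = 20;                            % number of random sample points
lb = [0, 0];                        % [x1_min, x2_min] (placeholder bounds)
ub = [10, 5];                       % [x1_max, x2_max]
X  = lb + rand(n, 2) .* (ub - lb);  % each row is one sample (x1, x2)
scatter(X(:,1), X(:,2), 'filled')
xlabel('x_1'); ylabel('x_2')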

Gathering data

Design of Experiments (DOE): Collect a useful set of
data that efficiently spans the design/input space


DOE: Full factorial

For d dimensions/variables and n levels for each variable, a full factorial design samples every combination, i.e., $n^d$ points.

This samples the space uniformly, but can get very expensive as the number of variables grows.

DOE: Latin hypercube

Divide each variable into n levels, and then randomly distribute points such that each row and column contains exactly one point.

There are many possible solutions to this sampling problem.

DOE: Optimal Latin hypercube

Maximize the minimum distance
between any two points


DOE: MATLAB codes

Full Factorial:

design = fullfact([4,7])
% for 2 variables: 4 levels on the first and 7 on the second

Latin Hypercube / Optimal Latin Hypercube:

design = lhsdesign(n,p,'Iterations',20,'Criterion','maximin')
% for n samples and p variables
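Both commands return designs in normalized units (level indices for fullfact, the unit cube for lhsdesign), so a common follow-up step is rescaling to the real variable bounds. A minimal sketch, with hypothetical bounds and the Statistics and Machine Learning Toolbox assumed:

levels = [4, 7];                 % levels per variable
lb = [0, 0];  ub = [10, 5];      % placeholder variable bounds

D   = fullfact(levels);                          % 28 x 2 matrix of level indices
Xff = lb + (D - 1) ./ (levels - 1) .* (ub - lb); % rescale indices to the bounds

L    = lhsdesign(16, 2, 'Criterion', 'maximin', 'Iterations', 20); % in [0,1]^2
Xlhs = lb + L .* (ub - lb);                      % rescale unit cube to the bounds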

Adaptive Sampling

Instead of deciding the sampling points at the beginning, you can choose them along the way.

• Assume that you want to find the minimum of $y = f(\mathbf{x})$

• Start with an initial set of points $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(k)}\}$

• Fit a kriging model to those points

• Find the point that maximizes the expected improvement, using a merit function

• Sample at this point and continue …

Efficient Global Optimization (EGO)

We will discuss this again in Week 12
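As a rough sketch of the idea (not the specific EGO implementation covered in Week 12), the loop below fits a Gaussian-process (kriging) model with fitrgp from the Statistics and Machine Learning Toolbox and uses the standard expected-improvement merit function. The test function, candidate grid, and budget are illustrative assumptions:

f  = @(x) (6*x - 2).^2 .* sin(12*x - 4);   % example 1-D test function
x  = [0; 0.35; 0.7; 1];                    % initial sample points
y  = f(x);
xg = linspace(0, 1, 501)';                 % candidate grid for the next sample

for iter = 1:10
    gp = fitrgp(x, y);                     % fit Gaussian process (kriging) model
    [mu, sd] = predict(gp, xg);            % prediction mean and std. deviation
    z  = (min(y) - mu) ./ max(sd, eps);
    EI = (min(y) - mu).*normcdf(z) + sd.*normpdf(z);  % expected improvement
    [~, k] = max(EI);                      % point with the best merit value
    x = [x; xg(k)];  y = [y; f(xg(k))];    % sample there and continue
end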


Some common surrogate model structures

• Interpolation

• Regression

• Artificial neural networks (ANNs)

• Kriging


Interpolation

Use data points, and estimate in between them:

yi = interp1(x,y,xi,'spline')
% supply input and output data (x,y), choose among methods like
% 'linear' or 'spline', and supply new inputs to be estimated (xi)

[Figures: linear vs. cubic spline interpolation of the same data]
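A runnable sketch comparing the two methods on made-up data:

x  = 0:10;                       % known inputs
y  = sin(x);                     % known outputs (placeholder data)
xi = 0:0.1:10;                   % new inputs to estimate

yi_lin = interp1(x, y, xi, 'linear');
yi_spl = interp1(x, y, xi, 'spline');

plot(x, y, 'o', xi, yi_lin, '--', xi, yi_spl, '-')
legend('data', 'linear', 'spline')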

Linear regression

• Finding parameters (𝜷) to make a function of 𝐱 best
match the data points of 𝐲

• Linear model: $f(\mathbf{x}) = \boldsymbol{\beta}^T \mathbf{x}$

• Nonlinear model: $f(\mathbf{x}) = \boldsymbol{\beta}^T \boldsymbol{\varphi}(\mathbf{x})$

Design space: $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(k)}$ → Output space (the data): $y^{(1)}, y^{(2)}, \ldots, y^{(k)}$

A good model should closely match $f(\mathbf{x})$ to $\mathbf{y}$.

Linear Regression

[Figure: data points $(\mathbf{x}^{(1)}, y^{(1)}), \ldots, (\mathbf{x}^{(k)}, y^{(k)})$ with a fitted surface over $(x_1, \ldots, x_n)$; the error is the distance from each point to the surface]

Minimize the square error:

$$E(\boldsymbol{\beta}) = \sum_{i=1}^{k} \tfrac{1}{2}\left(f(\mathbf{x}^{(i)}, \boldsymbol{\beta}) - y^{(i)}\right)^2$$

Linear Regression

Modeling becomes an optimization problem:

$$E(\boldsymbol{\beta}) = \sum_{i=1}^{k} \tfrac{1}{2}\left(f(\mathbf{x}^{(i)}, \boldsymbol{\beta}) - y^{(i)}\right)^2 = \sum_{i=1}^{k} \tfrac{1}{2}\left(\boldsymbol{\beta}^T\mathbf{x}^{(i)} - y^{(i)}\right)^2$$

where $\boldsymbol{\beta}$ is the design variable. Setting the derivative to zero (with $\mathbf{X}$ and $\mathbf{y}$ defined on the next slide):

$$\frac{\partial E(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = (\mathbf{X}\boldsymbol{\beta} - \mathbf{y})^T\mathbf{X} = \mathbf{0}^T$$

We will discuss this in Week 5. This is the first-order necessary condition: the derivative equals zero.

Linear Regression

Modeling becomes an optimization problem with a closed-form solution:

$$\frac{\partial E(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = (\mathbf{X}\boldsymbol{\beta} - \mathbf{y})^T\mathbf{X} = \mathbf{0}^T \quad\Rightarrow\quad \boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

where

$$\mathbf{X} = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(k)} & \cdots & x_n^{(k)} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(k)} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_n \end{bmatrix}$$

$\boldsymbol{\beta}^*$ gives the linear coefficients for least square error.

Example
Find a linear model to predict the exam grade given the HW grade.

Student | HW Grade, $x^{(i)}$ | Exam Grade, $y^{(i)}$
1 | 95 | 85
2 | 85 | 95
3 | 80 | 70
4 | 70 | 65
5 | 60 | 70

A linear model of 1 variable has 2 coefficients, $\beta_0$ and $\beta_1$:

$$f(x) = \beta_0 + \beta_1 x$$

Recall: $\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

Example

Using the same data, solve for $\beta_0$ and $\beta_1$:

$$\mathbf{X} = \begin{bmatrix} 1 & 95 \\ 1 & 85 \\ 1 & 80 \\ 1 & 70 \\ 1 & 60 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} 85 \\ 95 \\ 70 \\ 65 \\ 70 \end{bmatrix}$$

$$\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 26.78 \\ 0.64 \end{bmatrix}$$

$$\text{Exam Grade} = 26.78 + 0.64 \times (\text{HW Grade})$$
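The same numbers can be checked in MATLAB, either with the closed-form normal equations or with the backslash operator (a least-squares solve); both give the coefficients above to rounding:

hw   = [95; 85; 80; 70; 60];
exam = [85; 95; 70; 65; 70];

X = [ones(5,1), hw];               % column of ones provides the intercept beta_0
beta_closed = (X'*X) \ (X'*exam)   % closed form: approx. [26.78; 0.64]
beta_bs     = X \ exam             % backslash gives the same least-squares fit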

Nonlinear models

You can do linear regression with more complex models: just re-form your X matrix so each row contains the appropriate basis columns:

$$\mathbf{x} = \begin{bmatrix} x_1^2 & x_1 & x_1 x_2 & x_2^2 & x_2 & \sin(x_1) \end{bmatrix}$$

$$y = \beta_1 x_1^2 + \beta_2 x_1 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \beta_5 x_2 + \beta_6 \sin(x_1)$$

Regression in MATLAB

Use the backslash operator \ for linear regression. With input data matrix x and output vector y, the command

beta = x\y

will output a vector of beta values such that

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots$$

(include a column of ones in x if you want an intercept term).

You can also use cftool to open a graphical user interface for the curve-fitting toolbox.
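For example, the nonlinear model from the previous slide can be fit the same way by building the basis columns by hand; the data here are synthetic placeholders:

x1 = rand(50,1);  x2 = rand(50,1);                      % placeholder inputs
y  = 3*x1.^2 - x1.*x2 + 0.5*sin(x1) + 0.1*randn(50,1);  % placeholder outputs

X    = [x1.^2, x1, x1.*x2, x2.^2, x2, sin(x1)];  % nonlinear basis columns
beta = X \ y;                                    % least-squares coefficients
yhat = X * beta;                                 % surrogate predictions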

Artificial Neural Networks (ANNs)

When you don’t know the structure of your function (e.g.,
which terms to include), ANNs offer an automated way


[Figure: a single neuron with inputs $x_1, x_2, x_3$ and output $y$]

[Figure: a network with input $x_1$, neuron outputs $y_{1,1}$ and $y_{2,1}, \ldots, y_{2,4}$, and final output $y_{out}$]

Each neuron has a transfer function $f_{i,j}(\mathbf{x}) = y_{i,j}$; a chain of these functions gives the final model.

Neural Network Model

[Figure: the same network, with the hidden layer and output layer labeled]

Each hidden-layer function is a logistic transfer function with bias $b$ and weight $w$:

$$y_{1,1} = \left(1 + e^{\,b_{1,1} - w_{1,1}x}\right)^{-1}$$

$$y_{2,1} = \left(1 + e^{\,b_{2,1} - w_{2,1}y_{1,1}}\right)^{-1}, \;\ldots,\; y_{2,4} = \left(1 + e^{\,b_{2,4} - w_{2,4}y_{1,1}}\right)^{-1}$$

so that $0 \le y_{2,i} \le 1$.

Neural Network Model
Final output

$$y_{out} = b_{3,1} - \sum_{i=1}^{4} w_{2,i}\, y_{2,i}$$

The final output can take any desired value based on the bias and weight values.

Find the set of $b_{i,j}$ and $w_{i,j}$ that minimizes the square error:

$$\text{Error}(b_{i,j}, w_{i,j}) = \sum_{i=1}^{k}\left(y^{(i)} - y_{out}(\mathbf{x}^{(i)})\right)^2$$

Neural Networks with MATLAB
MATLAB has an easy-to-use neural network toolbox: nftool
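The same toolbox can also be scripted directly; a minimal sketch (Deep Learning / Neural Network Toolbox assumed, with placeholder data; note that samples are columns):

x = rand(2, 100);          % 2 inputs, 100 samples (one sample per column)
t = sum(x.^2, 1);          % target output for each sample (placeholder)

net  = fitnet(10);         % one hidden layer with 10 neurons
net  = train(net, x, t);   % fits weights and biases to minimize error
yhat = net(x);             % evaluate the trained surrogate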

Example: Linear regression vs. ANN

[Figures: least-squares linear regression with second-order terms vs. an artificial neural network with radial basis functions (exact fit), both built from a 64-point, 2-variable full factorial sample (?)]

Kriging

Typical surrogate models have a prediction error, $\varepsilon$, which is assumed to be independent:

$$y(x) = f(x) + \varepsilon$$

where $f(x)$ is a non-exact-fitting function (e.g., linear, polynomial) and $\varepsilon$ is independent, identically distributed (i.i.d.) random error.

In kriging, the error term is correlated with $x$:

$$y(x) = f(x) + Z(x)$$

where $Z(x)$ is an exact-fit error element that is a function of $x$.

Kriging

• There are available software packages (e.g., ModelCenter) and public MATLAB codes for implementing kriging

https://www.mathworks.com/matlabcentral/fileexchange/?term=kriging

• Kriging goes through data points exactly

• Good for data from deterministic simulation models

• Bad for data with measurement error or noise

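One built-in option is fitrgp in the Statistics and Machine Learning Toolbox (Gaussian process regression, the model family behind kriging); a minimal sketch, forcing near-zero noise so the fit passes (almost) exactly through the data:

x = (0:0.2:1)';
y = sin(2*pi*x);                     % placeholder deterministic data

gp = fitrgp(x, y, 'Sigma', 1e-4, 'ConstantSigma', true);  % near-noise-free fit
[yhat, sd] = predict(gp, linspace(0, 1, 101)');           % mean and std. dev.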


Visualizing error

• With 3 or more variables, visualization is difficult
• Can project to two dimensions
• Can use eigenvectors to reduce dimensionality

• Fitness can be mathematically calculated
• Standard error
• Mean square error
• Cross-validation

Left image: https://www.datadvance.net/product/macros/manual/6.5sp1/_images/initial_model_approximation.png

Right image: Samad, A., Choi, J. H., & Kim, K. Y. (2008). Blade Optimization of a Transonic Compressor Using a Multiple Surrogate Model.
Transactions of the Korean Society of Mechanical Engineers B, 32(4), 317-326.

Fitness: R2

Coefficient of determination

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

This is a measure of how well the model predictions $\hat{y}_i$ fit compared to simply using the average value $\bar{y}$.

Adjusted R² accounts for model complexity:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$$

where n is the number of data points and p is the number of parameters in the model.

This is a common evaluation metric with regression models.
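For instance, for the homework-grade model fit earlier (one predictor, so p = 1), R² and adjusted R² can be computed directly:

hw   = [95; 85; 80; 70; 60];
y    = [85; 95; 70; 65; 70];
yhat = 26.78 + 0.64*hw;        % surrogate predictions from the fitted model
p    = 1;                      % number of model parameters besides the intercept

R2    = 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2);
n     = numel(y);
R2adj = 1 - (1 - R2)*(n - 1)/(n - p - 1);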

Fitness: Cross-validation

1. Divide the data set into n subsets

2. Fit a model using all but the ith subset

3. Use the ith subset to test fitness

4. Repeat n times, average the results

Fitness is often measured in terms of mean square error:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

This is a common evaluation metric with exact-fit models.
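A k-fold sketch for a linear surrogate, using cvpartition from the Statistics and Machine Learning Toolbox; the data and fold count are placeholders, and any fitting method can replace the backslash step:

rng(1)                                         % reproducible folds
X = rand(40, 2);
y = 3*X(:,1) - X(:,2) + 0.05*randn(40, 1);     % placeholder data

k  = 5;
cv = cvpartition(size(X,1), 'KFold', k);
mse = zeros(k, 1);
for i = 1:k
    tr = training(cv, i);   te = test(cv, i);    % logical fold indices
    beta   = [ones(nnz(tr),1), X(tr,:)] \ y(tr); % fit on the training folds
    yhat   = [ones(nnz(te),1), X(te,:)] * beta;  % predict the held-out fold
    mse(i) = mean((y(te) - yhat).^2);
end
cvMSE = mean(mse)                                % cross-validated MSE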

Some metamodeling considerations

• Outliers

• Underfitting

• Overfitting

• Training vs validation data

• Regularization


Outliers

[Figure: f(x) vs. x data, comparing the model fit with and without the outliers included]

• Outliers might have a big
impact on the model

• Depending on the problem
you might want to remove
them (if possible)

Underfitting

[Figure: an underlying function, a badly sampled data set (very few data points), and the minimum-error function fit through them, which misses the true behavior]

Sampling size should be sufficient to capture the function behavior.

Overfitting

[Figure: an underlying function, data with noise (measurement error), and a fitted function with zero error that passes through every noisy point]

• Overfitting captures the noise instead of the underlying function
• Unnecessary model complexity might also cause overfitting

Training vs Validation Data

• To overcome overfitting, the entire data set is separated into training and validation data

• The model is fit using only the training data

[Figure: error vs. model complexity; training error keeps decreasing, while validation error reaches a minimum and then rises. Stop training where the validation error is lowest.]

Regularization

Can we find an optimal complexity for our model? Consider the following linear regression problem:

$$\text{Error}(\boldsymbol{\beta}) = \sum_{i=1}^{k}\left(\boldsymbol{\beta}^T\mathbf{x}^{(i)} - y^{(i)}\right)^2 + \lambda\,\lVert\boldsymbol{\beta}\rVert$$

where $\mathbf{x} = [1, x, x^2, x^3, x^4, \ldots, x^n]^T$ and $\lambda\,\lVert\boldsymbol{\beta}\rVert$ is a penalty for complexity.

Increasing $\lambda$ will force the fit toward simpler models.
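A closed form exists when the penalty is the squared norm $\lambda\lVert\boldsymbol{\beta}\rVert^2$ (ridge regression), a common variant of the penalty shown above; the data, basis, and λ value below are placeholders (implicit expansion assumes R2016b or later):

x = linspace(0, 1, 30)';
y = sin(2*pi*x) + 0.1*randn(30, 1);   % placeholder noisy data

X      = x.^(0:9);                    % polynomial basis: 1, x, ..., x^9
lambda = 1e-3;                        % complexity penalty weight
beta   = (X'*X + lambda*eye(size(X,2))) \ (X'*y);  % ridge solution

Increasing lambda shrinks the coefficients toward zero, which is the "simpler models" effect described above.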

Metamodeling Steps (for ANNs)

1) If you do not have data already, sample the space
using any sampling method, e.g., Latin hypercube

2) Plot data to see if outliers exist; remove if you can

3) Normalize your data

4) Separate your data into training and validation/test
(90% training, 10% validation is a good start; see the sketch after this list)

5) Train your model using a metamodeling method, e.g., neural networks

6) If error is not good enough, either add more data
or increase model complexity
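Steps 3 and 4 might look like the following sketch (placeholder data; z-score normalization and a 90/10 split are just one reasonable choice):

X = rand(200, 2);  y = sum(X.^2, 2);    % placeholder data set

Xn = (X - mean(X)) ./ std(X);           % step 3: normalize each column

idx   = randperm(size(Xn, 1));          % step 4: shuffle, then split 90/10
nTr   = round(0.9*numel(idx));
trIdx = idx(1:nTr);  vaIdx = idx(nTr+1:end);
Xtr = Xn(trIdx,:);  ytr = y(trIdx);     % training set
Xva = Xn(vaIdx,:);  yva = y(vaIdx);     % validation set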

Additional resources

Model fitting and regression in MATLAB (9-min video;
includes importing Excel data, plotting 2-d, examining
goodness of fit, using the “basic fitting” plot tool):

MATLAB Neural Network Toolbox:

http://www.mathworks.se/videos/getting-started-with-neural-network-toolbox-68794.html

Kriging:

http://www2.imm.dtu.dk/~hbni/dace/


Summary

• Metamodeling is fitting a mathematical function to
your data to speed up evaluations and optimization

• This involves four general steps:
• Gather data (e.g., using a DOE)

• Choose a function structure (e.g., linear, polynomial,
kriging, ANN)

• Fit a function to the data

• Assess fitness

• Watch out for outliers, underfitting, and overfitting


Workshop task
Use the DeJong function from the Yang 2010 paper:
http://arxiv.org/pdf/1008.0549.pdf (if n-dimensional, choose n = 2). A starter sketch follows this task list.

1. Sample the function

a. Using full factorial sampling (e.g., 4 x 4)

b. Using an optimal Latin hypercube (e.g., 16 points)

2. Fit regression metamodels to each sample (try different function
structures and sample sizes) and measure the R2 value and MSE

3. Plot the surfaces to show which sampling method and which
metamodeling technique seems best

4. If you have time, also fit metamodels using neural nets or kriging
and measure the cross-validated MSE

5. Summarize your findings – at the end we will meet as a class and
hear from each of you: (1) which test functions you tried, (2)
which sampling methods and (3) metamodels you used, and (4)
what seemed to work

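A possible starting point for steps 1a and 2, using the De Jong sphere function $\sum x_i^2$ on its conventional bounds of ±5.12 (treat these choices as assumptions):

dejong = @(X) sum(X.^2, 2);            % De Jong's first (sphere) function
lb = -5.12;  ub = 5.12;                % conventional bounds (assumed)

g = linspace(lb, ub, 4);               % 1a: 4 x 4 full factorial sample
[G1, G2] = meshgrid(g, g);
X = [G1(:), G2(:)];
y = dejong(X);

A    = [ones(16,1), X, X.^2, X(:,1).*X(:,2)];   % 2: quadratic basis columns
beta = A \ y;                                   % fit the regression metamodel
yhat = A * beta;
R2   = 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2)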

Example: Buried explosive tests

[Photos: Dynamic Effects Laboratory]

Example: Gather data


Example: Choose and fit function

• Strain gauges output pressure
plot

• First spike is peak pressure

• Plotted depth of burial (DoB)
vs. peak pressure


[Figures: three "Peak Pressure vs. DoB" plots of Avg. Peak Pressure (psi) against Unscaled DoB (in.), with series for Prior Tests (.25″ d), Test Peak Pressure, and Avg Peak Pressure]

Acknowledgements

• These slides are modified from slides originally compiled by Steven Hoffenson for this course in 2017

• Much of this material came from Chapter 2 of the
textbook, Principles of Optimal Design

• Some of the slides and examples came from Dr.
Emrah Bayrak, Dr. Alex Burnap, and Dr. Namwoo
Kang while they were at the University of Michigan

• Some of the slides and examples came from Dr.
Michael Kokkolaras at McGill University
