
Lecture 8: Spatial Interpolation (IDW)
GGR376
Dr. Adams

Interpolation (General)
What is it?
A method of constructing new data points within the range of a discrete set of known data points.

Example – Maple Sap Collection

Example – Daily Sap Collection
Create a plot of Day vs Sap Collected
Day | Total Sap Collected (Gallons)
 0  | 0
 1  | 4
 2  | NA
 3  | 12
 4  | 16
 6  | 20
 7  | NA
 8  | 28
How much sap should have been collected by Day 2 and Day 7?
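As a quick check, a minimal sketch using NumPy's one-dimensional linear interpolation (np.interp), assuming the table values reconstructed above:

```python
import numpy as np

days = np.array([0, 1, 3, 4, 6, 8])     # days with recorded totals
sap = np.array([0, 4, 12, 16, 20, 28])  # gallons collected

print(np.interp([2, 7], days, sap))     # -> [ 8. 24.]
```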

Example – Sap Collected vs. Syrup Produced
[Plot: Syrup produced (y-axis, 0 to 7 litres) against Sap collected (x-axis, 0 to 280)]
How much sap is required for 3 litres of syrup?

Linear Interpolation

Linear Interpolation Equation
Given two known coordinates: (x0, y0) and (x1, y1)
y = \frac{y_0 (x_1 - x) + y_1 (x - x_0)}{x_1 - x_0}

Try
Given (3, 5) & (5, 10), what is y at x = 4?
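A direct translation of the equation above into Python; applied to the Try values, it gives y = 7.5 at x = 4:

```python
def linear_interp(x, x0, y0, x1, y1):
    """Interpolate y at x between the known points (x0, y0) and (x1, y1)."""
    return (y0 * (x1 - x) + y1 * (x - x0)) / (x1 - x0)

print(linear_interp(4, 3, 5, 5, 10))  # -> 7.5
```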

Linear Interpolation Recap
- One dimension
- Linear approach
- Within the range of the data

Outside the range
Extrapolation: Estimating beyond the original observation range.

IPCC: Projections of Future Changes in Climate

What about space?
“Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler

Spatial Interpolation
Prediction of values at unknown locations
- Rainfall
- Air Pollution
- Ground Water Depth
- Elevation

Paris, France – Air Pollution Monitors

Paris – Interpolation vs. Extrapolation

Paris – Prediction Points

Paris – Regular Grid of Points

Paris – Regular Grid (Raster Output)

Spatial Interpolation
- Global vs local
  - Global interpolation takes into account all values
  - Local utilizes a moving window approach
- Exact vs approximate
  - Exact: the value at a known location will remain
  - Approximate: values at a known location may change

Paris – Pollution Values

Thiessen (Voronoi) polygons
- Individual areas of influence around each point of a set.
- Boundaries define the area that is nearest to each point relative to all other points.
- Mathematically defined by the perpendicular bisectors of the lines between all points.
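A minimal sketch of building Thiessen polygons with SciPy's Voronoi tools; the monitor coordinates here are hypothetical stand-ins, since the slide data are not included:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# Hypothetical monitor locations (stand-ins for the Paris stations).
points = np.random.default_rng(0).uniform(0, 10, size=(12, 2))

# Cell boundaries are the perpendicular bisectors between neighbouring points;
# each cell is the region nearest to its generating point.
vor = Voronoi(points)
voronoi_plot_2d(vor)
plt.show()
```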

Paris Thiessen Polygons

Thiessen Polygons
- A very simplified interpolation technique
- Does not take into account:
  - surrounding values
  - more than one observation

Spatially Weighted Average

Inverse Distance Weighted Interpolation
- Exact interpolator
- Known locations retain their exact values

\hat{z}(x_0) = \frac{\sum_{i=1}^{n} z(x_i) \cdot d_{i0}^{-k}}{\sum_{i=1}^{n} d_{i0}^{-k}} \quad \text{if } d_{i0} \neq 0 \text{ for all } i

where, if d_{i0} = 0 for some i, then \hat{z}(x_0) = z(x_i). Our prediction \hat{z}(x_0) is based on a specified number of neighbours n, i = 1, ..., n.

IDW Equation (Left of equals)
\hat{z}(x_0)
This is our prediction for location x_0.

IDW Equation (Bottom of Fraction)
\sum_{i=1}^{n} d_{i0}^{-k}

- n is the number of neighbours that will be considered
- d_{i0} is the distance from each point i to the prediction location
- -k is the exponent of distance
  - Often set to 2
  - Reduces the influence of points as they get further away

The sum of the distance measures (to the power of negative k) standardizes the function, as the total sum of distances will vary.

IDW Equation (Top of Fraction)
\sum_{i=1}^{n} z(x_i) \cdot d_{i0}^{-k}

Summing the product (·) of the surrounding values and their spatial weights (d_{i0}^{-k}).

IDW Equation (After Fraction)
\text{if } d_{i0} \neq 0 \text{ for all } i

- We will only conduct this calculation for prediction locations that do not match existing values
- Exact interpolator (see the sketch below)
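A minimal sketch of the full estimator in Python, assuming Euclidean distances and NumPy; the example points and values are hypothetical:

```python
import numpy as np

def idw_predict(x0, xs, zs, k=2, n_neighbours=None):
    """Predict z at location x0 from known locations xs (m x 2) and values zs."""
    d = np.linalg.norm(xs - x0, axis=1)    # distances d_i0
    if np.any(d == 0):                     # exact interpolator: keep known value
        return zs[np.argmin(d)]
    if n_neighbours is not None:           # optionally use only the n nearest
        nearest = np.argsort(d)[:n_neighbours]
        d, zs = d[nearest], zs[nearest]
    w = d ** -k                            # spatial weights d_i0^(-k)
    return np.sum(zs * w) / np.sum(w)      # standardized weighted average

# Hypothetical example: three monitors, one prediction location.
xs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
zs = np.array([10.0, 20.0, 30.0])
print(idw_predict(np.array([0.5, 0.5]), xs, zs, k=2))  # -> 20.0
```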

Selecting n and k
- n, the number of neighbours that will be used in the calculation
  - Yes, we can use all neighbours
  - Dependent on the number of cases
  - If you vary n and the outcome is the same, the choice is less critical
- k affects the weighting of distance
  - Smaller k: weights are higher for further points
  - Usually begin with k = 2
- Validate choices with cross-validation

Paris IDW Output

Cross-Validation
Types
- Exhaustive
  - Test all possible ways of dividing the original sample into a training and a validation set
- Non-exhaustive
  - Do not compute all ways of splitting the original sample

Exhaustive Cross-Validation
- Leave-one-out (LOOCV)
  - Remove a sample from the data
  - Fit the model
  - Compare estimated and actual value for the removed sample
- Leave-p-out (LpOCV)
  - Remove p samples
- LOOCV requires n repetitions
- LpOCV requires many more:

\frac{n!}{p!(n-p)!}
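To see how quickly the repetition count grows, a short check of the formula above using Python's standard library (math.comb computes n choose p):

```python
import math

# LpOCV repetitions n! / (p! (n - p)!) for n = 30; p = 1 is LOOCV.
for p in (1, 2, 5, 10):
    print(p, math.comb(30, p))
# 1 30
# 2 435
# 5 142506
# 10 30045015
```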

LpOCV Repetitions (30 Samples)
[Plot: number of repetitions (0 to 30,000,000) against p (2.5 to 10.0) for n = 30]

Non-Exhaustive Cross-Validation
- Holdout Method
  - Split the data into a testing and a training set
- k-fold cross-validation (see the sketch after this list)
  - Original sample is randomly partitioned into k equal-sized subsamples
  - A subsample is retained as the validation data
  - Remaining k − 1 subsamples are used as training data
- Repeated random sub-sampling validation (Monte Carlo)
  - Randomly split the dataset into training and testing data
  - Repeat n times
  - Repeated version of the holdout method
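A minimal sketch of the k-fold split itself, using only NumPy; the fold logic is the point here, not any particular model:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)  # random partition
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]  # the held-out validation fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

for train, test in kfold_indices(10, k=5):
    print(test)  # each sample appears in exactly one validation fold
```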

Measures of fit
Cross-validation requires a measure to assess the error.
- Mean squared error (MSE)
- Root mean squared error (RMSE)
- Root mean square deviation (RMSD)
- Mean absolute deviation (MAD)

Cross Validation Output
Error is observed minus predicted:
Y_1 = 2.2 (Observed)
\hat{Y}_1 = 2.3 (Predicted)
\varepsilon_1 = Y_1 - \hat{Y}_1 = -0.1

Observed values: Y = 2.2, 3.1, 1.7, ..., n
Predicted values: \hat{Y} = 2.3, 3.3, 2.5, ..., n

MSE
MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

where,
\hat{Y} is a vector of n predictions
Y is a vector of observed values

RMSE (RMSD)
RMSE = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n}}

MAD
MAD = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|
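A quick check of all three measures on the example vectors from the Cross Validation Output slide:

```python
import numpy as np

y = np.array([2.2, 3.1, 1.7])      # observed values
y_hat = np.array([2.3, 3.3, 2.5])  # predicted values

err = y - y_hat                    # observed minus predicted
mse = np.mean(err ** 2)            # 0.23
rmse = np.sqrt(mse)                # ~0.480
mad = np.mean(np.abs(err))         # ~0.367
print(mse, rmse, mad)
```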

Choosing a measure of fit
- MSE and RMSE are sensitive to outliers (extreme values)
  - Due to the power of 2
- RMSE has the same units as the data being estimated
- MAD is easily interpreted
- RMSE or MSE is useful if large errors are more important than small errors
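Putting the pieces together, a sketch of leave-one-out cross-validation of the IDW estimator scored with RMSE, as suggested above for validating the choice of k; the locations and values are hypothetical:

```python
import numpy as np

def idw_predict(x0, xs, zs, k=2):
    """IDW prediction at x0 from known locations xs and values zs."""
    d = np.linalg.norm(xs - x0, axis=1)
    if np.any(d == 0):               # exact interpolator at known locations
        return zs[np.argmin(d)]
    w = d ** -k
    return np.sum(zs * w) / np.sum(w)

rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, size=(20, 2))  # hypothetical monitor locations
zs = rng.uniform(20, 60, size=20)      # hypothetical pollution values

for k in (1, 2, 3):                    # candidate distance exponents
    preds = [idw_predict(xs[i], np.delete(xs, i, axis=0), np.delete(zs, i), k=k)
             for i in range(len(zs))]
    rmse = np.sqrt(np.mean((zs - np.array(preds)) ** 2))
    print(k, round(rmse, 2))           # prefer the k with the lowest RMSE
```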