Lecture 8: Spatial Interpolation (IDW)
GGR376
Dr. Adams
Interpolation (General)
What is it?
A method of constructing new data points within the range of a discrete set of known data points.
Example – Maple Sap Collection
Example – Daily Sap Collection
Create a plot of Day vs Sap Collected
Day    Total Sap Collected (Gallons)
0      0
1      4
2      NA
3      12
4      16
6      20
7      NA
8      28
How much sap should have been collected by Day 2 and Day 7?
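One way to check, once the linear approach below is covered: a minimal Python sketch using NumPy's interp, with the NA rows simply dropped (the day/sap arrays are read off the table above):

```python
import numpy as np

# Known days and totals from the table (NA rows for days 2 and 7 dropped)
days = np.array([0, 1, 3, 4, 6, 8])
sap = np.array([0, 4, 12, 16, 20, 28])

# Linearly interpolate the missing totals at days 2 and 7
print(np.interp([2, 7], days, sap))  # [ 8. 24.]
```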
Example – Sap Collected vs. Syrup Produced
[Plot: Syrup produced (y-axis, 0 to 7 litres) vs. Sap collected (x-axis, 0 to 280 gallons)]
How much sap is required for 3 litres of syrup?
Linear Interpolation
Linear Interpolation Equation
Given two known coordinates: (x0, y0) and (x1, y1)
\[ y = \frac{y_0(x_1 - x) + y_1(x - x_0)}{x_1 - x_0} \]
Try
Given (3, 5) and (5, 10), what is y when x = 4?
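A quick check of the exercise with a minimal Python sketch of the equation above (the function name linear_interp is illustrative):

```python
def linear_interp(x, p0, p1):
    """Linearly interpolate y at x between points p0 = (x0, y0) and p1 = (x1, y1)."""
    x0, y0 = p0
    x1, y1 = p1
    return (y0 * (x1 - x) + y1 * (x - x0)) / (x1 - x0)

# The exercise: given (3, 5) and (5, 10), estimate y at x = 4
print(linear_interp(4, (3, 5), (5, 10)))  # 7.5
```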
Linear Interpolation Recap
One dimension
Linear approach
Within the range of the data
Outside the range
Extrapolation: Estimating beyond the original observation range.
IPCC: Projections of Future Changes in Climate
What about space?
“Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler
Spatial Interpolation
Prediction of values at unknown locations
Rainfall
Air Pollution
Ground Water Depth
Elevation
Paris, France – Air Pollution Monitors
Paris – Interpolation vs. Extrapolation
Paris – Prediction Points
Paris – Regular Grid of Points
Paris – Regular Grid (Raster Output)
Spatial Interpolation
Global vs local
Global interpolation takes into account all values
Local utilizes a moving window approach
Exact vs approximate
Exact: values at a known location remain unchanged
Approximate: values at a known location may change
Paris – Pollution Values
Thiessen (Voronoi) polygons
Individual areas of influence around each point of a set.
Boundaries define the area that is nearest to each point relative to all other points.
Mathematically defined by the perpendicular bisectors of the lines between all points.
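As a sketch, Thiessen polygons can be computed with SciPy's Voronoi class; the five point coordinates below are made up for illustration:

```python
import numpy as np
from scipy.spatial import Voronoi

# Five example point locations (arbitrary coordinates)
points = np.array([[0, 0], [2, 0], [1, 2], [3, 3], [0, 3]])
vor = Voronoi(points)

# Each region is bounded by perpendicular bisectors between neighbouring points
print(vor.vertices)  # coordinates of polygon vertices
print(vor.regions)   # vertex indices making up each region (-1 = unbounded)
```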
Paris Thiessen Polygons
Thiessen Polygons
A very simplified interpolation technique
Does not take into account:
surrounding values
more than one observation
Spatially Weighted Average
Inverse Distance Weighted Interpolation
Exact Interpolator
Known locations retain their exact values
\[ \hat{z}(x_0) = \frac{\sum_{i=1}^{n} z(x_i) \cdot d_{i0}^{-k}}{\sum_{i=1}^{n} d_{i0}^{-k}}, \quad \text{if } d_{i0} \neq 0 \text{ for all } i \]
where, if \(d_{i0} = 0\) for some \(i\), \(\hat{z}(x_0) = z(x_i)\). Our prediction \(\hat{z}(x_0)\) is based on a specified number of neighbours n, \(i = 1, \ldots, n\)
IDW Equation (Left of equals)
\(\hat{z}(x_0)\) is our prediction for location \(x_0\)
IDW Equation (Bottom of Fraction)
\[ \sum_{i=1}^{n} d_{i0}^{-k} \]
n is the number of neighbours that will be considered
\(d_{i0}\) is the distance from each i to the prediction location
−k is the exponent of distance
Often set to 2
Reduces the influence of points as they are further away
The sum of the distance measure (to the power of negative k) standardizes the function, as the total sum of distances will vary.
IDW Equation (Top of Fraction)
\[ \sum_{i=1}^{n} z(x_i) \cdot d_{i0}^{-k} \]
Summing the product (·) of the surrounding values and their spatial weights (\(d_{i0}^{-k}\))
IDW Equation (After Fraction)
if \(d_{i0} \neq 0\) for all \(i\)
We will only conduct this calculation for prediction locations that do not match existing values
Exact interpolator
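Putting the pieces together, a minimal NumPy sketch of IDW (the function name idw and the use of every known point as a neighbour are illustrative assumptions, not a specific library's API):

```python
import numpy as np

def idw(known_xy, known_z, query_xy, k=2):
    """Inverse distance weighted prediction at one query location.

    known_xy : (n, 2) array of known point coordinates
    known_z  : (n,) array of values at the known points
    query_xy : (2,) coordinates of the prediction location x0
    k        : distance exponent (often 2)
    """
    d = np.linalg.norm(known_xy - query_xy, axis=1)  # d_i0 for each i
    # Exact interpolator: if the query coincides with a known point
    # (d_i0 = 0 for some i), return that point's value unchanged.
    if np.any(d == 0):
        return known_z[np.argmin(d)]
    w = d ** -k                              # spatial weights d_i0^-k
    return np.sum(known_z * w) / np.sum(w)   # standardized weighted average

# Toy usage: three known points, predict at the origin
pts = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
vals = np.array([10.0, 20.0, 30.0])
print(idw(pts, vals, np.array([0.0, 0.0])))
```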
Selecting n and k
n, the number of neighbours that will be used in the calculation
Yes, we can use all neighbours
Dependent on number of cases
If you vary n and the outcome is the same, the choice is less critical
k affects the weighting of distance
Smaller k: weights are relatively higher for further points
Usually begin with k = 2
Validate choices with cross-validation
Paris IDW Output
Cross-Validation
Types
Exhaustive
Test all possible ways of dividing the original sample into a training and a validation set
Non-exhaustive
Do not compute all ways of splitting the original sample.
Exhaustive Cross-Validation
Leave-one-out (LOOCV)
Remove a sample from the data
Fit the model
Compare estimated and actual value for the removed sample
Leave-p-out (LpOCV)
Remove p samples
LOOCV requires n repetitions; LpOCV requires many more:
\[ \binom{n}{p} = \frac{n!}{p!(n-p)!} \]
LpOCV Repetitions (30 Samples)
[Plot: Repetitions (y-axis, 0 to 30,000,000) vs. p (x-axis, 2.5 to 10.0)]
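The repetition counts in the plot follow directly from the binomial coefficient; a minimal sketch with Python's math.comb:

```python
from math import comb

n = 30  # sample size
for p in (1, 2, 5, 10):
    # Number of distinct training/validation splits when leaving p out
    print(p, comb(n, p))
# p = 1 gives 30 (LOOCV); p = 10 gives 30,045,015
```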
Non-Exhaustive Cross-Validation
Holdout Method
Split the data into a testing and a training set
k-fold cross-validation (see the sketch after this list)
Original sample is randomly partitioned into k equal-sized subsamples
A subsample is retained as the validation data
Remaining k − 1 subsamples are used as training data
Repeated random sub-sampling validation (Monte Carlo)
Randomly split the dataset into training and testing data
Repeat n times
Repeated version of the holdout method
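A minimal sketch of the k-fold split, assuming scikit-learn is available; the ten integers stand in for real observations:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(10)  # ten example observations
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(kf.split(data)):
    # Each subsample serves once as validation; the other k-1 train the model
    print(f"fold {fold}: train={data[train_idx]}, test={data[test_idx]}")
```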
Measures of fit
Cross-validation requires a measure to assess the error.
Mean squared error (MSE)
Root mean squared error (RMSE)
Root mean square deviation (RMSD)
Mean absolute deviation (MAD)
Cross Validation Output
Predicted Values
Observed Values
Error
Error is observed minus predicted
\(Y_1 = 2.2\) (observed)
\(\hat{Y}_1 = 2.3\) (predicted)
\(\varepsilon_1 = 2.2 - 2.3 = -0.1\)
\(\hat{Y} = 2.3, 3.3, 2.5, \ldots\) is the vector of n predictions
\(Y = 2.2, 3.1, 1.7, \ldots\) is the vector of n observed values
MSE
\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
where \(\hat{Y}\) is a vector of n predictions and \(Y\) is a vector of observed values
RMSE (RMSD)
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2} \]
MAD
\[ \mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i| \]
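Applying the three measures to the example vectors from the cross-validation output slide, as a minimal NumPy sketch:

```python
import numpy as np

observed = np.array([2.2, 3.1, 1.7])   # Y
predicted = np.array([2.3, 3.3, 2.5])  # Y-hat

errors = observed - predicted          # error is observed minus predicted

mse = np.mean(errors ** 2)             # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error (same units as Y)
mad = np.mean(np.abs(errors))          # mean absolute deviation

print(mse, rmse, mad)  # 0.23, ~0.48, ~0.37
```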
Choosing a measure of fit
MSE and RMSE are sensitive to outliers (extreme values) because errors are squared (power of 2)
RMSE has the same units as the data being estimated.
MAD is easily interpreted.
RMSE or MSE is useful if large errors are more important than small errors