Lecture 9: Kriging
Dr. Adams
Setting the projection
How to set the projection, when it is missing. Define Projection (ArcGIS)
spObject@proj4string <- CRS("+proj=utm +zone=17 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
Projections using EPSG SRID
EPSG
Spatial Reference System Identifier
Maintained by the International Association of Oil & Gas
Producers (OGP) Surveying & Positioning Committee
spObject@proj4string <- CRS("+init=epsg:26917") http://spatialreference.org/ref/epsg/
IDW Affect of k
1e−02
1e−03
1e−04
1e−05
1e−06
25 50 75 100
Distance
k
1.5
2 2.5 3
Weight Log10
Deterministic Methods
Where outcomes are precisely determined through known relationships among states and events
No random variation
Given inputs produce the same outputs No uncertainty assumed
Stochastic Methods
A stochastic model will include some uncertainty in the estimate. Kriging Formula (Example)
Z(x) = μ(s) + ε(s)
Where,
Z(x) is the variable of interest μ(s) is a deterministic trend
ε(s) are the auto-correlated errors
Kriging
Kriging is a statistical method that uses a variogram to calculate the spatial autocorrelation between points at graduated distances
The calculation of spatial autocorrelation is used to determine the weight of values at differing distances.
How much are observations related at varying distances?
Variogram
A function (model) describing the degree of spatial dependence of a spatial random field
Many functions exist for defining the relationship Attempts to answer:
How much will any two samples vary at a given distance.
Variogram II
A visual depiction of statistical dependence between pairs of points
x-axis: distance
y-axis: semi-variance
Variogram Visual
0.6
0.5
0.4
0.3
0.2
0.1
0.0
vgm(0.6,"Sph",1)
0.0 0.5
1.0 1.5 2.0 2.5 3.0
distance
semivariance
Variogram III
Empirical Variogram
Based on observed data
Model Variogram
Theoretical/model to represent the underlying relationship.
Example Variogram Formula, Spherical Model
3h h3
γ(h) = (s − n) 2r − 2r3 1(0,r)(h) + 1[r,∞)(h) + n1(0,∞)(h)
where,
s, (Sill) value at which the model flattens out
r, (Range) distance at which model first flattens out
n, (Nugget), value at which the semi-variogram (almost) intercepts the y-value.
Expectation, distance 0 values are equal
Due to sampling error or randomness they may not
h, Distance
Sill, range, nugget
Sill, range, nugget
Most variogram models are defined by their sill, range and nugget.
Range
Distance where the variogram reaches the sill
Distances greater than the range, do not exhibit
spatial-autocorrelation
Sill
Represents the variance of the random field.
Variance
Variance: Expectation of the squared deviation of a random variable from its mean
Var(X) = E[(X − μ)2] X is a random variable
E is the expectation (long run mean) μ is the mean of X or
1 n
Var(X)= n
(xi −μ)2
i=1
Work out the Varaince
1 n
Var(X)= n
X1 = 1,20,22,15,30 X2 = 10, 11, 13, 15, 17 X3 = 1, 100, 220, 300, 500
(xi −μ)2
i=1
Variogram Objective
Define covariance between points relative to distance Conditional variance on distance
The measure applied is semi-variance
Semivariance
Prior to selecting a variogram model, it is necessary to understand the semivariance from observed (empirical) data.
Empirical data is unlikely to have values at every distance Instead values are binned
Think Histogram
Variogram Example
lead.vgm <- variogram(logLead~1, data = meuse)
15000
10000
5000
500
1000 1500
Distance
Semivariance
From Emipircal to Model
Review the empirical variogram Select model with similar shape Fit model to empirical data
Objective
Trying to select a model that best represents the relationship between distance and semivariance.
Many Choices
vgm(1,"Nug",0)
0.0 0.5 1.0 1.5 2.0 2.5 3.0
vgm(1,"Exp",1)
vgm(1,"Sph",1)
0.0 0.5 1.0 1.5 2.0 2.5 3.0
vgm(1,"Gau",1)
vgm(1,"Exc",1)
0.0 0.5 1.0 1.5 2.0 2.5 3.0
vgm(1,"Mat",1)
3
2
1
0
vgm(1,"Ste",1)
vgm(1,"Cir",1)
vgm(1,"Lin",0)
vgm(1,"Bes",1)
vgm(1,"Pen",1)
vgm(1,"Per",1)
3
2
1
0
3
2
1
0
vgm(1,"Wav",1)
vgm(1,"Hol",1)
vgm(1,"Log",1)
vgm(1,"Pow",1)
vgm(1,"Spl",1)
0.0 0.5 1.0 1.5 2.0 2.5 3.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0
distance
semivariance
Fit model to data Code
lead.fit <- fit.variogram(lead.vgm, model = vgm("Sph") plot(lead.vgm, lead.fit)
)
Fit model to data
15000
10000
5000
500
1000 1500
Distance
Semivariance
Variogram Fitting Challenges
Approaches can use a variogram cloud Very large number of data points
n2 2
Nonlinear models
Require estimated starting values
Parameters
lead.fit
## model psill range
## 1 Nug 3024.016 0.0000
## 2 Sph 12412.721 906.6158
How are the parameters estimated?
Initial estimate
Nugget Range Sill
In R, Partial Sill: Sill - Nugget Optimization
Inital Guess
nugget = 1000 sill = 13000
range = 1000
15000
10000
5000
500
1000 1500
distance
semivariance
Initial Fit
15000
10000
5000
500
1000 1500
distance
semivariance
Optimized
Optimized
15000
10000
5000
500
1000 1500
distance
semivariance
Optimized Parameters Initial
## model psill range ##1Nug1000 0 ## 2 Sph 13000 900
Optimized
## model psill range
## 1 Nug 3024.016 0.0000
## 2 Sph 12412.721 906.6158
Minimzing Sum of Squared Errors Initial
attr(lead.fit, "SSErr") ## [1] 26392.31
Optimized
attr(lead.fit.intial, "SSErr") ## NULL
Weights Review
IDW
Weighting is determined by the exponent and distance
Linear change Kriging
Weights are determined by the variogram model and distance Non-linear relationships possible
Kriging Details
Two step process
1. Determine the spatial covariance structure in the data.
2. Interpolate unobserved locations using weights from the variogram model.
Kriging Assumptions
1. Data are normally distributed.
2. Stationarity, spatial processes do not vary across the field. 3. Isotrophy
Process is uniform in all directions Violation: anisotrophy
Testing Assumptions
1. Data are normally distributed.
Histogram, apply necessary transformation
2. Stationarity
Spatial processes do not vary across the field.
Mean
Variance
3. Isotrophy
Uniform in all directions
Stationarity Evaluation
No formal tests Visual Approach
Divide the area up into polygons
Calculate the mean and variance by polygon
Stationarity Evaluation Process
1. Extent of the observations (xmin, xmax, ymin, ymax) 2. Create a grid across the study area
3. Calculate the mean and variance within raster cells.
Stationarity Example Get the extent
# Find extent values
# Using functions in the raster package library(raster)
area <- extent(meuse@bbox)
area
## class
## xmin
## xmax
## ymin
## ymax
: Extent
: 178605
: 181390
: 329714
: 333611
Create a grid
# Create a raster of 5 x 5 covering the area
raster_area <- raster(nrows = 5, ncols = 5, ext = area vals = -999)
# Convert Raster to Polygons
polygons_area <- rasterToPolygons(raster_area)
,
Count, Mean & Variance
# Removing all columns in meuse except logLead
meuse2 <- meuse[,-(1:12)]
# Number of obserations per polygon
assessment <- over(polygons_area, meuse2)
# Mean of values
assessment$mean <- over(polygons_area, meuse2,
fn = mean, na.rm = T)[,1]
# Variance of Values
assessment$var <- over(polygons_area, meuse2, fn = var, na.rm = T)[,1]
Clean up
# Change Names
names(assessment)[1] <- c("counts") # Replace Polygon Data
polygons_area@data <- assessment summary(polygons_area@data)
## counts
## Min. : 39
## 1st Qu.:112
## Median :210
## Mean :203 Mean :170.1 Mean : 6383.9
## 3rd Qu.:276
## Max. :464
## NA's :8
3rd Qu.:187.0
Max. :464.0
NA's :8
3rd Qu.: 8243.0
Max. :34189.5
NA's :9
mean
Min. : 43.5
var
Min. : 40.5
1st Qu.: 85.0
Median :145.8
1st Qu.: 1972.5
Median : 3611.7
Plots
Counts
Mean
450 400 350 300 250 200 150 100 50
450 400 350 300 250 200 150 100 50
Variance
35000 30000 25000 20000 15000 10000 5000 0
Isotrophy
Spatial dependence is stronger in the NW-SE direction than the SW-NE direction
library(SpTest) Vignette Example
Directional Variograms
lead.vgm.dir <- variogram(logLead~1, data = meuse, alpha=c(0,45,90,135))
plot(lead.vgm.dir, threshold = 30)
500 1000 1500 500 1000 1500
0 45 90 135
25000
20000
15000
10000
5000
500
1000 1500
500 1000 1500
distance
semivariance
Kriging Methods
Simple Kriging
Ordinary Kriging Universal Kriging Co-kriging
Poisson Kriging
Simple Kriging
All kriging assumptions
Assumedμ=0
Little value to empirical science
Ordinary Kriging
Assumes stationarity of the field Constant Mean and Variance
Trends across the study area cannot occur. Any value of μ for the field.
Universal Kriging
Relaxes the stationarity assumption for mean values Allows the mean to vary in a deterministic way
Trend across the study area Second-order stationarity
Variance must be stationary
Co-kriging
Includes an independent variable that is correlated with variable of interest.
For example, rainfall predictions
Include elevation (mountainous)
Poisson Kriging
Incidence counts or disease rates
Kriging Limitations
Misspecification of the variogram model can lead to erroneous results.
Challenging to evaluate assumptions
Difficult to meet certain assumptions
Accuracy is challenging with few points