Fitting variogram models in gstat

view raw Rmd

Fitting variogram functions with R package gstat has become more flexible, and hopefully more user friendly. Up to now, after loading data

library(sp)
demo(meuse, ask = FALSE, echo = FALSE) # load meuse data set

users were required to use a sequence like

library(gstat)
v = variogram(log(zinc)~1, meuse)
v.fit = fit.variogram(v, vgm(1, "Sph", 900, 1))
v.fit

##   model      psill    range
## 1   Nug 0.05066243   0.0000
## 2   Sph 0.59060780 897.0209

where fit.variogram fits variogram parameters of a spherical model (Sph) to the sample variogram v. The values 1, 900 and 1 were needed as initial values in the weighted non-linear fit (where only the range parameter is non-linear).

This has changed in gstat version 1.2: now, vgm can take only a variogram model, as in

fit.variogram(v, vgm("Sph"))

##   model      psill    range
## 1   Nug 0.05065971   0.0000
## 2   Sph 0.59060511 897.0011

or even a set of models, in which case the best fitting is returned, as in

fit.variogram(v, vgm(c("Exp", "Sph")))

##   model      psill    range
## 1   Nug 0.05065971   0.0000
## 2   Sph 0.59060511 897.0011

fit.variogram(v, vgm(c("Exp", "Mat", "Sph")))

##   model      psill    range
## 1   Nug 0.05065971   0.0000
## 2   Sph 0.59060511 897.0011

where we still see that the sperical model is chosen. If we choose a different sample variogram, where Matern is chosen, as in:

v0 = variogram(zinc~1, meuse)
fit.variogram(v0, vgm(c("Exp", "Mat", "Sph")))

##   model    psill    range kappa
## 1   Nug   9486.4   0.0000   0.0
## 2   Mat 163285.3 381.7076   0.5

we see that the kappa value is 0.5, which is a default value that was not fit. We can fit kappa by specifying fit.kappa = TRUE, as in

options(warn = -1) # don't print warnings
fit.variogram(v0, vgm(c("Exp", "Mat", "Sph")), fit.kappa = TRUE)

##   model    psill    range kappa
## 1   Nug      0.0   0.0000   0.0
## 2   Mat 176455.5 456.4375   0.4

where the best fitting kappa from the range 0.3, 0.4, 0.5,…,5 is chosen. I suppressed warnings here, as around 20 warnings were printed in cases with crazy initial values. This is usual for Matern models: larger kappa values have effective ranges (the distance value at which the model reaches, say, 95% of its sill) much larger than the range parameter, as illustrated by

plot(variogramLine(vgm(1, "Mat", 1, kappa = 4), 10), type = 'l')

where at distance 1, 0.05 of the sill is reached (and the model, up till there, is nearly linear or parabolic, leading to a singularity during fit). A different parameterisation of the Matern model, given in Michael Stein’s book, is the following

plot(variogramLine(vgm(1, "Ste", 1, kappa = 4), 10), type = 'l')

This one has the same smoothness, but reaches the sill much closer to the range value. As a consequence it fits easier, that is, without warnings:

options(warn = 0) # set back to normal
fit.variogram(v0, vgm(c("Exp", "Ste", "Sph")), fit.kappa = TRUE)

##   model    psill    range kappa
## 1   Nug      0.0   0.0000   0.0
## 2   Ste 176455.4 577.3522   0.4

For those you need a more precise estimate of the optimal kappa value, you can iterate over steps of e.g. 0.01 by

fit.variogram(v0, vgm(c("Exp", "Ste", "Sph")), fit.kappa = seq(.3,5,.01))

##   model       psill    range kappa
## 1   Nug    282.7529   0.0000  0.00
## 2   Ste 175030.7434 563.3126  0.41

How it works

Default initial parameter values are chosen from the sample variogram, where:

the range parameter is taken as 1/3 of the maximum sample variogram distance,
the nugget parameter is taken as the mean of the first three sample variogram values, and
the partial sill is taken as the mean of the last five sample variogram values.

vgm("Sph")

##   model psill range
## 1   Nug    NA     0
## 2   Sph    NA    NA

contains NA values for the numeric parameters, and under the hood (undocumented)

gstat:::vgm_fill_na(vgm("Sph"), v)

##   model     psill    range
## 1   Nug 0.2141508   0.0000
## 2   Sph 0.6200691 514.4008

fills the NA values with the initial values.

Providing more than one model to vgm returns a list,

vgm(c("Sph", "Exp"))

## [[1]]
##   model psill range
## 1   Nug    NA     0
## 2   Sph    NA    NA
## 
## [[2]]
##   model psill range
## 1   Nug    NA     0
## 2   Exp    NA    NA
## 
## attr(,"class")
## [1] "variogramModelList" "list"

which fit.variogram iterates over, returning the best fitting model.

Comparison to automap

Function automap::autofitVariogram does a similar job, but includes the computation of the sample variogram from data (which can be controlled by passing parameters to ...). It takes slightly different defaults for fitting, definitely different defaults when computing the sample variogram, and has options for combining distance bins.