Last week’s geostat summer school in Albacete was a lot of fun, with about 60 participants and 10 lecturers. Various courses were given on handling, analyzing and modelling spatial and spatiotemporal data, using open source software. Participants came from all kind of directions, not only geosciences but also antropology, epidemiology and surprisingly many from biology and ecology. Tom Hengl invited us to discuss the future of spatial and spatiotemporal analysis on day 2:

In the background of the screen, you see the first appveyor (= windows) build of sf, the simple features for R package. It means that thanks to Jeroen Ooms and rwinlib, windows users can now build binary packages that link to GDAL 2.1, GEOS and Proj.4:

Thanks to the efficient well-known-binary interface of sf, and thanks to using C++ and Rcpp, compared to sp the sf package now reads large feature sets much (18 x) faster into much (4 x) smaller objects (benchmark shapefile provided by Robin Lovelace):

> system.time(r <- rgdal::readOGR(".", "gis.osm_buildings_v06"))
OGR data source with driver: ESRI Shapefile 
Source: ".", layer: "gis.osm_buildings_v06"
with 487576 features
It has 6 fields
   user  system elapsed 
 90.312   0.744  91.053 
> object.size(r)
1556312104 bytes
> system.time(s <- sf::st_read(".", "gis.osm_buildings_v06"))
Reading layer gis.osm_buildings_v06 from data source . using driver "ESRI Shapefile"
features:       487576
fields:         6
converted into: MULTIPOLYGON
proj4string:    +proj=longlat +datum=WGS84 +no_defs 
   user  system elapsed 
  5.100   0.092   5.191 
> object.size(s)
410306448 bytes

Raster data

Currently, R package raster is gradually being ported to C++ for efficiency reasons. For reading and writing data through GDAL, it uses rgdal, so when going through a big (cached) raster in C++ it has to go through C++ \(\rightarrow\) R \(\rightarrow\) rgdal \(\rightarrow\) R \(\rightarrow\) C++ for every chunk of data. The current set of raster classes

library(raster)
Loading required package: sp
> showClass("Raster")
Virtual Class "Raster" [package "raster"]

Slots:
                                                                            
Name:      title    extent   rotated  rotation     ncols     nrows       crs
Class: character    Extent   logical .Rotation   integer   integer       CRS
                          
Name:    history         z
Class:      list      list

Extends: "BasicRaster"

Known Subclasses: 
Class "RasterLayer", directly
Class "RasterBrick", directly
Class "RasterStack", directly
Class ".RasterQuad", directly
Class "RasterLayerSparse", by class "RasterLayer", distance 2
Class ".RasterBrickSparse", by class "RasterBrick", distance 2

has grown somewhat ad hoc, and should be replaced by a single class that supports

  • one or more layers (bands, attributes)
  • time as a dimension
  • altitude or depth as a dimension (possibly expressed as pressure level)

The future

So, how does the future of R spatial look like?

  1. vector data use simple features, now in package sf
  2. raster data get a single, flexible class that generalizes all Raster* classes now in raster and integrates with simple features
  3. vector and raster data share a clear and consistent interface, no more conflicting function names
  4. raster computing directly links to GDAL, but supports distributed computing back ends provided e.g. by SciDB, Google Earth Engine or rasdaman
  5. spatiotemporal classes in spacetime and trajectories build on simple features or raster
  6. support for measurement units
  7. support for strong typing that encourages meaningful computation.

Exciting times are ahead of us. We need your help!