The future of R spatial
Last week’s geostat summer school in Albacete was a lot of fun, with about 60 participants and 10 lecturers. Various courses were given on handling, analyzing and modelling spatial and spatiotemporal data, using open source software. Participants came from all kind of directions, not only geosciences but also antropology, epidemiology and surprisingly many from biology and ecology. Tom Hengl invited us to discuss the future of spatial and spatiotemporal analysis on day 2:
@edzerpebesma talking about the future of spatial and spatiotemporal analysis at #geostat2016 @uclm_inter pic.twitter.com/yfQL2vb5ii
— Rubén G. Mateo (@RubenGMateo) September 20, 2016
In the background of the screen, you see the first appveyor (= windows) build of sf, the simple features for R package. It means that thanks to Jeroen Ooms and rwinlib, windows users can now build binary packages that link to GDAL 2.1, GEOS and Proj.4:
Windows users with Rtools installed can now build and install sfr. Opens the way for others to directly Rcpp into gdal2. Ta2 @opencpu !
— Edzer Pebesma (@edzerpebesma) September 21, 2016
Thanks to the efficient well-known-binary interface of sf, and thanks to using C++ and Rcpp, compared to sp the sf package now reads large feature sets much (18 x) faster into much (4 x) smaller objects (benchmark shapefile provided by Robin Lovelace):
> system.time(r <- rgdal::readOGR(".", "gis.osm_buildings_v06"))
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "gis.osm_buildings_v06"
with 487576 features
It has 6 fields
user system elapsed
90.312 0.744 91.053
> object.size(r)
1556312104 bytes
> system.time(s <- sf::st_read(".", "gis.osm_buildings_v06"))
Reading layer gis.osm_buildings_v06 from data source . using driver "ESRI Shapefile"
features: 487576
fields: 6
converted into: MULTIPOLYGON
proj4string: +proj=longlat +datum=WGS84 +no_defs
user system elapsed
5.100 0.092 5.191
> object.size(s)
410306448 bytes
Raster data
Currently, R package raster is gradually being ported to C++ for efficiency reasons. For reading and writing data through GDAL, it uses rgdal, so when going through a big (cached) raster in C++ it has to go through C++ \(\rightarrow\) R \(\rightarrow\) rgdal \(\rightarrow\) R \(\rightarrow\) C++ for every chunk of data. The current set of raster classes
library(raster)
Loading required package: sp
> showClass("Raster")
Virtual Class "Raster" [package "raster"]
Slots:
Name: title extent rotated rotation ncols nrows crs
Class: character Extent logical .Rotation integer integer CRS
Name: history z
Class: list list
Extends: "BasicRaster"
Known Subclasses:
Class "RasterLayer", directly
Class "RasterBrick", directly
Class "RasterStack", directly
Class ".RasterQuad", directly
Class "RasterLayerSparse", by class "RasterLayer", distance 2
Class ".RasterBrickSparse", by class "RasterBrick", distance 2
has grown somewhat ad hoc, and should be replaced by a single class that supports
- one or more layers (bands, attributes)
- time as a dimension
- altitude or depth as a dimension (possibly expressed as pressure level)
The future
So, how does the future of R spatial look like?
- vector data use simple features, now in package sf
- raster data get a single, flexible class that generalizes all
Raster*
classes now inraster
and integrates with simple features - vector and raster data share a clear and consistent interface, no more conflicting function names
- raster computing directly links to GDAL, but supports distributed computing back ends provided e.g. by SciDB, Google Earth Engine or rasdaman
- spatiotemporal classes in spacetime and trajectories build on simple features or raster
- support for measurement units
- support for strong typing that encourages meaningful computation.
Exciting times are ahead of us. We need your help!