
Summary

This is the fourth blog on the stars project, and it completes the R-Consortium funded project for spatiotemporal tidy arrays with R. It reports on the current status of the project and on current development directions. Although this project ends with the release of stars 0.3 on CRAN, the adoption of stars, and the enthusiasm for and participation in its development, have really only started, and will without doubt continue and increase.

Status

The stars package now has five vignettes (called “Articles” on the pkgdown site) that explain its main features. Besides writing these vignettes, a lot of work over the past few months went into

  • writing support for stars_proxy objects: objects for which the metadata has been read but for which the payload is still on disk. This allows handling raster files or data cubes that do not fit into memory. Manipulating them uses lazy evaluation: pixel values are read and processed only when they are really needed, for instance when a plot is drawn or when results are written with write_stars. When plotting, no more pixels are processed than can be seen on the device;
  • making rectilinear and curvilinear grids work, by better parsing NetCDF files directly (rather than through GDAL), reading their bounds, and by writing conversions to sf objects so that they can be plotted;
  • writing a tighter integration with GDAL, e.g. for warping grids, contouring grids, and rasterizing polygons;
  • supporting 360-day and 365-day (noleap) calendars, which are used often in climate model data;
  • providing an off-CRAN starsdata package, with around 1 GB of real imagery, too large for submitting to CRAN or GitHub, but used for testing and demonstration;
  • resolving issues (we’re at 154 now) and managing pull requests;
  • adding stars support to gstat, a package for spatial and spatiotemporal geostatistical modelling, interpolation and simulation.
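
The lazy evaluation of stars_proxy objects described in the list above can be illustrated with a minimal sketch. It assumes the small Landsat 7 sample file L7_ETMs.tif that ships with stars; the ndvi helper is a hypothetical function written for this illustration, not part of the package, and it assumes that layers 4 and 3 of this file hold the near-infrared and red bands.

```r
library(stars)

# Small Landsat 7 sample image shipped with the stars package
tif <- system.file("tif/L7_ETMs.tif", package = "stars")

# proxy = TRUE reads only the metadata; pixel values stay on disk
p <- read_stars(tif, proxy = TRUE)
class(p)
dim(p)

# Hypothetical helper (an assumption for this sketch): NDVI computed
# from the 4th (near-infrared) and 3rd (red) layer of each pixel
ndvi <- function(x) (x[4] - x[3]) / (x[4] + x[3])

# Still lazy: the operation is recorded, but no pixels are read yet
p_ndvi <- st_apply(p, c("x", "y"), ndvi)

# Only here are pixel values read from disk and processed
r <- st_as_stars(p_ndvi)
```

Calling plot(p_ndvi) or write_stars(p_ndvi, tempfile(fileext = ".tif")) instead of st_as_stars would likewise trigger the actual reading and computation.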

Last week I used stars and sf successfully in a two-day course at Munich Re on Spatial Data Science with R (online material), focusing on data handling and geostatistics. Both packages worked out beautifully (with a few rough edges), in particular in conjunction with each other and with the tidyverse.

Further resources on the status of the project are found in

  • the video of my rstudio::conf presentation on “Spatial data science in the Tidyverse”
  • chapter 4 of the Spatial Data Science book (under development)

Future

Near-future development will entail experiments with very large datasets, such as the entire Sentinel-2 archive. We earlier secured some funding from the R Consortium for doing this, and first outcomes will be presented shortly in a follow-up blog. A major challenge here is the handling of multi-resolution imagery, of imagery spread over different coordinate reference systems (e.g., crossing multiple UTM zones), and of the temporal resampling needed to form space-time raster cubes. This is being handled gracefully by the gdalcubes C++ library and R package developed by Marius Appel. The gdalcubes package has been submitted to CRAN.

Earlier stars blogs