Wrapping up the stars project
Summary
This is the fourth blog post on the stars project, and it completes the R Consortium-funded project for spatiotemporal tidy arrays with R. It reports on the current status of the project and on current development directions. Although this project ends, with the release of stars 0.3 on CRAN, the adoption, update, enthusiasm and participation in the development of the stars project have really only started, and will without doubt increase and continue.
Status
The stars package now has five vignettes (called “Articles” on the pkgdown site) that explain its main features. Besides writing these vignettes, a lot of work over the past few months went into:
- writing support for stars_proxy objects, objects for which the metadata has been read but for which the payload is still on disk. This allows handling raster files or data cubes that do not fit into memory. Manipulating them uses lazy evaluation: pixel values are read and processed only when they are really needed, for instance when a plot is needed or results are to be written with write_stars. In case of plotting, no more pixels are processed than can be seen on the device;
- making rectilinear and curvilinear grids work, by better parsing NetCDF files directly (rather than through GDAL), reading their bounds, and by writing conversions to sf objects so that they can be plotted;
- writing a tighter integration with GDAL, e.g. for warping grids, contouring grids, and rasterizing polygons;
- supporting 360-day and 365-day (noleap) calendars, which are used often in climate model data;
- providing an off-CRAN starsdata package, with around 1 GB of real imagery, too large for submitting to CRAN or GitHub, but used for testing and demonstration;
- resolving issues (we’re at 154 now) and managing pull requests;
- adding stars support to gstat, a package for spatial and spatiotemporal geostatistical modelling, interpolation and simulation.
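As a minimal sketch of the stars_proxy workflow described in the first item above: the example below assumes the stars package is installed and uses the small Landsat GeoTIFF that ships with it. The object names p, s and out are only illustrative, and the exact printed output may differ between stars versions.

```r
library(stars)

# Example GeoTIFF shipped with the stars package
tif <- system.file("tif/L7_ETMs.tif", package = "stars")

# proxy = TRUE reads only the metadata; the pixel values stay on disk
p <- read_stars(tif, proxy = TRUE)
class(p)  # a stars_proxy object
dim(p)    # dimensions are known from the metadata alone

# Pixel values are read only when they are really needed, e.g. when
# converting to an in-memory stars object ...
s <- st_as_stars(p)

# ... or when writing the result to a new file
out <- tempfile(fileext = ".tif")
write_stars(p, out)
```

For a file this small the proxy route brings no real benefit; it is meant for rasters or data cubes that do not fit into memory.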
I have used stars and sf successfully last week in a two-day course at Munich Re on Spatial Data Science with R (online material), focusing on data handling and geostatistics. Both packages worked out beautifully (with only a few rough edges), in particular in conjunction with each other and with the tidyverse.
Further resources on the status of the project are found in
- the video of my rstudio::conf presentation on “Spatial data science in the Tidyverse”
- chapter 4 of the Spatial Data Science book (under development)
Future
Near-future development will entail experiments with very large datasets, such as the entire Sentinel-2 archive. We earlier secured some funding from the R Consortium for doing this, and first outcomes will be presented shortly in a follow-up blog. Large challenges here are the handling of multi-resolution imagery, imagery spread over different coordinate reference systems (e.g., crossing multiple UTM zones), and the temporal resampling needed to form space-time raster cubes. These are being handled gracefully by the gdalcubes C++ library and R package developed by Marius Appel. The gdalcubes package has been submitted to CRAN.