Simple features for R, part 2
- What happened so far?
- Install & test
- How does it work?
- Still to do/to be decided
What happened so far?
- in an earlier blog post I introduced the idea of having simple features mapped directly into simple R objects
- an R Consortium ISC proposal to implement this got granted
- during UseR! 2016 I presented this proposal (slides), which we followed up with an open discussion on future directions
- first steps to implement this in the sf package have finished, and are described below
This blog post describes current progress.
Install & test
You can install package sf directly from github:
library(devtools) # maybe install first?
install_github("edzer/sfr", ref = "16e205f54976bee75c72ac1b54f117868b6fafbc")
if you want to try out read.sf, which reads through GDAL 2.0, you also
need my fork of the R package
rgdal2, installed by
install_github("edzer/rgdal2")
this, obviously, requires that GDAL 2.0 or later is installed, along with development files.
After installing, a vignette contains some basic operations, and is shown by
library(sf)
vignette("basic")
How does it work?
Basic design ideas and constraints have been written in this document.
Simple features are one of the following 17 types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, CircularString, CompoundCurve, CurvePolygon, MultiCurve, MultiSurface, Curve, Surface, PolyhedralSurface, TIN, and Triangle. Each type can have 2D points (XY), 3D points (XYZ), 2D points with measure (XYM) and 3D points with measure (XYZM). This leads to 17 x 4 = 68 combinations.
The first seven of these are most common, and have been implemented, allowing for XY, XYZ, XYM and XYZM geometries.
Simple feature instances: sfi
A single simple feature is created by calling the constructor function, along with a modifier in case a three-dimensional geometry has measure “M” as its third dimension:
library(sf)
POINT(c(2,3))
## [1] "POINT(2 3)"
POINT(c(2,3,4))
## [1] "POINT Z(2 3 4)"
POINT(c(2,3,4), "M")
## [1] "POINT M(2 3 4)"
POINT(c(2,3,4,5))
## [1] "POINT ZM(2 3 4 5)"
what is printed is a well kown text representation of the object; the data itself is however stored as a regular R vector or matrix:
str(POINT(c(2,3,4), "M"))
## Classes 'POINT M', 'sfi'  num [1:3] 2 3 4
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
##  LINESTRING [1:3, 1:2] 2 3 3 2 3 2
##  - attr(*, "class")= chr [1:2] "LINESTRING" "sfi"
By using the two simple rules that
- sets of points are kept in a matrix
- other sets are kept in a list
we end up with the following structures, with increasing complexity:
Sets of points (matrix):
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
##  LINESTRING [1:3, 1:2] 2 3 3 2 3 2
##  - attr(*, "class")= chr [1:2] "LINESTRING" "sfi"
str(MULTIPOINT(rbind(c(2,2), c(3,3), c(3,2))))
##  MULTIPOINT [1:3, 1:2] 2 3 3 2 3 2
##  - attr(*, "class")= chr [1:2] "MULTIPOINT" "sfi"
Sets of sets of points:
str(MULTILINESTRING(list(rbind(c(2,2), c(3,3), c(3,2)), rbind(c(2,1),c(0,0)))))
## List of 2
##  $ : num [1:3, 1:2] 2 3 3 2 3 2
##  $ : num [1:2, 1:2] 2 0 1 0
##  - attr(*, "class")= chr [1:2] "MULTILINESTRING" "sfi"
outer = matrix(c(0,0,10,0,10,10,0,10,0,0),ncol=2, byrow=TRUE)
hole1 = matrix(c(1,1,1,2,2,2,2,1,1,1),ncol=2, byrow=TRUE)
hole2 = matrix(c(5,5,5,6,6,6,6,5,5,5),ncol=2, byrow=TRUE)
str(POLYGON(list(outer, hole1, hole2)))
## List of 3
##  $ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
##  $ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
##  $ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
##  - attr(*, "class")= chr [1:2] "POLYGON" "sfi"
Sets of sets of sets of points:
pol1 = list(outer, hole1, hole2)
pol2 = list(outer + 12, hole1 + 12)
pol3 = list(outer + 24)
mp = MULTIPOLYGON(list(pol1,pol2,pol3))
str(mp)
## List of 3
##  $ :List of 3
##   ..$ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
##   ..$ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
##   ..$ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
##  $ :List of 2
##   ..$ : num [1:5, 1:2] 12 22 22 12 12 12 12 22 22 12
##   ..$ : num [1:5, 1:2] 13 13 14 14 13 13 14 14 13 13
##  $ :List of 1
##   ..$ : num [1:5, 1:2] 24 34 34 24 24 24 24 34 34 24
##  - attr(*, "class")= chr [1:2] "MULTIPOLYGON" "sfi"
Sets of sets of sets of sets of points:
str(GEOMETRYCOLLECTION(list(MULTIPOLYGON(list(pol1,pol2,pol3)), POINT(c(2,3)))))
## List of 2
##  $ :List of 3
##   ..$ :List of 3
##   .. ..$ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
##   .. ..$ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
##   .. ..$ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
##   ..$ :List of 2
##   .. ..$ : num [1:5, 1:2] 12 22 22 12 12 12 12 22 22 12
##   .. ..$ : num [1:5, 1:2] 13 13 14 14 13 13 14 14 13 13
##   ..$ :List of 1
##   .. ..$ : num [1:5, 1:2] 24 34 34 24 24 24 24 34 34 24
##   ..- attr(*, "class")= chr [1:2] "MULTIPOLYGON" "sfi"
##  $ :Classes 'POINT', 'sfi'  num [1:2] 2 3
##  - attr(*, "class")= chr [1:2] "GEOMETRYCOLLECTION" "sfi"
where this is of course a worst case: GEOMETRYCOLLECTION objects with
simpler elements have less nesting.
Methods for sfi
The following methods have been implemented for sfi objects:
methods(class = "sfi")
## [1] as.WKT format print 
## see '?methods' for accessing help and source code
Alternatives to this implementation
- Package rgdal2 reads point sets
not in a matrix, but into a list with numeric vectors named xandy. This is closer to the GDAL (OGR) data model, and would allow for easier disambiguation of the third dimension (morz) in case of three-dimensional points. It is more difficult to select a single point, and requires validation of vector lenghts being identical. I’m inclined to keep usingmatrixfor point sets.
- Currently, POINT Zis of classc("POINT Z", "sfi"). An alternative would be to have it derive fromPOINT, i.e. give it classc("POINT Z", "POINT", "sfi"). This would make it easier to write methods for XYZ, XYM and XYZM geometries. This may be worth trying out.
Simple feature list columns: sfc
Collections of simple features can be added together into a list. If all elements of this list
- are of identical type (have identical class), or are a mix of XandMULTIX(withXbeing one ofPOINT,LINESTRINGorPOLYGON)
- have an identical coordinate reference system
then they can be combined in a sfc object. This object
- converts, if needed, XintoMULTIX(this is also what PostGIS does),
- registers the coordinate reference system in attributes epsgandproj4string,
- has the bounding box in attribute bbox, and updates it after subsetting
ls1 = LINESTRING(rbind(c(2,2), c(3,3), c(3,2)))
ls2 = LINESTRING(rbind(c(5,5), c(4,1), c(1,2)))
sfc = sfc(list(ls1, ls2), epsg = 4326)
attributes(sfc)
## $class
## [1] "sfc"
## 
## $type
## [1] "LINESTRING"
## 
## $epsg
## [1] 4326
## 
## $bbox
## xmin xmax ymin ymax 
##    1    5    1    5 
## 
## $proj4string
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
attributes(sfc[1])
## $class
## [1] "sfc"
## 
## $type
## [1] "LINESTRING"
## 
## $epsg
## [1] 4326
## 
## $bbox
## xmin xmax ymin ymax 
##    2    3    2    3 
## 
## $proj4string
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
The following methods have been implemented for sfc simple feature
list columns:
methods(class = "sfc")
## [1] bbox    format  [       summary
## see '?methods' for accessing help and source code
data.frames with simple features: sf
Typical spatial data contain attribute values and attribute geometries.
When combined in a table, they can be converted into sf objects, e.g.
by
roads = data.frame(widths = c(5, 4.5))
roads$geom = sfc
roads.sf = sf(roads)
roads.sf
##   widths                      geom
## 1    5.0 LINESTRING(2 2, 3 3, 3 2)
## 2    4.5 LINESTRING(5 5, 4 1, 1 2)
summary(roads.sf)
##      widths                 geom  
##  Min.   :4.500   LINESTRING   :2  
##  1st Qu.:4.625   epsg:4326    :0  
##  Median :4.750   +init=epsg...:0  
##  Mean   :4.750                    
##  3rd Qu.:4.875                    
##  Max.   :5.000
attributes(roads.sf)
## $names
## [1] "widths" "geom"  
## 
## $row.names
## [1] 1 2
## 
## $class
## [1] "sf"         "data.frame"
## 
## $sf_column
## geom 
##    2 
## 
## $relation_to_geometry
## widths 
##   <NA> 
## Levels: field lattice entity
here, attribute relation_to_geometry allows documenting how attributes
relate to the geometry: are they constant (field), aggregated over the
geometry (lattice), or do they identify individual entities (buildings,
parcels etc.)?
The following methods have been implemented for sfc simple feature
list columns:
methods(class = "sf")
## [1] geometry
## see '?methods' for accessing help and source code
Coercion to and from sp
Points, MultiPoints, Lines, MultiLines, Polygons and MultiPolygons can
be converted between sf and
sp, both ways.
A round trip is demonstrated by:
df = data.frame(a=1)
df$geom = sfc(list(mp))
sf = sf(df)
library(methods)
a = as(sf, "Spatial")
class(a)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
b = as.sf(a)
all.equal(sf, b) # round-trip sf-sp-sf
## [1] TRUE
a2 = as(a, "SpatialPolygonsDataFrame")
all.equal(a, a2) # round-trip sp-sf-sp
## [1] TRUE
Reading through GDAL
Function read.sf works, if rgdal2 is installed (see above), and
reads simple features through GDAL:
(s = read.sf(system.file("shapes/", package="maptools"), "sids"))[1:5,]
##    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 0 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 1 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 2 0.143      1.63  1828    1828       Surry 37171  37171       86  3188
## 3  0.07     2.968  1831    1831   Currituck 37053  37053       27   508
## 4 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
##   SID74 NWBIR74 BIR79 SID79 NWBIR79                               geom
## 0     1      10  1364     0      19 MULTIPOLYGON(((-81.47275543212 ...
## 1     0      10   542     3      12 MULTIPOLYGON(((-81.23989105224 ...
## 2     5     208  3616     6     260 MULTIPOLYGON(((-80.45634460449 ...
## 3     1     123   830     2     145 MULTIPOLYGON(((-76.00897216796 ...
## 4     9    1066  1606     3    1197 MULTIPOLYGON(((-77.21766662597 ...
summary(s)
##       AREA      PERIMETER      CNTY_       CNTY_ID          NAME   
##  0.118  : 4   1.307  : 2   1825   : 1   1825   : 1   Alamance : 1  
##  0.091  : 3   1.601  : 2   1827   : 1   1827   : 1   Alexander: 1  
##  0.143  : 3   1.68   : 2   1828   : 1   1828   : 1   Alleghany: 1  
##  0.07   : 2   1.791  : 2   1831   : 1   1831   : 1   Anson    : 1  
##  0.078  : 2   0.999  : 1   1832   : 1   1832   : 1   Ashe     : 1  
##  0.08   : 2   1      : 1   1833   : 1   1833   : 1   Avery    : 1  
##  (Other):84   (Other):90   (Other):94   (Other):94   (Other)  :94  
##       FIPS        FIPSNO      CRESS_ID      BIR74        SID74   
##  37001  : 1   37001  : 1   1      : 1   1027   : 1   0      :13  
##  37003  : 1   37003  : 1   10     : 1   1035   : 1   4      :13  
##  37005  : 1   37005  : 1   100    : 1   1091   : 1   1      :11  
##  37007  : 1   37007  : 1   11     : 1   11158  : 1   5      :11  
##  37009  : 1   37009  : 1   12     : 1   1143   : 1   2      : 8  
##  37011  : 1   37011  : 1   13     : 1   1173   : 1   3      : 6  
##  (Other):94   (Other):94   (Other):94   (Other):94   (Other):38  
##     NWBIR74       BIR79        SID79       NWBIR79             geom    
##  736    : 3   10432  : 1   2      :10   1161   : 2   MULTIPOLYGON:100  
##  1      : 2   1059   : 1   0      : 9   5      : 2   epsg:NA     :  0  
##  10     : 2   1141   : 1   1      : 9   10     : 1                     
##  1243   : 2   11455  : 1   4      : 9   1023   : 1                     
##  134    : 2   1157   : 1   5      : 9   1033   : 1                     
##  930    : 2   1173   : 1   3      : 6   104    : 1                     
##  (Other):87   (Other):94   (Other):48   (Other):92
This also shows the abbreviation of long geometries when printed or
summarized, provided by the format methods.
The following works for me, with PostGIS installed and data loaded:
(s = read.sf("PG:dbname=postgis", "meuse2"))[1:5,]
##   zinc                 geom
## 1 1022 POINT(181072 333611)
## 2 1141 POINT(181025 333558)
## 3  640 POINT(181165 333537)
## 4  257 POINT(181298 333484)
## 5  269 POINT(181307 333330)
summary(s)
##       zinc                   geom    
##  Min.   : 113.0   POINT        :155  
##  1st Qu.: 198.0   epsg:NA      :  0  
##  Median : 326.0   +proj=ster...:  0  
##  Mean   : 469.7                      
##  3rd Qu.: 674.5                      
##  Max.   :1839.0
Still to do/to be decided
The following issues need to be decided upon:
- reproject sf objects through rgdal2? support well-known-text for CRS? or use PROJ.4 directly?
- when subsetting attributes from an sfobjects, make geometry sticky (like sp does), or drop geometry and returndata.frame(data.frame behaviour)?
The following things still need to be done:
- write simple features through GDAL (using rgdal2)
- using gdal geometry functions in rgdal2
- extend rgdal2 to also read XYZ,XYM, andXYZMgeometries - my feeling is that this will be easier than modifyingrgdal
- reprojection of sf objects
- link to GEOS, using GEOS functions: GDAL with GEOS enabled (and
rgdal2) has some of this, but not for instancergeos::gRelate
- develop better and more complete test cases; also check the OGC test suite
- improve documentation, add tutorial (vignettes, paper)
- add plot functions (base, grid)
- explore direct WKB - sf conversion, without GDAL
- explore how meaningfulness of operations can be verified when for
attributes their relation_to_geometryhas been specified
Please let me know if you have any comments, suggestions or questions!
