Simple features for R, part 2
- What happened so far?
- Install & test
- How does it work?
- Still to do/to be decided
What happened so far?
- in an earlier blog post I introduced the idea of having simple features mapped directly into simple R objects
- an R Consortium ISC proposal to implement this got granted
- during UseR! 2016 I presented this proposal (slides), which we followed up with an open discussion on future directions
- first steps to implement this in the sf package have finished, and are described below
This blog post describes current progress.
Install & test
You can install package sf
directly from github:
library(devtools) # maybe install first?
install_github("edzer/sfr", ref = "16e205f54976bee75c72ac1b54f117868b6fafbc")
if you want to try out read.sf
, which reads through GDAL 2.0, you also
need my fork of the R package
rgdal2, installed by
install_github("edzer/rgdal2")
this, obviously, requires that GDAL 2.0 or later is installed, along with development files.
After installing, a vignette contains some basic operations, and is shown by
library(sf)
vignette("basic")
How does it work?
Basic design ideas and constraints have been written in this document.
Simple features are one of the following 17 types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, CircularString, CompoundCurve, CurvePolygon, MultiCurve, MultiSurface, Curve, Surface, PolyhedralSurface, TIN, and Triangle. Each type can have 2D points (XY), 3D points (XYZ), 2D points with measure (XYM) and 3D points with measure (XYZM). This leads to 17 x 4 = 68 combinations.
The first seven of these are most common, and have been implemented, allowing for XY, XYZ, XYM and XYZM geometries.
Simple feature instances: sfi
A single simple feature is created by calling the constructor function, along with a modifier in case a three-dimensional geometry has measure “M” as its third dimension:
library(sf)
POINT(c(2,3))
## [1] "POINT(2 3)"
POINT(c(2,3,4))
## [1] "POINT Z(2 3 4)"
POINT(c(2,3,4), "M")
## [1] "POINT M(2 3 4)"
POINT(c(2,3,4,5))
## [1] "POINT ZM(2 3 4 5)"
what is printed is a well kown text representation of the object; the data itself is however stored as a regular R vector or matrix:
str(POINT(c(2,3,4), "M"))
## Classes 'POINT M', 'sfi' num [1:3] 2 3 4
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
## LINESTRING [1:3, 1:2] 2 3 3 2 3 2
## - attr(*, "class")= chr [1:2] "LINESTRING" "sfi"
By using the two simple rules that
- sets of points are kept in a
matrix
- other sets are kept in a
list
we end up with the following structures, with increasing complexity:
Sets of points (matrix):
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
## LINESTRING [1:3, 1:2] 2 3 3 2 3 2
## - attr(*, "class")= chr [1:2] "LINESTRING" "sfi"
str(MULTIPOINT(rbind(c(2,2), c(3,3), c(3,2))))
## MULTIPOINT [1:3, 1:2] 2 3 3 2 3 2
## - attr(*, "class")= chr [1:2] "MULTIPOINT" "sfi"
Sets of sets of points:
str(MULTILINESTRING(list(rbind(c(2,2), c(3,3), c(3,2)), rbind(c(2,1),c(0,0)))))
## List of 2
## $ : num [1:3, 1:2] 2 3 3 2 3 2
## $ : num [1:2, 1:2] 2 0 1 0
## - attr(*, "class")= chr [1:2] "MULTILINESTRING" "sfi"
outer = matrix(c(0,0,10,0,10,10,0,10,0,0),ncol=2, byrow=TRUE)
hole1 = matrix(c(1,1,1,2,2,2,2,1,1,1),ncol=2, byrow=TRUE)
hole2 = matrix(c(5,5,5,6,6,6,6,5,5,5),ncol=2, byrow=TRUE)
str(POLYGON(list(outer, hole1, hole2)))
## List of 3
## $ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
## $ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
## $ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
## - attr(*, "class")= chr [1:2] "POLYGON" "sfi"
Sets of sets of sets of points:
pol1 = list(outer, hole1, hole2)
pol2 = list(outer + 12, hole1 + 12)
pol3 = list(outer + 24)
mp = MULTIPOLYGON(list(pol1,pol2,pol3))
str(mp)
## List of 3
## $ :List of 3
## ..$ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
## ..$ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
## ..$ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
## $ :List of 2
## ..$ : num [1:5, 1:2] 12 22 22 12 12 12 12 22 22 12
## ..$ : num [1:5, 1:2] 13 13 14 14 13 13 14 14 13 13
## $ :List of 1
## ..$ : num [1:5, 1:2] 24 34 34 24 24 24 24 34 34 24
## - attr(*, "class")= chr [1:2] "MULTIPOLYGON" "sfi"
Sets of sets of sets of sets of points:
str(GEOMETRYCOLLECTION(list(MULTIPOLYGON(list(pol1,pol2,pol3)), POINT(c(2,3)))))
## List of 2
## $ :List of 3
## ..$ :List of 3
## .. ..$ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
## .. ..$ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
## .. ..$ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
## ..$ :List of 2
## .. ..$ : num [1:5, 1:2] 12 22 22 12 12 12 12 22 22 12
## .. ..$ : num [1:5, 1:2] 13 13 14 14 13 13 14 14 13 13
## ..$ :List of 1
## .. ..$ : num [1:5, 1:2] 24 34 34 24 24 24 24 34 34 24
## ..- attr(*, "class")= chr [1:2] "MULTIPOLYGON" "sfi"
## $ :Classes 'POINT', 'sfi' num [1:2] 2 3
## - attr(*, "class")= chr [1:2] "GEOMETRYCOLLECTION" "sfi"
where this is of course a worst case: GEOMETRYCOLLECTION
objects with
simpler elements have less nesting.
Methods for sfi
The following methods have been implemented for sfi
objects:
methods(class = "sfi")
## [1] as.WKT format print
## see '?methods' for accessing help and source code
Alternatives to this implementation
- Package rgdal2 reads point sets
not in a matrix, but into a list with numeric vectors named
x
andy
. This is closer to the GDAL (OGR) data model, and would allow for easier disambiguation of the third dimension (m
orz
) in case of three-dimensional points. It is more difficult to select a single point, and requires validation of vector lenghts being identical. I’m inclined to keep usingmatrix
for point sets. - Currently,
POINT Z
is of classc("POINT Z", "sfi")
. An alternative would be to have it derive fromPOINT
, i.e. give it classc("POINT Z", "POINT", "sfi")
. This would make it easier to write methods for XYZ, XYM and XYZM geometries. This may be worth trying out.
Simple feature list columns: sfc
Collections of simple features can be added together into a list. If all elements of this list
- are of identical type (have identical class), or are a mix of
X
andMULTIX
(withX
being one ofPOINT
,LINESTRING
orPOLYGON
) - have an identical coordinate reference system
then they can be combined in a sfc
object. This object
- converts, if needed,
X
intoMULTIX
(this is also what PostGIS does), - registers the coordinate reference system in attributes
epsg
andproj4string
, - has the bounding box in attribute
bbox
, and updates it after subsetting
ls1 = LINESTRING(rbind(c(2,2), c(3,3), c(3,2)))
ls2 = LINESTRING(rbind(c(5,5), c(4,1), c(1,2)))
sfc = sfc(list(ls1, ls2), epsg = 4326)
attributes(sfc)
## $class
## [1] "sfc"
##
## $type
## [1] "LINESTRING"
##
## $epsg
## [1] 4326
##
## $bbox
## xmin xmax ymin ymax
## 1 5 1 5
##
## $proj4string
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
attributes(sfc[1])
## $class
## [1] "sfc"
##
## $type
## [1] "LINESTRING"
##
## $epsg
## [1] 4326
##
## $bbox
## xmin xmax ymin ymax
## 2 3 2 3
##
## $proj4string
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
The following methods have been implemented for sfc
simple feature
list columns:
methods(class = "sfc")
## [1] bbox format [ summary
## see '?methods' for accessing help and source code
data.frames with simple features: sf
Typical spatial data contain attribute values and attribute geometries.
When combined in a table, they can be converted into sf
objects, e.g.
by
roads = data.frame(widths = c(5, 4.5))
roads$geom = sfc
roads.sf = sf(roads)
roads.sf
## widths geom
## 1 5.0 LINESTRING(2 2, 3 3, 3 2)
## 2 4.5 LINESTRING(5 5, 4 1, 1 2)
summary(roads.sf)
## widths geom
## Min. :4.500 LINESTRING :2
## 1st Qu.:4.625 epsg:4326 :0
## Median :4.750 +init=epsg...:0
## Mean :4.750
## 3rd Qu.:4.875
## Max. :5.000
attributes(roads.sf)
## $names
## [1] "widths" "geom"
##
## $row.names
## [1] 1 2
##
## $class
## [1] "sf" "data.frame"
##
## $sf_column
## geom
## 2
##
## $relation_to_geometry
## widths
## <NA>
## Levels: field lattice entity
here, attribute relation_to_geometry
allows documenting how attributes
relate to the geometry: are they constant (field), aggregated over the
geometry (lattice), or do they identify individual entities (buildings,
parcels etc.)?
The following methods have been implemented for sfc
simple feature
list columns:
methods(class = "sf")
## [1] geometry
## see '?methods' for accessing help and source code
Coercion to and from sp
Points, MultiPoints, Lines, MultiLines, Polygons and MultiPolygons can
be converted between sf
and
sp, both ways.
A round trip is demonstrated by:
df = data.frame(a=1)
df$geom = sfc(list(mp))
sf = sf(df)
library(methods)
a = as(sf, "Spatial")
class(a)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
b = as.sf(a)
all.equal(sf, b) # round-trip sf-sp-sf
## [1] TRUE
a2 = as(a, "SpatialPolygonsDataFrame")
all.equal(a, a2) # round-trip sp-sf-sp
## [1] TRUE
Reading through GDAL
Function read.sf
works, if rgdal2
is installed (see above), and
reads simple features through GDAL:
(s = read.sf(system.file("shapes/", package="maptools"), "sids"))[1:5,]
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74
## 0 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091
## 1 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487
## 2 0.143 1.63 1828 1828 Surry 37171 37171 86 3188
## 3 0.07 2.968 1831 1831 Currituck 37053 37053 27 508
## 4 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421
## SID74 NWBIR74 BIR79 SID79 NWBIR79 geom
## 0 1 10 1364 0 19 MULTIPOLYGON(((-81.47275543212 ...
## 1 0 10 542 3 12 MULTIPOLYGON(((-81.23989105224 ...
## 2 5 208 3616 6 260 MULTIPOLYGON(((-80.45634460449 ...
## 3 1 123 830 2 145 MULTIPOLYGON(((-76.00897216796 ...
## 4 9 1066 1606 3 1197 MULTIPOLYGON(((-77.21766662597 ...
summary(s)
## AREA PERIMETER CNTY_ CNTY_ID NAME
## 0.118 : 4 1.307 : 2 1825 : 1 1825 : 1 Alamance : 1
## 0.091 : 3 1.601 : 2 1827 : 1 1827 : 1 Alexander: 1
## 0.143 : 3 1.68 : 2 1828 : 1 1828 : 1 Alleghany: 1
## 0.07 : 2 1.791 : 2 1831 : 1 1831 : 1 Anson : 1
## 0.078 : 2 0.999 : 1 1832 : 1 1832 : 1 Ashe : 1
## 0.08 : 2 1 : 1 1833 : 1 1833 : 1 Avery : 1
## (Other):84 (Other):90 (Other):94 (Other):94 (Other) :94
## FIPS FIPSNO CRESS_ID BIR74 SID74
## 37001 : 1 37001 : 1 1 : 1 1027 : 1 0 :13
## 37003 : 1 37003 : 1 10 : 1 1035 : 1 4 :13
## 37005 : 1 37005 : 1 100 : 1 1091 : 1 1 :11
## 37007 : 1 37007 : 1 11 : 1 11158 : 1 5 :11
## 37009 : 1 37009 : 1 12 : 1 1143 : 1 2 : 8
## 37011 : 1 37011 : 1 13 : 1 1173 : 1 3 : 6
## (Other):94 (Other):94 (Other):94 (Other):94 (Other):38
## NWBIR74 BIR79 SID79 NWBIR79 geom
## 736 : 3 10432 : 1 2 :10 1161 : 2 MULTIPOLYGON:100
## 1 : 2 1059 : 1 0 : 9 5 : 2 epsg:NA : 0
## 10 : 2 1141 : 1 1 : 9 10 : 1
## 1243 : 2 11455 : 1 4 : 9 1023 : 1
## 134 : 2 1157 : 1 5 : 9 1033 : 1
## 930 : 2 1173 : 1 3 : 6 104 : 1
## (Other):87 (Other):94 (Other):48 (Other):92
This also shows the abbreviation of long geometries when printed or
summarized, provided by the format
methods.
The following works for me, with PostGIS installed and data loaded:
(s = read.sf("PG:dbname=postgis", "meuse2"))[1:5,]
## zinc geom
## 1 1022 POINT(181072 333611)
## 2 1141 POINT(181025 333558)
## 3 640 POINT(181165 333537)
## 4 257 POINT(181298 333484)
## 5 269 POINT(181307 333330)
summary(s)
## zinc geom
## Min. : 113.0 POINT :155
## 1st Qu.: 198.0 epsg:NA : 0
## Median : 326.0 +proj=ster...: 0
## Mean : 469.7
## 3rd Qu.: 674.5
## Max. :1839.0
Still to do/to be decided
The following issues need to be decided upon:
- reproject sf objects through
rgdal2
? support well-known-text for CRS? or use PROJ.4 directly? - when subsetting attributes from an
sf
objects, make geometry sticky (like sp does), or drop geometry and returndata.frame
(data.frame behaviour)?
The following things still need to be done:
- write simple features through GDAL (using rgdal2)
- using gdal geometry functions in
rgdal2
- extend rgdal2 to also read
XYZ
,XYM
, andXYZM
geometries - my feeling is that this will be easier than modifyingrgdal
- reprojection of sf objects
- link to GEOS, using GEOS functions: GDAL with GEOS enabled (and
rgdal2
) has some of this, but not for instancergeos::gRelate
- develop better and more complete test cases; also check the OGC test suite
- improve documentation, add tutorial (vignettes, paper)
- add plot functions (base, grid)
- explore direct WKB - sf conversion, without GDAL
- explore how meaningfulness of operations can be verified when for
attributes their
relation_to_geometry
has been specified
Please let me know if you have any comments, suggestions or questions!