Tidy storm trajectories
<!DOCTYPE html>
The storms dataset in dplyr
I’m not sure whether this is a good time to talk about storm trajectories, but am pretty sure that many have seen some pretty visualisations of them. In R, many packages provide or analyse trajectory data, but how is this done in the tidyverse? This blog post explores some options, I would be happy to improve it after your feedback.
library(tidyverse)
## + ggplot2 2.2.1.9000 Date: 2017-09-11
## + tibble 1.3.4 R: 3.4.1
## + tidyr 0.7.1 OS: Ubuntu 16.04.3 LTS
## + readr 1.1.0 GUI: X11
## + purrr 0.2.3.9000 Locale: en_US.UTF-8
## + dplyr 0.7.2 TZ: Europe/Berlin
## + stringr 1.2.0
## + forcats 0.2.0
## Conflicts -----------------------------------------------------------------
## * filter(), from dplyr, masks stats::filter()
## * lag(), from dplyr, masks stats::lag()
Since version 0.7-0, dplyr comes with a nice spatiotemporal dataset on storm tracks, obtained from the atlantic hurricane database HURDAT2:
storms
## # A tibble: 10,010 x 13
## name year month day hour lat long status category
## <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr> <ord>
## 1 Amy 1975 6 27 0 27.5 -79.0 tropical depression -1
## 2 Amy 1975 6 27 6 28.5 -79.0 tropical depression -1
## 3 Amy 1975 6 27 12 29.5 -79.0 tropical depression -1
## 4 Amy 1975 6 27 18 30.5 -79.0 tropical depression -1
## 5 Amy 1975 6 28 0 31.5 -78.8 tropical depression -1
## 6 Amy 1975 6 28 6 32.4 -78.7 tropical depression -1
## 7 Amy 1975 6 28 12 33.3 -78.0 tropical depression -1
## 8 Amy 1975 6 28 18 34.0 -77.0 tropical depression -1
## 9 Amy 1975 6 29 0 34.4 -75.8 tropical storm 0
## 10 Amy 1975 6 29 6 34.0 -74.8 tropical storm 0
## # ... with 10,000 more rows, and 4 more variables: wind <int>,
## # pressure <int>, ts_diameter <dbl>, hu_diameter <dbl>
the script to import this dataset in R derives new variables and converts some measurement units and is found here; I could get it to work after two rather trivial modifications.
The dataset is given in an unstructured tibble and as is quite typical, longitude and latitude are placed in two columns. They are of course compound values: by knowing one you can do very little without knowing the other. We can bind longitude and latitude together into POINT simple feature geometries, and define the datum (the reference system of the coordinates). This is done by converting the tibble into a simple features tibble:
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.2.0, proj.4 4.9.2, lwgeom 2.3.3 r15473
storms.sf <- storms %>%
st_as_sf(coords = c("long", "lat"), crs = 4326)
class(storms.sf)
## [1] "sf" "tbl_df" "tbl" "data.frame"
We select a few columns only to also show some of the POINTs:
storms.sf %>% select(name, year, geometry)
## Simple feature collection with 10010 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 10,010 x 3
## name year geometry
## <chr> <dbl> <simple_feature>
## 1 Amy 1975 <POINT (-79 2...>
## 2 Amy 1975 <POINT (-79 2...>
## 3 Amy 1975 <POINT (-79 2...>
## 4 Amy 1975 <POINT (-79 3...>
## 5 Amy 1975 <POINT (-78.8...>
## 6 Amy 1975 <POINT (-78.7...>
## 7 Amy 1975 <POINT (-78 3...>
## 8 Amy 1975 <POINT (-77 34)>
## 9 Amy 1975 <POINT (-75.8...>
## 10 Amy 1975 <POINT (-74.8...>
## # ... with 10,000 more rows
We can create proper time values and drop some components now obsolete:
storms.sf <- storms.sf %>%
mutate(time = as.POSIXct(paste(paste(year,month,day, sep = "-"),
paste(hour, ":00", sep = "")))) %>%
select(-month, -day, -hour)
storms.sf %>% select(name, time, geometry)
## Simple feature collection with 10010 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 10,010 x 3
## name time geometry
## <chr> <dttm> <simple_feature>
## 1 Amy 1975-06-27 00:00:00 <POINT (-79 2...>
## 2 Amy 1975-06-27 06:00:00 <POINT (-79 2...>
## 3 Amy 1975-06-27 12:00:00 <POINT (-79 2...>
## 4 Amy 1975-06-27 18:00:00 <POINT (-79 3...>
## 5 Amy 1975-06-28 00:00:00 <POINT (-78.8...>
## 6 Amy 1975-06-28 06:00:00 <POINT (-78.7...>
## 7 Amy 1975-06-28 12:00:00 <POINT (-78 3...>
## 8 Amy 1975-06-28 18:00:00 <POINT (-77 34)>
## 9 Amy 1975-06-29 00:00:00 <POINT (-75.8...>
## 10 Amy 1975-06-29 06:00:00 <POINT (-74.8...>
## # ... with 10,000 more rows
We now have a POINT dataset that, when plotted, still shows unstructured points:
cls <- storms.sf %>% pull(status)
cls <- factor(cls, levels = unique(cls))
col = sf.colors(length(levels(cls)))
plot(st_geometry(storms.sf), cex = .2, axes = TRUE, graticule = TRUE,
col = col[cls])
legend("topleft", legend = levels(cls), col = col, pch = 1)
(note that we’re using base plot, since ggplot is still undeveloped for simple feature points)
Storm summary properties
What we just saw is storm observations, colored by severity. What we lack is the connections between points of individual storms. We can obtain these by grouping points by name and year (names are being re-used), and nesting them:
(storms.nest <- storms.sf %>% group_by(name, year) %>% nest)
## # A tibble: 426 x 3
## name year data
## <chr> <dbl> <list>
## 1 Amy 1975 <tibble [30 x 8]>
## 2 Caroline 1975 <tibble [33 x 8]>
## 3 Doris 1975 <tibble [23 x 8]>
## 4 Belle 1976 <tibble [18 x 8]>
## 5 Gloria 1976 <tibble [34 x 8]>
## 6 Anita 1977 <tibble [20 x 8]>
## 7 Clara 1977 <tibble [24 x 8]>
## 8 Evelyn 1977 <tibble [9 x 8]>
## 9 Amelia 1978 <tibble [6 x 8]>
## 10 Bess 1978 <tibble [13 x 8]>
## # ... with 416 more rows
For each nested data.frame in the data list-column, we can combine the points into a line by mapping this function:
to_line <- function(tr) st_cast(st_combine(tr), "LINESTRING") %>% .[[1]]
to the list-column:
(tracks <- storms.nest %>% pull(data) %>% map(to_line) %>% st_sfc(crs = 4326))
## Geometry set for 426 features
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 5 geometries:
## LINESTRING (-79 27.5, -79 28.5, -79 29.5, -79 3...
## LINESTRING (-69.8 22.4, -71.1 21.9, -72.5 21.6,...
## LINESTRING (-48.9 34.9, -49.1 35.2, -48.9 35.3,...
## LINESTRING (-72.8 26, -73 26.3, -73.4 26, -73.2...
## LINESTRING (-58 23, -58.1 23.7, -58.2 24.3, -58...
and combining these storm-based geometries to the storm-based attributes:
(storms.tr <- storms.nest %>% select(-data) %>% st_sf(geometry = tracks))
## Simple feature collection with 426 features and 2 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 20 features:
## name year geometry
## 1 Amy 1975 LINESTRING (-79 27.5, -79 2...
## 2 Caroline 1975 LINESTRING (-69.8 22.4, -71...
## 3 Doris 1975 LINESTRING (-48.9 34.9, -49...
## 4 Belle 1976 LINESTRING (-72.8 26, -73 2...
## 5 Gloria 1976 LINESTRING (-58 23, -58.1 2...
## 6 Anita 1977 LINESTRING (-88.4 26.9, -88...
## 7 Clara 1977 LINESTRING (-80 32.8, -79 3...
## 8 Evelyn 1977 LINESTRING (-62.9 26.9, -64...
## 9 Amelia 1978 LINESTRING (-97 25.7, -97.4...
## 10 Bess 1978 LINESTRING (-90.4 25.3, -91...
## 11 Cora 1978 LINESTRING (-35 13, -36.2 1...
## 12 Juliet 1978 LINESTRING (-55.6 18, -56.6...
## 13 Ana 1979 LINESTRING (-45 10, -46 10....
## 14 Bob 1979 LINESTRING (-96 22, -95.3 2...
## 15 Claudette 1979 LINESTRING (-46.3 12.5, -48...
## 16 David 1979 LINESTRING (-36.1 11.7, -38...
## 17 Frederic 1979 LINESTRING (-25.5 11, -28 1...
## 18 Gloria 1979 LINESTRING (-21 15.5, -22.5...
## 19 Henri 1979 LINESTRING (-86.8 20.3, -86...
## 20 Bonnie 1980 LINESTRING (-35.5 12.7, -36...
We can now plot tracks by storm:
storms.tr %>%
ggplot(aes(color = name)) + geom_sf() + theme(legend.position="none")
storms.tr %>% ggplot(aes(color = year)) + geom_sf()
An interactive plot is obtained using mapview; click on a track to see name and year:
library(mapview)
## Loading required package: leaflet
mapview(storms.tr, zcol = "year", legend = TRUE)
We could easily have aggregated other quantities that vary over a single storm track (category, wind speed, size) and add those to each track, but what we did loose is these properties at individual storm points.
Variations along a storm track
Let’s try to plot wind speed as it varies along a storm track. For this, we will have to create line segments from each point to the next, and add the attributes to those segments.
A function that creates n-1 LINESTRING geometries from n subsequent points is
to_lines <- function(tr) {
g = st_geometry(tr)
hd = head(g, -1)
tl = tail(g, -1)
map2(hd, tl, function(x,y) st_combine(st_sfc(x, y, crs = 4326))) %>%
map(function(x) st_cast(x, "LINESTRING"))
}
we map this function to each storm track in storms.nest:
trs <- storms.nest %>%
pull(data) %>%
map(to_lines) %>%
unlist(recursive = FALSE) %>%
do.call(c, .)
and combine with the attributes, by first the last attribute record for each track
fn = function(x) head(x, -1) %>% as.data.frame %>% select(-geometry)
storms.nest <- storms.nest %>% mutate(data = map(data, fn))
and then adding the LINESTRING geometries to these attributes, plotting wind speed:
storms.tr2 <- storms.nest %>% unnest %>% st_set_geometry(trs)
nrow(storms.tr2)
## [1] 9584
storms.tr2 %>% ggplot(aes(color = wind)) + geom_sf()