<!DOCTYPE html>

view raw Rmd

The storms dataset in dplyr

I’m not sure whether this is a good time to talk about storm trajectories, but am pretty sure that many have seen some pretty visualisations of them. In R, many packages provide or analyse trajectory data, but how is this done in the tidyverse? This blog post explores some options, I would be happy to improve it after your feedback.

library(tidyverse)
## + ggplot2 2.2.1.9000        Date: 2017-09-11
## + tibble  1.3.4                R: 3.4.1
## + tidyr   0.7.1               OS: Ubuntu 16.04.3 LTS
## + readr   1.1.0              GUI: X11
## + purrr   0.2.3.9000      Locale: en_US.UTF-8
## + dplyr   0.7.2               TZ: Europe/Berlin
## + stringr 1.2.0           
## + forcats 0.2.0
## Conflicts -----------------------------------------------------------------
## * filter(),  from dplyr, masks stats::filter()
## * lag(),     from dplyr, masks stats::lag()

Since version 0.7-0, dplyr comes with a nice spatiotemporal dataset on storm tracks, obtained from the atlantic hurricane database HURDAT2:

storms
## # A tibble: 10,010 x 13
##     name  year month   day  hour   lat  long              status category
##    <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>               <chr>    <ord>
##  1   Amy  1975     6    27     0  27.5 -79.0 tropical depression       -1
##  2   Amy  1975     6    27     6  28.5 -79.0 tropical depression       -1
##  3   Amy  1975     6    27    12  29.5 -79.0 tropical depression       -1
##  4   Amy  1975     6    27    18  30.5 -79.0 tropical depression       -1
##  5   Amy  1975     6    28     0  31.5 -78.8 tropical depression       -1
##  6   Amy  1975     6    28     6  32.4 -78.7 tropical depression       -1
##  7   Amy  1975     6    28    12  33.3 -78.0 tropical depression       -1
##  8   Amy  1975     6    28    18  34.0 -77.0 tropical depression       -1
##  9   Amy  1975     6    29     0  34.4 -75.8      tropical storm        0
## 10   Amy  1975     6    29     6  34.0 -74.8      tropical storm        0
## # ... with 10,000 more rows, and 4 more variables: wind <int>,
## #   pressure <int>, ts_diameter <dbl>, hu_diameter <dbl>

the script to import this dataset in R derives new variables and converts some measurement units and is found here; I could get it to work after two rather trivial modifications.

The dataset is given in an unstructured tibble and as is quite typical, longitude and latitude are placed in two columns. They are of course compound values: by knowing one you can do very little without knowing the other. We can bind longitude and latitude together into POINT simple feature geometries, and define the datum (the reference system of the coordinates). This is done by converting the tibble into a simple features tibble:

library(sf)
## Linking to GEOS 3.5.1, GDAL 2.2.0, proj.4 4.9.2, lwgeom 2.3.3 r15473
storms.sf <- storms %>% 
    st_as_sf(coords = c("long", "lat"), crs = 4326) 
class(storms.sf)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"

We select a few columns only to also show some of the POINTs:

storms.sf %>% select(name, year, geometry)
## Simple feature collection with 10010 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 10,010 x 3
##     name  year          geometry
##    <chr> <dbl>  <simple_feature>
##  1   Amy  1975 <POINT (-79 2...>
##  2   Amy  1975 <POINT (-79 2...>
##  3   Amy  1975 <POINT (-79 2...>
##  4   Amy  1975 <POINT (-79 3...>
##  5   Amy  1975 <POINT (-78.8...>
##  6   Amy  1975 <POINT (-78.7...>
##  7   Amy  1975 <POINT (-78 3...>
##  8   Amy  1975  <POINT (-77 34)>
##  9   Amy  1975 <POINT (-75.8...>
## 10   Amy  1975 <POINT (-74.8...>
## # ... with 10,000 more rows

We can create proper time values and drop some components now obsolete:

storms.sf <- storms.sf %>% 
    mutate(time = as.POSIXct(paste(paste(year,month,day, sep = "-"), 
                                   paste(hour, ":00", sep = "")))) %>% 
    select(-month, -day, -hour)
storms.sf %>% select(name, time, geometry)
## Simple feature collection with 10010 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 10,010 x 3
##     name                time          geometry
##    <chr>              <dttm>  <simple_feature>
##  1   Amy 1975-06-27 00:00:00 <POINT (-79 2...>
##  2   Amy 1975-06-27 06:00:00 <POINT (-79 2...>
##  3   Amy 1975-06-27 12:00:00 <POINT (-79 2...>
##  4   Amy 1975-06-27 18:00:00 <POINT (-79 3...>
##  5   Amy 1975-06-28 00:00:00 <POINT (-78.8...>
##  6   Amy 1975-06-28 06:00:00 <POINT (-78.7...>
##  7   Amy 1975-06-28 12:00:00 <POINT (-78 3...>
##  8   Amy 1975-06-28 18:00:00  <POINT (-77 34)>
##  9   Amy 1975-06-29 00:00:00 <POINT (-75.8...>
## 10   Amy 1975-06-29 06:00:00 <POINT (-74.8...>
## # ... with 10,000 more rows

We now have a POINT dataset that, when plotted, still shows unstructured points:

cls <- storms.sf %>% pull(status)
cls <- factor(cls, levels = unique(cls))
col = sf.colors(length(levels(cls)))
plot(st_geometry(storms.sf), cex = .2, axes = TRUE, graticule = TRUE, 
     col = col[cls])
legend("topleft", legend = levels(cls), col = col, pch = 1)

(note that we’re using base plot, since ggplot is still undeveloped for simple feature points)

Storm summary properties

What we just saw is storm observations, colored by severity. What we lack is the connections between points of individual storms. We can obtain these by grouping points by name and year (names are being re-used), and nesting them:

(storms.nest <- storms.sf %>% group_by(name, year) %>% nest)
## # A tibble: 426 x 3
##        name  year              data
##       <chr> <dbl>            <list>
##  1      Amy  1975 <tibble [30 x 8]>
##  2 Caroline  1975 <tibble [33 x 8]>
##  3    Doris  1975 <tibble [23 x 8]>
##  4    Belle  1976 <tibble [18 x 8]>
##  5   Gloria  1976 <tibble [34 x 8]>
##  6    Anita  1977 <tibble [20 x 8]>
##  7    Clara  1977 <tibble [24 x 8]>
##  8   Evelyn  1977  <tibble [9 x 8]>
##  9   Amelia  1978  <tibble [6 x 8]>
## 10     Bess  1978 <tibble [13 x 8]>
## # ... with 416 more rows

For each nested data.frame in the data list-column, we can combine the points into a line by mapping this function:

to_line <- function(tr) st_cast(st_combine(tr), "LINESTRING") %>% .[[1]] 

to the list-column:

(tracks <- storms.nest %>% pull(data) %>% map(to_line) %>% st_sfc(crs = 4326))
## Geometry set for 426 features 
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 5 geometries:
## LINESTRING (-79 27.5, -79 28.5, -79 29.5, -79 3...
## LINESTRING (-69.8 22.4, -71.1 21.9, -72.5 21.6,...
## LINESTRING (-48.9 34.9, -49.1 35.2, -48.9 35.3,...
## LINESTRING (-72.8 26, -73 26.3, -73.4 26, -73.2...
## LINESTRING (-58 23, -58.1 23.7, -58.2 24.3, -58...

and combining these storm-based geometries to the storm-based attributes:

(storms.tr <- storms.nest %>% select(-data) %>% st_sf(geometry = tracks))
## Simple feature collection with 426 features and 2 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: -109.3 ymin: 7.2 xmax: -6 ymax: 51.9
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 20 features:
##         name year                       geometry
## 1        Amy 1975 LINESTRING (-79 27.5, -79 2...
## 2   Caroline 1975 LINESTRING (-69.8 22.4, -71...
## 3      Doris 1975 LINESTRING (-48.9 34.9, -49...
## 4      Belle 1976 LINESTRING (-72.8 26, -73 2...
## 5     Gloria 1976 LINESTRING (-58 23, -58.1 2...
## 6      Anita 1977 LINESTRING (-88.4 26.9, -88...
## 7      Clara 1977 LINESTRING (-80 32.8, -79 3...
## 8     Evelyn 1977 LINESTRING (-62.9 26.9, -64...
## 9     Amelia 1978 LINESTRING (-97 25.7, -97.4...
## 10      Bess 1978 LINESTRING (-90.4 25.3, -91...
## 11      Cora 1978 LINESTRING (-35 13, -36.2 1...
## 12    Juliet 1978 LINESTRING (-55.6 18, -56.6...
## 13       Ana 1979 LINESTRING (-45 10, -46 10....
## 14       Bob 1979 LINESTRING (-96 22, -95.3 2...
## 15 Claudette 1979 LINESTRING (-46.3 12.5, -48...
## 16     David 1979 LINESTRING (-36.1 11.7, -38...
## 17  Frederic 1979 LINESTRING (-25.5 11, -28 1...
## 18    Gloria 1979 LINESTRING (-21 15.5, -22.5...
## 19     Henri 1979 LINESTRING (-86.8 20.3, -86...
## 20    Bonnie 1980 LINESTRING (-35.5 12.7, -36...

We can now plot tracks by storm:

storms.tr %>% 
    ggplot(aes(color = name)) + geom_sf() + theme(legend.position="none")

storms.tr %>% ggplot(aes(color = year)) + geom_sf()

An interactive plot is obtained using mapview; click on a track to see name and year:

library(mapview)
## Loading required package: leaflet
mapview(storms.tr, zcol = "year", legend = TRUE)

We could easily have aggregated other quantities that vary over a single storm track (category, wind speed, size) and add those to each track, but what we did loose is these properties at individual storm points.

Variations along a storm track

Let’s try to plot wind speed as it varies along a storm track. For this, we will have to create line segments from each point to the next, and add the attributes to those segments.

A function that creates n-1 LINESTRING geometries from n subsequent points is

to_lines <- function(tr) { 
    g = st_geometry(tr)
    hd = head(g, -1)
    tl = tail(g, -1)
    map2(hd, tl, function(x,y) st_combine(st_sfc(x, y, crs = 4326))) %>% 
        map(function(x) st_cast(x, "LINESTRING"))
}

we map this function to each storm track in storms.nest:

trs <- storms.nest %>% 
    pull(data) %>% 
    map(to_lines) %>% 
    unlist(recursive = FALSE) %>% 
    do.call(c, .)

and combine with the attributes, by first the last attribute record for each track

fn = function(x) head(x, -1) %>% as.data.frame %>% select(-geometry)
storms.nest <- storms.nest %>% mutate(data = map(data, fn))

and then adding the LINESTRING geometries to these attributes, plotting wind speed:

storms.tr2 <- storms.nest %>% unnest %>% st_set_geometry(trs)
nrow(storms.tr2)
## [1] 9584
storms.tr2 %>% ggplot(aes(color = wind)) + geom_sf()