# Appendix B — R Basics

This appendix provides some minimal R basics that may make it easier to read this book. A more comprehensive treatment of R basics is found in chapter 2 of Wickham (2014).

## B.1 Pipes

The `|>` (pipe) symbol should be read as "then": we read

``a |> b() |> c() |> d(n = 10)``

as "with `a` do `b`, then `c`, then `d` with `n` being 10", which is just alternative syntax for

``d(c(b(a)), n = 10)``

or

```r
tmp1 <- b(a)
tmp2 <- c(tmp1)
tmp3 <- d(tmp2, n = 10)
```

To many, the pipe form is easier to read because execution order follows reading order, from left to right. Like nested function calls, it avoids the need to choose names for intermediate results. As with nested function calls, however, it is harder to inspect intermediate results when they diverge from our expectations. Note that the intermediate results do exist in memory, so neither form saves memory allocation. The native pipe `|>`, which appeared in R 4.1.0 and is used throughout this book, can be safely substituted by the `%>%` pipe of package magrittr.
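As a small check of this equivalence, the sketch below computes the same value in the piped, nested, and temporary-variable forms. The last part shows one way to look at an intermediate result inside a pipe: an anonymous function (the `\(v)` shorthand, also available from R 4.1.0 onwards) that prints its argument and passes it on unchanged. This is an illustrative trick, not a dedicated debugging facility.

```r
x <- c(4, 9, 16)

# piped form: take x, then sqrt, then sum
r1 <- x |> sqrt() |> sum()

# nested form
r2 <- sum(sqrt(x))

# form with a named intermediate result
tmp1 <- sqrt(x)
r3 <- sum(tmp1)

identical(r1, r2) && identical(r2, r3)
# [1] TRUE

# peek at the intermediate result: print it, then return it unchanged
r4 <- x |> sqrt() |> (\(v) { print(v); v })() |> sum()
# [1] 2 3 4
r4
# [1] 9
```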

## B.2 Data structures

As pointed out by Chambers (2016), everything that exists in R is an object. This includes objects that make things happen, such as language objects or functions, but also the more basic “things”, such as data objects. Some basic R data structures will now be discussed.

### Homogeneous vectors

Data objects contain data, and possibly metadata. Data is always in the form of a vector, which can have different types. We can find the type by `typeof`, and vector length by `length`. Vectors are created by `c`, which combines individual elements:

```r
typeof(1:10)
# [1] "integer"
length(1:10)
# [1] 10
typeof(1.0)
# [1] "double"
length(1.0)
# [1] 1
typeof(c("foo", "bar"))
# [1] "character"
length(c("foo", "bar"))
# [1] 2
typeof(c(TRUE, FALSE))
# [1] "logical"
```

Vectors of this kind can only have a single type.
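When `c` is given elements of different types, R does not create a mixed vector but coerces all elements to one common type, following the order logical < integer < double < character (complex, not shown here, sits between double and character). A short illustration:

```r
typeof(c(TRUE, 2L))   # logical and integer: becomes integer
# [1] "integer"
c(TRUE, 2L)
# [1] 1 2
typeof(c(1L, 2.5))    # integer and double: becomes double
# [1] "double"
typeof(c(1, "a"))     # double and character: becomes character
# [1] "character"
c(1, "a")
# [1] "1" "a"
```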

Note that vectors can have zero length:

```r
i <- integer(0)
typeof(i)
# [1] "integer"
i
# integer(0)
length(i)
# [1] 0
```

We can retrieve (or in assignments: replace) elements in a vector using `[` or `[[`:

```r
a <- c(1,2,3)
a[2]
# [1] 2
a[[2]]
# [1] 2
a[2:3]
# [1] 2 3
a[2:3] <- c(5,6)
a
# [1] 1 5 6
a[[3]] <- 10
a
# [1]  1  5 10
```

where the difference is that `[` can operate on an index range (or any set of multiple indexes), whereas `[[` operates on a single element.
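Besides positive indexes, `[` also accepts negative indexes, which drop elements, and logical vectors, which keep the elements where the condition is `TRUE`. `[[`, in contrast, insists on exactly one element and raises an error otherwise; the sketch catches that error with `tryCatch` so the block runs to the end:

```r
a <- c(1, 5, 10)
a[-1]                     # drop the first element
# [1]  5 10
a[a > 2]                  # keep the elements for which the condition holds
# [1]  5 10
a[c(TRUE, FALSE, TRUE)]   # logical index of the same length
# [1]  1 10
res <- tryCatch(a[[2:3]], error = function(e) "error: more than one element")
res
# [1] "error: more than one element"
```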

### Heterogeneous vectors: `list`

An additional vector type is the `list`, which can combine any types in its elements:

```r
l <- list(3, TRUE, "foo")
typeof(l)
# [1] "list"
length(l)
# [1] 3
```

For lists, there is a further distinction between `[` and `[[`: the single `[` always returns a list, whereas `[[` returns the contents of a list element:

```r
l[1]
# [[1]]
# [1] 3
l[[1]]
# [1] 3
```
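The difference matters in computations: `l[1]` is a list of length one and cannot be used directly in arithmetic, whereas `l[[1]]` is the number itself. A short sketch (the error is caught with `tryCatch` so that the block runs to the end):

```r
l <- list(3, TRUE, "foo")
is.list(l[1])     # single bracket: still a list
# [1] TRUE
is.list(l[[1]])   # double bracket: the element itself
# [1] FALSE
l[[1]] + 1
# [1] 4
res <- tryCatch(l[1] + 1, error = function(e) "error: non-numeric argument")
res
# [1] "error: non-numeric argument"
```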

For replacement, one can use `[` when providing a list, and `[[` when providing a new value:

```r
l[1:2] <- list(4, FALSE)
l
# [[1]]
# [1] 4
#
# [[2]]
# [1] FALSE
#
# [[3]]
# [1] "foo"
l[[3]] <- "bar"
l
# [[1]]
# [1] 4
#
# [[2]]
# [1] FALSE
#
# [[3]]
# [1] "bar"
```

In case list elements are named, as in

```r
l <- list(first = 3, second = TRUE, third = "foo")
l
# $first
# [1] 3
#
# $second
# [1] TRUE
#
# $third
# [1] "foo"
```

we can use names as in `l[["second"]]` and this can be abbreviated to

```r
l$second
# [1] TRUE
l$second <- FALSE
l
# $first
# [1] 3
#
# $second
# [1] FALSE
#
# $third
# [1] "foo"
```

This is convenient, but it also requires name look-up in the names attribute (see below).
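One subtlety of `$` on lists is that it performs partial matching of names, whereas `[[` matches names exactly unless asked otherwise through its `exact` argument:

```r
l <- list(first = 3, second = TRUE, third = "foo")
l$sec                       # partial match of "second"
# [1] TRUE
l[["sec"]]                  # exact matching: there is no element "sec"
# NULL
l[["sec", exact = FALSE]]   # partial matching on request
# [1] TRUE
```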

### NULL and removing list elements

`NULL` is the null value in R; it is special in the sense that it doesn’t work in simple comparisons:

```r
3 == NULL # not FALSE!
# logical(0)
NULL == NULL # not even TRUE!
# logical(0)
```

but has to be treated specially, using `is.null`:

```r
is.null(NULL)
# [1] TRUE
```

When we want to remove one or more list elements, we can do so by creating a new list that does not contain the elements to be removed, as in

```r
l <- l[c(1,3)] # remove second, implicitly
l
# $first
# [1] 3
#
# $third
# [1] "foo"
```

but we can also assign `NULL` to the element we want to eliminate:

```r
l <- list(first = 3, second = TRUE, third = "foo") # start again from three elements
l$second <- NULL
l
# $first
# [1] 3
#
# $third
# [1] "foo"
```
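Because assigning `NULL` removes an element, storing an actual `NULL` value in a list needs single-bracket assignment of a list that contains `NULL`:

```r
l <- list(first = 3, second = TRUE, third = "foo")
l["second"] <- list(NULL)   # keep the element, set its value to NULL
length(l)
# [1] 3
is.null(l$second)
# [1] TRUE
names(l)
# [1] "first"  "second" "third"
```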

### Attributes

We can glue arbitrary metadata objects to data objects, as in

```r
a <- 1:3
attr(a, "some_meta_data") = "foo"
a
# [1] 1 2 3
# attr(,"some_meta_data")
# [1] "foo"
```

and this can be retrieved, or replaced by

```r
attr(a, "some_meta_data")
# [1] "foo"
attr(a, "some_meta_data") <- "bar"
attr(a, "some_meta_data")
# [1] "bar"
```

In essence, the attributes of an object form a named list, and we can get or set the complete list by

```r
attributes(a)
# $some_meta_data
# [1] "bar"
attributes(a) = list(some_meta_data = "foo")
attributes(a)
# $some_meta_data
# [1] "foo"
```

A number of attributes are treated specially by R, see `?attributes` for full details. Some of the special attributes will now be explained.

#### Object class and `class` attribute

Every object in R “has a class”, meaning that `class(obj)` returns a character vector with the class of `obj`. Some objects have an implicit class, such as basic vectors

```r
class(1:3)
# [1] "integer"
class(c(TRUE, FALSE))
# [1] "logical"
class(c("TRUE", "FALSE"))
# [1] "character"
```

but we can also set the class explicitly, either by using `attr` or by using `class` on the left-hand side of an assignment:

```r
a <- 1:3
class(a) <- "foo"
a
# [1] 1 2 3
# attr(,"class")
# [1] "foo"
class(a)
# [1] "foo"
attributes(a)
# $class
# [1] "foo"
```

in which case the newly set class overrides the earlier implicit class. This way, we can add methods for class `foo` by appending the class name, after a dot, to the name of the generic function:

```r
print.foo <- function(x, ...) {
  print(paste("an object of class foo with length", length(x)))
}
print(a)
# [1] "an object of class foo with length 3"
```
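The class attribute can also hold several class names. When looking for a method, R tries these in order and uses the first one for which a method exists, while `inherits` tests whether a class name occurs in the class attribute. A sketch with a hypothetical class `bar` that has no print method of its own (`print.foo` is repeated so that the block runs on its own):

```r
print.foo <- function(x, ...) {
  print(paste("an object of class foo with length", length(x)))
}
b <- 1:5
class(b) <- c("bar", "foo")   # "bar" is tried first, then "foo"
print(b)                      # no print.bar exists, so print.foo is used
# [1] "an object of class foo with length 5"
inherits(b, "foo")
# [1] TRUE
inherits(b, "baz")
# [1] FALSE
```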

Providing such methods is generally intended to create more usable software, but at the same time they may make the objects more opaque. It is sometimes useful to see what an object “is made of” by printing it after the class attribute is removed, as in

```r
unclass(a)
# [1] 1 2 3
```

As a more elaborate example, consider the case where a polygon is made using package sf:

```r
library(sf) |> suppressPackageStartupMessages()
p <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,0))))
p
# POLYGON ((0 0, 1 0, 1 1, 0 0))
```

which prints the well-known text (WKT) form; to understand what the data structure is like, we can use

```r
unclass(p)
# [[1]]
#      [,1] [,2]
# [1,]    0    0
# [2,]    1    0
# [3,]    1    1
# [4,]    0    0
```
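The list inside a `POLYGON` holds one coordinate matrix per ring: the exterior ring first, followed by any interior rings (holes), each closed by repeating its first point at the end. A sketch with one hole (coordinates made up for illustration); as far as we know, `st_polygon` refuses rings that are not closed, which the last part checks by catching the error:

```r
library(sf) |> suppressPackageStartupMessages()
outer <- rbind(c(0,0), c(10,0), c(10,10), c(0,10), c(0,0))  # exterior ring
hole  <- rbind(c(2,2), c(2,4), c(4,4), c(4,2), c(2,2))      # interior ring
p2 <- st_polygon(list(outer, hole))
length(unclass(p2))      # two rings
# [1] 2
nrow(unclass(p2)[[2]])   # five points, the first one repeated as the last
# [1] 5
# a ring whose first and last points differ is not accepted
res <- tryCatch(st_polygon(list(rbind(c(0,0), c(1,0), c(1,1)))),
                error = function(e) "error: ring not closed")
res
# [1] "error: ring not closed"
```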

#### The `dim` attribute

The `dim` attribute sets the matrix or array dimensions:

```r
a <- 1:8
class(a)
# [1] "integer"
attr(a, "dim") <- c(2,4) # or: dim(a) = c(2,4)
class(a)
# [1] "matrix" "array"
a
#      [,1] [,2] [,3] [,4]
# [1,]    1    3    5    7
# [2,]    2    4    6    8
attr(a, "dim") <- c(2,2,2) # or: dim(a) = c(2,2,2)
class(a)
# [1] "array"
a
# , , 1
#
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
#
# , , 2
#
#      [,1] [,2]
# [1,]    5    7
# [2,]    6    8
```
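Because R stores matrix and array elements in column-major order, the `dim` attribute only changes how the same underlying vector is indexed. With the 2 x 4 matrix from above, element `[2, 3]` is the sixth element of the vector, and removing `dim` gives back the plain vector:

```r
a <- 1:8
dim(a) <- c(2, 4)
a[2, 3]          # row 2, column 3
# [1] 6
a[, 2]           # the complete second column
# [1] 3 4
dim(a) <- NULL   # drop the dim attribute
a
# [1] 1 2 3 4 5 6 7 8
```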

### The `names` attributes

Named vectors carry their names in a `names` attribute. We saw examples for lists above; an example for a numeric vector is:

```r
a <- c(first = 3, second = 4, last = 5)
a["second"]
# second
#      4
attributes(a)
# $names
# [1] "first"  "second" "last"
```

Other name attributes include `dimnames` for `matrix` or `array` objects, which names not only the dimensions but also the labels of the values along each dimension:

```r
a <- matrix(1:4, 2, 2)
dimnames(a) <- list(rows = c("row1", "row2"),
                    cols = c("col1", "col2"))
a
#       cols
# rows   col1 col2
#   row1    1    3
#   row2    2    4
attributes(a)
# $dim
# [1] 2 2
#
# $dimnames
# $dimnames$rows
# [1] "row1" "row2"
#
# $dimnames$cols
# [1] "col1" "col2"
```
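Names and dimnames can also be used for indexing, which often reads better than numeric positions:

```r
v <- c(first = 3, second = 4, last = 5)
v[c("first", "last")]
# first  last
#     3     5
m <- matrix(1:4, 2, 2,
            dimnames = list(rows = c("row1", "row2"),
                            cols = c("col1", "col2")))
m["row2", "col1"]
# [1] 2
m["row1", ]
# col1 col2
#    1    3
```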

`data.frame` objects have rows and columns, and both have names:

```r
df <- data.frame(a = 1:3, b = c(TRUE, FALSE, TRUE))
attributes(df)
# $names
# [1] "a" "b"
#
# $class
# [1] "data.frame"
#
# $row.names
# [1] 1 2 3
```
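Underneath, a `data.frame` is a list of equal-length column vectors carrying the `names`, `class`, and `row.names` attributes shown above, so ordinary list operations apply to it:

```r
df <- data.frame(a = 1:3, b = c(TRUE, FALSE, TRUE))
typeof(df)
# [1] "list"
length(df)             # the number of columns
# [1] 2
df[["b"]]              # a column, extracted as for any list
# [1]  TRUE FALSE  TRUE
lengths(unclass(df))   # every column has the same length
# a b
# 3 3
```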

### Using `structure`

When programming, the pattern of adding or modifying attributes before returning an object is extremely common; an example is:

```r
f <- function(x) {
  a <- create_obj(x) # call some other function
  attributes(a) <- list(class = "foo", meta = 33)
  a
}
```

The last two statements can be contracted into

```r
f <- function(x) {
  a <- create_obj(x) # call some other function
  structure(a, class = "foo", meta = 33)
}
```

where function `structure` adds, replaces, or (in case of value `NULL`) removes attributes from the object in its first argument.
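The two versions of `f` above are only interchangeable when the object returned by `create_obj` (a placeholder, not a real function) carries no other attributes: `attributes<-` replaces the complete attribute list, whereas `structure` leaves attributes it is not given untouched. A small check, using a plain vector in place of `create_obj`:

```r
x <- structure(1:3, meta = 33, note = "tmp")
y <- x
attributes(y) <- list(class = "foo")           # replaces ALL attributes
names(attributes(y))
# [1] "class"
z <- structure(x, class = "foo", note = NULL)  # adds class, removes note
attr(z, "meta")                                # meta is kept
# [1] 33
attr(z, "note")
# NULL
```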

## B.3 Dissecting a `MULTIPOLYGON`

We can use the above examples to dissect an `sf` object with `MULTIPOLYGON`s into pieces. Suppose we use the `nc` dataset,

```r
nc <- system.file("gpkg/nc.gpkg", package = "sf") |> read_sf()
```

we can see from the attributes of `nc`,

```r
attributes(nc)
# $names
#  [1] "AREA"      "PERIMETER" "CNTY_"     "CNTY_ID"   "NAME"
#  [6] "FIPS"      "FIPSNO"    "CRESS_ID"  "BIR74"     "SID74"
# [11] "NWBIR74"   "BIR79"     "SID79"     "NWBIR79"   "geom"
#
# $row.names
#   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
#  [16]  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
#  [31]  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
#  [46]  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
#  [61]  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
#  [76]  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#  [91]  91  92  93  94  95  96  97  98  99 100
#
# $class
# [1] "sf"         "tbl_df"     "tbl"        "data.frame"
#
# $sf_column
# [1] "geom"
#
# $agr
#      AREA PERIMETER     CNTY_   CNTY_ID      NAME      FIPS
#      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#    FIPSNO  CRESS_ID     BIR74     SID74   NWBIR74     BIR79
#      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#     SID79   NWBIR79
#      <NA>      <NA>
# Levels: constant aggregate identity
```

that the geometry column is named `geom`. When we extract this column,

```r
nc$geom
# Geometry set for 100 features
# Geometry type: MULTIPOLYGON
# Dimension:     XY
# Bounding box:  xmin: -84.3 ymin: 33.9 xmax: -75.5 ymax: 36.6
# First 5 geometries:
# MULTIPOLYGON (((-81.5 36.2, -81.5 36.3, -81.6 3...
# MULTIPOLYGON (((-81.2 36.4, -81.2 36.4, -81.3 3...
# MULTIPOLYGON (((-80.5 36.2, -80.5 36.3, -80.5 3...
# MULTIPOLYGON (((-76 36.3, -76 36.3, -76 36.3, -...
# MULTIPOLYGON (((-77.2 36.2, -77.2 36.2, -77.3 3...
```

we see an object that has the following attributes

```r
attributes(nc$geom)
# $n_empty
# [1] 0
#
# $crs
# Coordinate Reference System:
#   wkt:
#     DATUM["North American Datum 1927",
#         ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
#             LENGTHUNIT["metre",1]]],
#     PRIMEM["Greenwich",0,
#         ANGLEUNIT["degree",0.0174532925199433]],
#     CS[ellipsoidal,2],
#         AXIS["geodetic latitude (Lat)",north,
#             ORDER[1],
#             ANGLEUNIT["degree",0.0174532925199433]],
#         AXIS["geodetic longitude (Lon)",east,
#             ORDER[2],
#             ANGLEUNIT["degree",0.0174532925199433]],
#     USAGE[
#         SCOPE["Geodesy."],
#         AREA["North and central America: Antigua and Barbuda - onshore. Bahamas - onshore plus offshore over internal continental shelf only. Belize - onshore. British Virgin Islands - onshore. Canada onshore - Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan and Yukon - plus offshore east coast. Cuba - onshore and offshore. El Salvador - onshore. Guatemala - onshore. Honduras - onshore. Panama - onshore. Puerto Rico - onshore. Mexico - onshore plus offshore east coast. Nicaragua - onshore. United States (USA) onshore and offshore - Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming - plus offshore . US Virgin Islands - onshore."],
#         BBOX[7.15,167.65,83.17,-47.74]],
#     ID["EPSG",4267]]
#
# $class
# [1] "sfc_MULTIPOLYGON" "sfc"
#
# $precision
# [1] 0
#
# $bbox
#  xmin  ymin  xmax  ymax
# -84.3  33.9 -75.5  36.6
```
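In practice, sf provides accessor functions that read these attributes for us: `st_geometry` uses the `sf_column` attribute to find the geometry column, and `st_crs` and `st_bbox` return the coordinate reference system and the bounding box. A sketch comparing them with the raw attributes (it reads `nc` afresh so that it runs on its own):

```r
library(sf) |> suppressPackageStartupMessages()
nc <- system.file("gpkg/nc.gpkg", package = "sf") |> read_sf()
attr(nc, "sf_column")
# [1] "geom"
# st_geometry() locates the geometry column through the sf_column attribute
identical(st_geometry(nc), nc[[attr(nc, "sf_column")]])
# [1] TRUE
st_crs(nc)$epsg
# [1] 4267
# compare st_bbox() with the bbox attribute of the geometry column
all.equal(as.numeric(st_bbox(nc)), as.numeric(attr(nc$geom, "bbox")))
# [1] TRUE
```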

When we take the contents of the fourth list element, we obtain

```r
nc$geom[[4]] |> format(width = 60, digits = 5)
# [1] "MULTIPOLYGON (((-76.009 36.32, -76.017 36.338, -76.033 36..."
```

which is a (classed) list,

```r
typeof(nc$geom[[4]])
# [1] "list"
```

with attributes

```r
attributes(nc$geom[[4]])
# $class
# [1] "XY"           "MULTIPOLYGON" "sfg"
```

and length

```r
length(nc$geom[[4]])
# [1] 3
```

The length indicates the number of outer rings: a multi-polygon can consist of more than one polygon. We see that most counties have only a single polygon:

```r
lengths(nc$geom)
#   [1] 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#  [32] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1
#  [63] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1
#  [94] 1 2 1 1 1 1 1
```
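To find out which counties consist of more than one polygon, `lengths` can be combined with `which` and `table`; the expected results below are read off the output above:

```r
library(sf) |> suppressPackageStartupMessages()
nc <- system.file("gpkg/nc.gpkg", package = "sf") |> read_sf()
n <- lengths(nc$geom)   # number of polygons per county
which(n > 1)            # counties with more than one polygon
# [1]  4 56 57 87 91 95
table(n)
# n
#  1  2  3
# 94  4  2
nc$NAME[n > 1]          # the names of these counties
```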

A multi-polygon is a list with polygons,

```r
typeof(nc$geom[[4]])
# [1] "list"
```

and the first polygon of the fourth multi-polygon is again a list, because polygons have an outer ring possibly followed by multiple inner rings (holes)

```r
typeof(nc$geom[[4]][[1]])
# [1] "list"
```

we see that it contains only one ring, the exterior ring:

```r
length(nc$geom[[4]][[1]])
# [1] 1
```

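and we can print the type and the dimension of the coordinate matrix of this ring by

```r
typeof(nc$geom[[4]][[1]][[1]])
# [1] "double"
dim(nc$geom[[4]][[1]][[1]])
# [1] 26  2
```

Since this is a plain matrix, individual coordinates can be modified in place; for instance, the following sets the second coordinate (latitude) of the third point to 36.5:

```r
nc$geom[[4]][[1]][[1]][3,2] <- 36.5
```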
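The nested-list structure can be walked with ordinary list functions. As a sketch, the following counts the points in every ring of the fourth multi-polygon and checks the total against `st_coordinates`, which returns all coordinates of a geometry as one matrix. It reads `nc` afresh, so the modification above does not affect it.

```r
library(sf) |> suppressPackageStartupMessages()
nc <- system.file("gpkg/nc.gpkg", package = "sf") |> read_sf()
mp <- nc$geom[[4]]   # a MULTIPOLYGON: a list of polygons
# for each polygon (a list of rings), count the points (matrix rows) per ring
pts <- lapply(unclass(mp), function(polygon) sapply(polygon, nrow))
pts[[1]]             # the single ring of the first polygon
# [1] 26
sum(unlist(pts)) == nrow(st_coordinates(mp))
# [1] TRUE
```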