I am working on a spatial problem using the sf
package in conjunction with dplyr
and purrr
.
I would prefer to perform spatial operations inside a mutate
call, like so:
simple_feature %>% mutate(geometry_area = map_dbl(geometry, ~ as.double(st_area(.x))))
I like that this approach allows me to run a series of spatial operations using %>%
and mutate
.
I dislike that this approach seems to significantly increase the run-time of the sf
functions (sometimes prohibitively) and I would appreciate hearing suggestions about how to overcome this speed loss.
Here is a reprex that illustrates the speed loss problem in detail.
Please note: this is not a minimal example and requires downloading a few packages and one file from an ESRI REST API. I hope you'll be kind with me ;)
The objective in this example is to add a new column indicating whether each North Carolina county (nc
) intersects with any of the waterbodies polygons (nc_wtr
), as shown in the image below:
I created a function that performs this calculation: st_intersects_any()
Then I benchmark that function on two datasets (nc
and nc_1e4
), first using st_intersects_any()
by itself and then using it inside a mutate
call.
## |TEST | ELAPSED| ## |:------------------|--------:| ## |bm_sf_small | 0.01| ## |bm_sf_dplyr_small | 1.22| ## |bm_sf_large | 0.95| ## |bm_sf_dplyr_large | 122.88|
The benchmarks clearly show that the dplyr
approach is substantially slower, and I'm hoping that someone has a suggestion for reducing or eliminating this speed loss while still using the dplyr
approach.
If there are significantly faster ways to do this using data.table
or some other method that I should check out please let me know about those as well.
Thanks!
Reprex
# Setup ---- library(lwgeom) # devtools::install_github('r-spatial/lwgeom) library(tidyverse) library(sf) library(esri2sf) # devtools::install_github('yonghah/esri2sf') library(rbenchmark) library(knitr) # Create the new sf function: st_intersects_any ---- st_intersects_any <- function(x, y) { st_intersects(x, y) %>% map_lgl(~ length(.x) > 0) } # Load data ---- # NC counties nc <- read_sf(system.file("shape/nc.shp", package = "sf")) %>% st_transform(32119) nc_1e4 <- list(nc) %>% rep(times = 1e2) %>% reduce(rbind) # NC watersheds url <- "https://services.nconemap.gov/secure/rest/services/NC1Map_Watersheds/MapServer/2" nc_wtr <- esri2sf(url) ## Warning: package 'httr' was built under R version 3.4.2 ## ## Attaching package: 'jsonlite' ## The following object is masked from 'package:purrr': ## ## flatten ## [1] "Feature Layer" ## [1] "esriGeometryPolygon" nc_wtr <- st_transform(nc_wtr, 32119) %>% st_simplify(dTolerance = 100) # simplify the waterbodies geometries # plot the data par(mar = rep(.1, 4)) plot(st_geometry(nc), lwd = 1) plot(st_geometry(nc_wtr), col = alpha("blue", .3), lwd = 1.5, add = TRUE) # Benchmark the two approaches cols <- c("elapsed", "relative") bm_sf_small <- benchmark({ st_intersects_any(nc, nc_wtr) }, columns = cols, replications = 1) bm_sf_dplyr_small <- benchmark({ nc %>% transmute(INT = map_lgl(geometry, st_intersects_any, y = nc_wtr)) }, columns = cols, replications = 1) ## Warning: package 'bindrcpp' was built under R version 3.4.2 bm_sf_large <- benchmark({ st_intersects_any(nc_1e4, nc_wtr) }, columns = cols, replications = 1) bm_sf_dplyr_large <- benchmark({ nc_1e4 %>% transmute(INT = map_lgl(geometry, st_intersects_any, y = nc_wtr)) }, columns = cols, replications = 1) tests <- list(bm_sf_small, bm_sf_dplyr_small, bm_sf_large, bm_sf_dplyr_large) tbl <- tibble( TEST = c("bm_sf_small", "bm_sf_dplyr_small", "bm_sf_large", "bm_sf_dplyr_large"), ELAPSED = map_dbl(tests, "elapsed") ) kable(tbl,format = "markdown", padding = 2) ## |TEST | ELAPSED| ## |:------------------|--------:| ## |bm_sf_small | 0.01| ## |bm_sf_dplyr_small | 1.22| ## |bm_sf_large | 0.95| ## |bm_sf_dplyr_large | 122.88| devtools::session_info() ## Session info ------------------------------------------------------------- ## setting value ## version R version 3.4.0 (2017-04-21) ## system x86_64, mingw32 ## ui RTerm ## language (EN) ## collate English_United States.1252 ## tz America/Los_Angeles ## date 2018-01-31 ## Packages ----------------------------------------------------------------- ## package * version date source ## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.2) ## backports 1.1.0 2017-05-22 CRAN (R 3.4.0) ## base * 3.4.0 2017-04-21 local ## bindr 0.1 2016-11-13 CRAN (R 3.4.2) ## bindrcpp * 0.2 2017-06-17 CRAN (R 3.4.2) ## broom 0.4.3 2017-11-20 CRAN (R 3.4.3) ## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.2) ## class 7.3-14 2015-08-30 CRAN (R 3.4.0) ## classInt 0.1-24 2017-04-16 CRAN (R 3.4.2) ## cli 1.0.0 2017-11-05 CRAN (R 3.4.2) ## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.2) ## compiler 3.4.0 2017-04-21 local ## crayon 1.3.4 2017-10-30 Github (r-lib/crayon@b5221ab) ## curl 3.0 2017-10-06 CRAN (R 3.4.2) ## datasets * 3.4.0 2017-04-21 local ## DBI 0.7 2017-06-18 CRAN (R 3.4.2) ## devtools 1.13.2 2017-06-02 CRAN (R 3.4.0) ## digest 0.6.13 2017-12-14 CRAN (R 3.4.3) ## dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.2) ## e1071 1.6-8 2017-02-02 CRAN (R 3.4.2) ## esri2sf * 0.1.0 2017-12-12 Github (yonghah/esri2sf@81d211f) ## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.3) ## forcats * 0.2.0 2017-01-23 CRAN (R 3.4.3) ## foreign 0.8-67 2016-09-13 CRAN (R 3.4.0) ## ggplot2 * 2.2.1.9000 2017-12-02 Github (tidyverse/ggplot2@7b5c185) ## glue 1.2.0.9000 2018-01-13 Github (tidyverse/glue@1592ee1) ## graphics * 3.4.0 2017-04-21 local ## grDevices * 3.4.0 2017-04-21 local ## grid 3.4.0 2017-04-21 local ## gtable 0.2.0 2016-02-26 CRAN (R 3.4.2) ## haven 1.1.0 2017-07-09 CRAN (R 3.4.2) ## hms 0.4.0 2017-11-23 CRAN (R 3.4.3) ## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0) ## httr * 1.3.1 2017-08-20 CRAN (R 3.4.2) ## jsonlite * 1.5 2017-06-01 CRAN (R 3.4.0) ## knitr 1.18 2017-12-27 CRAN (R 3.4.3) ## lattice 0.20-35 2017-03-25 CRAN (R 3.4.0) ## lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2) ## lubridate 1.7.1 2017-11-03 CRAN (R 3.4.2) ## lwgeom * 0.1-1 2017-12-16 Github (r-spatial/lwgeom@baf22c6) ## magrittr 1.5 2014-11-22 CRAN (R 3.4.0) ## memoise 1.1.0 2017-04-21 CRAN (R 3.4.0) ## methods * 3.4.0 2017-04-21 local ## mnormt 1.5-5 2016-10-15 CRAN (R 3.4.1) ## modelr 0.1.1 2017-07-24 CRAN (R 3.4.2) ## munsell 0.4.3 2016-02-13 CRAN (R 3.4.2) ## nlme 3.1-131 2017-02-06 CRAN (R 3.4.0) ## parallel 3.4.0 2017-04-21 local ## pillar 1.0.99.9001 2018-01-16 Github (r-lib/pillar@9d96835) ## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.2) ## plyr 1.8.4 2016-06-08 CRAN (R 3.4.2) ## psych 1.7.8 2017-09-09 CRAN (R 3.4.2) ## purrr * 0.2.4.9000 2017-12-05 Github (tidyverse/purrr@62b135a) ## R6 2.2.2 2017-06-17 CRAN (R 3.4.0) ## rbenchmark * 1.0.0 2012-08-30 CRAN (R 3.4.1) ## Rcpp 0.12.15 2018-01-20 CRAN (R 3.4.3) ## readr * 1.1.1 2017-05-16 CRAN (R 3.4.2) ## readxl 1.0.0 2017-04-18 CRAN (R 3.4.2) ## reshape2 1.4.2 2016-10-22 CRAN (R 3.4.2) ## rlang 0.1.6 2017-12-21 CRAN (R 3.4.3) ## rmarkdown 1.8 2017-11-17 CRAN (R 3.4.2) ## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3) ## rvest 0.3.2 2016-06-17 CRAN (R 3.4.2) ## scales 0.5.0.9000 2017-12-02 Github (hadley/scales@d767915) ## sf * 0.6-1 2018-01-24 Github (r-spatial/sf@7ea67a5) ## stats * 3.4.0 2017-04-21 local ## stringi 1.1.6 2017-11-17 CRAN (R 3.4.2) ## stringr * 1.2.0 2017-02-18 CRAN (R 3.4.0) ## tibble * 1.4.1.9000 2018-01-18 Github (tidyverse/tibble@64fedbd) ## tidyr * 0.7.2.9000 2018-01-13 Github (tidyverse/tidyr@74bd48f) ## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.3) ## tools 3.4.0 2017-04-21 local ## udunits2 0.13 2016-11-17 CRAN (R 3.4.1) ## units 0.5-1 2018-01-08 CRAN (R 3.4.3) ## utf8 1.1.3 2018-01-03 CRAN (R 3.4.3) ## utils * 3.4.0 2017-04-21 local ## withr 2.1.1.9000 2018-01-13 Github (jimhester/withr@df18523) ## xml2 1.1.1 2017-01-24 CRAN (R 3.4.2) ## yaml 2.1.14 2016-11-12 CRAN (R 3.4.0)