Make a SpatialPointsDataFrame with sf the fast way

Question

The task I'm trying to do is very simple with the sp package in R, but I'm trying to learn sf, hence my question. I'm trying to create a layer of points in R. I have lots of points, so it has to be efficient. I've succeeded in doing it with both sp and sf, but the sf method is slow. Being new to sf, I suspect I'm not doing it the most efficient way.

I've written three functions that do the same thing:

1) 100% sp

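f_rgdal <- function(dat) {
  # sp: promote the data frame to a SpatialPointsDataFrame using columns x and y
  coordinates(dat) <- ~x+y
  dat  # return the object explicitly (a replacement call evaluates to its right-hand side)
}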

2) 100% sf (probably bad...)

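f_sf <- function(dat) {
  # sf: build one st_point() per row, then collect them into an sfc
  # (geometry only; the f1 column is not kept)
  st_sfc(
    lapply(
      apply(dat[, c("x", "y")], 1, list), function(xx) st_point(xx[[1]])
    )
  )
}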

3) mix of both:

f_rgdal_sp <- function(dat) {
  coordinates(dat) <- ~x+y  # sp: SpatialPointsDataFrame
  as(dat, "sf")             # convert the sp object to sf
}

If I benchmark them, you can see that functions 2 and 3 are much slower than function 1:

set.seed(1234)
nb_pt <- 10000  # number of points; not stated in the question, 10000 assumed (as in Answer 1)
dd <- data.frame(x = runif(nb_pt, 0, 100),
                 y = runif(nb_pt, 0, 50),
                 f1 = rnorm(nb_pt))

library(sp)
library(sf)
library(rbenchmark)
benchmark(f_rgdal(dd), f_sf(dd), f_rgdal_sp(dd), columns = c("test", "elapsed"))
            test elapsed
1    f_rgdal(dd)    0.22
3 f_rgdal_sp(dd)    4.82
2       f_sf(dd)    4.08

Is there a way to speed up sf? In the end, I want to use st_write, which is faster than writeOGR, so staying in sp is not ideal.

Answer 1:

A more compact alternative is:

library(sf)
set.seed(1234)
nb_pt <- 10000
dd <- data.frame(x = runif(nb_pt, 0, 100),
                 y = runif(nb_pt, 0,50),
                 f1 = rnorm(nb_pt))

sf <- sf::st_as_sf(dd, coords = c("x","y"))
sf

#> Simple feature collection with 10000 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0.03418126 ymin: 0.02131674 xmax: 99.95938 ymax: 49.99873
#> epsg (SRID):    NA
#> proj4string:    NA
#> First 10 features:
#>             f1                       geometry
#> 1  -1.81689753 POINT (11.3703411305323 10....
#> 2   0.62716684 POINT (62.2299404814839 24....
#> 3   0.51809210 POINT (60.9274732880294 31....
#> 4   0.14092183 POINT (62.3379441676661 47....
#> 5   1.45727195 POINT (86.0915383556858 8.9...
#> 6  -0.49359652 POINT (64.0310605289415 14....
#> 7  -2.12224406 POINT (0.94957563560456 19....
#> 8  -0.13356660 POINT (23.2550506014377 3.8...
#> 9  -0.42760035 POINT (66.6083758231252 14....
#> 10  0.08779481 POINT (51.4251141343266 23....
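st_as_sf() takes a couple of other arguments that are useful here. A minimal sketch: `remove = FALSE` keeps the x and y columns as ordinary attributes next to the geometry (the default `remove = TRUE` drops them), and `crs` tags the layer with a coordinate reference system. The EPSG code 4326 below is only an assumption for illustration; the simulated coordinates do not come from any real CRS.

```r
library(sf)

set.seed(1234)
nb_pt <- 10000
dd <- data.frame(x = runif(nb_pt, 0, 100),
                 y = runif(nb_pt, 0, 50),
                 f1 = rnorm(nb_pt))

# remove = FALSE keeps x and y as attribute columns;
# crs = 4326 is an illustrative assumption, not a property of this data
pts <- st_as_sf(dd, coords = c("x", "y"), remove = FALSE, crs = 4326)

# st_coordinates() returns the point coordinates as a matrix with columns X and Y
head(st_coordinates(pts))
```

Because the geometry is built in one vectorised call from the coordinate columns, the point order matches the row order of `dd`.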

It has the advantage of retaining the data attributes (here f1), and it appears to be faster for larger datasets:

library(microbenchmark)

microbenchmark::microbenchmark(
  st_cast = st_cast(st_sfc(st_multipoint(as.matrix(dd[,1:2]))), "POINT"),
  st_asf = sf::st_as_sf(dd, coords = c("x","y"))
)
#> Unit: milliseconds
#>     expr      min       lq     mean   median       uq      max neval
#>  st_cast 208.6751 256.8995 294.2232 284.2213 316.1777 454.6856   100
#>   st_asf 157.1974 176.6357 207.9863 200.1610 226.1047 323.5700   100
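Since the question's end goal is st_write, here is a minimal end-to-end sketch: build the points with st_as_sf() and write them out. The GeoPackage format and the file name `points.gpkg` in a temporary directory are assumptions for illustration; st_write picks the driver from the file extension. The file is read back only to check that all features and the f1 attribute were written.

```r
library(sf)

set.seed(1234)
nb_pt <- 10000
dd <- data.frame(x = runif(nb_pt, 0, 100),
                 y = runif(nb_pt, 0, 50),
                 f1 = rnorm(nb_pt))

pts <- st_as_sf(dd, coords = c("x", "y"))

# Hypothetical output path; remove any earlier copy so the write starts clean
out_file <- file.path(tempdir(), "points.gpkg")
if (file.exists(out_file)) file.remove(out_file)

st_write(pts, out_file, quiet = TRUE)

# Read the layer back to confirm what was written
back <- st_read(out_file, quiet = TRUE)
```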

Answer 2:

library(sf)
library(microbenchmark)

set.seed(1234)

nb_pt <- 100

dd <- data.frame(x = runif(nb_pt, 0, 100),
                 y = runif(nb_pt, 0,50),
                 f1 = rnorm(nb_pt))

print(st_cast(st_sfc(st_multipoint(as.matrix(dd[,1:2]))), "POINT"))
## Geometry set for 100 features 
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 0.9495756 ymin: 1.110341 xmax: 99.21504 ymax: 49.93704
## epsg (SRID):    NA
## proj4string:    NA
## First 5 geometries:
## POINT (11.3703411305323 1.77283635130152)
## POINT (62.2299404814839 28.253805602435)
## POINT (60.9274732880294 14.0128888073377)
## POINT (62.3379441676661 10.2098158211447)
## POINT (86.0915383556858 6.68694493360817)

microbenchmark(
  sf=st_cast(st_sfc(st_multipoint(as.matrix(dd[,1:2]))), "POINT")
)
## Unit: milliseconds
##  expr      min       lq     mean median       uq      max neval
##    sf 1.834133 1.960914 2.608143 2.0314 2.280842 39.04158   100
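The st_cast() approach above returns geometry only (an sfc), so the f1 column is lost. One way to attach it again, sketched below, is st_sf() with the attribute and the geometry passed as named arguments. This relies on st_cast() splitting the MULTIPOINT into POINTs in the original row order, so row i of `dd` lines up with geometry i.

```r
library(sf)

set.seed(1234)
nb_pt <- 100
dd <- data.frame(x = runif(nb_pt, 0, 100),
                 y = runif(nb_pt, 0, 50),
                 f1 = rnorm(nb_pt))

# Geometry only: one MULTIPOINT split into nb_pt POINTs, as in the answer
geom <- st_cast(st_sfc(st_multipoint(as.matrix(dd[, 1:2]))), "POINT")

# Bind the attribute column back next to the geometry column
pts <- st_sf(f1 = dd$f1, geometry = geom)
```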

Source: https://stackoverflow.com/questions/48152269/make-a-spatialpointsdataframe-with-sf-the-fast-way
