find locations within certain lat/lon distance in r

后端 未结 4 1276
没有蜡笔的小新
没有蜡笔的小新 2020-11-30 07:57

I have a gridded dataset, with data available at the following locations:

lon <- seq(-179.75,179.75, by = 0.5)
lat <- seq(-89.75,89.75, by = 0.5)


        
相关标签:
4条回答
  • 2020-11-30 08:09

    Timings:

    Comparing @nicola's and my version gives:

    Unit: milliseconds
    
                   min         lq      mean     median         uq       max neval
    nicola1 184.217002 219.924647 297.60867 299.181854 322.635960 898.52393   100
    floo01   61.341560  72.063197  97.20617  80.247810  93.292233 286.99343   100
    nicola2   3.992343   4.485847   5.44909   4.870101   5.371644  27.25858   100
    

    My original solution: (IMHO nicola's second version is much cleaner and faster.)

    You can do the following (explanation below)

    require(geosphere)
    my_coord <- c(mylon, mylat)
    dd2 <- matrix(FALSE, nrow=length(lon), ncol=length(lat))
    outer_loop_state <- 0
    for(i in 1:length(lon)){
        coods <- cbind(lon[i], lat)
        dd <- as.numeric(distHaversine(my_coord, coods))
        dd2[i, ] <- dd <= 500000
        if(any(dd2[i, ])){
          outer_loop_state <- 1
        } else {
          if(outer_loop_state == 1){
            break
          }
        }
      }
    

    Explanation:

    For the loop i apply the following logic:

    outer_loop_state is initialized with 0. If a row with at least one raster-point inside the circle is found outer_loop_state is set to 1. Once there are no more points within the circle for a given row i break.

    The distm call in @nicola version basically does the same without this trick. So it calculates all rows.

    Code for timings:

    microbenchmark::microbenchmark(
      {allCoords<-cbind(lon,rep(lat,each=length(lon)))
      res<-matrix(distm(cbind(mylon,mylat),allCoords,fun=distHaversine)<=500000,nrow=length(lon))},
      {my_coord <- c(mylon, mylat)
      dd2 <- matrix(FALSE, nrow=length(lon), ncol=length(lat))
      outer_loop_state <- 0
      for(i in 1:length(lon)){
        coods <- cbind(lon[i], lat)
        dd <- as.numeric(distHaversine(my_coord, coods))
        dd2[i, ] <- dd <= 500000
        if(any(dd2[i, ])){
          outer_loop_state <- 1
        } else {
          if(outer_loop_state == 1){
            break
          }
        }
      }},
      {#intitialize the return
        res<-matrix(FALSE,nrow=length(lon),ncol=length(lat))
        #we find the possible value of longitude that can be closer than 500000
        #How? We calculate the distance between us and points with our same lat 
        longood<-which(distm(c(mylon,mylat),cbind(lon,mylat))<500000)
        #Same for latitude
        latgood<-which(distm(c(mylon,mylat),cbind(mylon,lat))<500000)
        #we build the matrix with only those values to exploit the vectorized
        #nature of distm
        allCoords<-cbind(lon[longood],rep(lat[latgood],each=length(longood)))
        res[longood,latgood]<-distm(c(mylon,mylat),allCoords)<=500000}
    )
    
    0 讨论(0)
  • 2020-11-30 08:09

    I add below a solution using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), and are therefore very fast.

    First, load the data:

    mylat <- 47.9625
    mylon <- -87.0431
    
    lon <- seq(-179.75,179.75, by = 0.5)
    lat <- seq(-89.75,89.75, by = 0.5)
    df <- expand.grid(lon = lon, lat = lat)
    

    The function spatialrisk::points_in_circle() calculates the observations within radius from a center point. Note that distances are calculated using the Haversine formula.

    Timings for the spatialrisk approach compared to the @Hugh version:

    spatialrisk::points_in_circle(df, mylon, mylat, radius = 5e5)
    
    Unit: milliseconds
           expr       min        lq      mean    median        uq       max neval cld 
    spatialrisk  3.071897  3.366256  5.224479  4.068124  4.809626  17.24378   100   a 
         hutils 17.507311 20.788525 29.470707 25.061943 31.066139 268.29375   100   b
    

    The result can be easily converted to a matrix.

    Have a look at the excellent answer by @philcolbourn on how to test if a point is inside a circle. See: https://stackoverflow.com/a/7227057/5440749

    0 讨论(0)
  • 2020-11-30 08:14

    The dist* functions of the geosphere package are vectorized, so you only need to prepare better your inputs. Try this:

    #prepare a matrix with coordinates of every position
    allCoords<-cbind(lon,rep(lat,each=length(lon)))
    #call the dist function and put the result in a matrix
    res<-matrix(distm(cbind(mylon,mylat),allCoords,fun=distHaversine)<=500000,nrow=length(lon))
    #check the result
    identical(res,dd2)
    #[1] TRUE
    

    As the @Floo0 answer showed, there is a lot of unnecessary calculations. We can follow another strategy: we first determine the lon and lat range that can be closer than the threshold and then we use only them to calculate the distance:

    #initialize the return
    res<-matrix(FALSE,nrow=length(lon),ncol=length(lat))
    #we find the possible values of longitude that can be closer than 500000
    #How? We calculate the distances between us and points with our same lon 
    longood<-which(distm(c(mylon,mylat),cbind(lon,mylat))<=500000)
    #Same for latitude
    latgood<-which(distm(c(mylon,mylat),cbind(mylon,lat))<=500000)
    #we build the matrix with only those values to exploit the vectorized
    #nature of distm
    allCoords<-cbind(lon[longood],rep(lat[latgood],each=length(longood)))
    res[longood,latgood]<-distm(c(mylon,mylat),allCoords)<=500000
    

    In this way, you calculate just lg+ln+lg*ln (lg and ln are the length of latgood and longood), i.e. 531 distances, opposed to the 259200 with my previous method.

    0 讨论(0)
  • 2020-11-30 08:28

    Just use hutils::haversine_distance(lat, lon, mylat, mylon) < 500 directly.

    If the points are assumed to be a cross join of the given lat and lon, use a cross-join first to obtain them:

    library(data.table)
    library(hutils)
    
    lon <- seq(-179.75,179.75, by = 0.5)
    lat <- seq(-89.75,89.75, by = 0.5)
    
    mylat <- 47.9625
    mylon <- -87.0431
    
    Points <- CJ(lon = lon,
                 lat = lat)
    Points[, dist := haversine_distance(lat, lon, mylat, mylon)]
    Points[, sum(dist < 500)]
    #> [1] 379
    

    Created on 2019-10-24 by the reprex package (v0.3.0)

    It improves on the existing answers by its speed and robustness. In particular, it does not rely on the gridded nature of the data and will work with long vectors of coordinates. Below are timings for 100,000 points

    # A tibble: 2 x 14
      expression         min        mean      median         max `itr/sec`  mem_alloc  n_gc n_itr  total_time
      <chr>         <bch:tm>    <bch:tm>    <bch:tm>    <bch:tm>     <dbl>  <bch:byt> <dbl> <int>    <bch:tm>
    1 nicola2    39891.120ms 39891.120ms 39891.120ms 39891.120ms    0.0251 8808.632MB     0     1 39891.120ms
    2 hutils        15.492ms    15.591ms    15.578ms    15.728ms   64.1       5.722MB     0    33   514.497ms
    
    0 讨论(0)
提交回复
热议问题