Geographical distance by group - Applying a function on each pair of rows

前端未结

关注

 7  843

清歌不尽 2020-12-21 05:04

I want to calculate the average geographical distance between a number of houses per province.

Suppose I have the following data.

df1 <- data.fram


      
      
        
          7条回答        

        
                    
            
            
                         
                
              
              
                
                   隐瞒了意图╮
                                             
                
                
                (楼主)
            
              
              
                2020-12-21 05:40
              

            
            
                        
My 10 cents. You can: 

# subset the province
df1 <- df1[which(df1$province==1),]

# get all combinations
all <- combn(df1$house, 2, FUN = NULL, simplify = TRUE)

# run your function and get distances for all combinations
distances <- c()
for(col in 1:ncol(all)) {
  a <- all[1, col]
  b <- all[2, col]
  dist <- distm(c(df1$lon[a], df1$lat[a]), c(df1$lon[b], df1$lat[b]), fun = distHaversine)
  distances <- c(distances, dist)
  }

# calculate mean:
mean(distances)
# [1] 15379.21


This gives you the mean value for the province, which you can compare with the results of other methods. For example sapply which was mentioned in the comments:

df1 <- df1[which(df1$province==1),]
mean(sapply(split(df1, df1$province), dist))
# [1] 1.349036


As you can see, it gives different results, cause dist function can calculate the distances of different type (like euclidean) and cannot do haversine or other "geodesic" distances. The package geodist seems to have options which could bring you closer to sapply:

library(geodist)
library(magrittr)

# defining the data
df1 <- data.frame(province = c(1, 1, 1, 2, 2, 2),
                  house = c(1, 2, 3, 4, 5, 6),
                  lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7), 
                  lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2))

# defining the function 
give_distance <- function(resultofsplit){
  distances <- c()
  for (i in 1:length(resultofsplit)){
    sdf <- resultofsplit
    sdf <- sdf[[i]]
    sdf <- sdf[c("lon", "lat", "province", "house")]

    sdf2 <- as.matrix(sdf)
    sdf3 <- geodist(x=sdf2, measure="haversine")
    sdf4 <- unique(as.vector(sdf3))
    sdf4 <- sdf4[sdf4 != 0]        # this is to remove the 0-distances 
    mean_dist <- mean(sdf4)
    distances <- c(distances, mean_dist)
    }  
    return(distances)
}

split(df1, df1$province) %>% give_distance()
#[1]  15379.21 793612.04


E.g. the function will give you the mean distance values for each province. Now, I didn´t manage to get give_distance work with sapply, but this should already be more efficient. 
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它7个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复