I want to calculate the average geographical distance between a number of houses per province.
Suppose I have the following data.
df1 <- data.fram
My 10 cents. You can:
# subset the province
df1 <- df1[which(df1$province==1),]
# get all combinations
all <- combn(df1$house, 2, FUN = NULL, simplify = TRUE)
# run your function and get distances for all combinations
distances <- c()
for(col in 1:ncol(all)) {
a <- all[1, col]
b <- all[2, col]
dist <- distm(c(df1$lon[a], df1$lat[a]), c(df1$lon[b], df1$lat[b]), fun = distHaversine)
distances <- c(distances, dist)
}
# calculate mean:
mean(distances)
# [1] 15379.21
This gives you the mean value for the province, which you can compare with the results of other methods. For example sapply
which was mentioned in the comments:
df1 <- df1[which(df1$province==1),]
mean(sapply(split(df1, df1$province), dist))
# [1] 1.349036
As you can see, it gives different results, cause dist
function can calculate the distances of different type (like euclidean) and cannot do haversine or other "geodesic" distances. The package geodist
seems to have options which could bring you closer to sapply
:
library(geodist)
library(magrittr)
# defining the data
df1 <- data.frame(province = c(1, 1, 1, 2, 2, 2),
house = c(1, 2, 3, 4, 5, 6),
lat = c(-76.6, -76.5, -76.4, -75.4, -80.9, -85.7),
lon = c(39.2, 39.1, 39.3, 60.8, 53.3, 40.2))
# defining the function
give_distance <- function(resultofsplit){
distances <- c()
for (i in 1:length(resultofsplit)){
sdf <- resultofsplit
sdf <- sdf[[i]]
sdf <- sdf[c("lon", "lat", "province", "house")]
sdf2 <- as.matrix(sdf)
sdf3 <- geodist(x=sdf2, measure="haversine")
sdf4 <- unique(as.vector(sdf3))
sdf4 <- sdf4[sdf4 != 0] # this is to remove the 0-distances
mean_dist <- mean(sdf4)
distances <- c(distances, mean_dist)
}
return(distances)
}
split(df1, df1$province) %>% give_distance()
#[1] 15379.21 793612.04
E.g. the function will give you the mean distance values for each province. Now, I didn´t manage to get give_distance
work with sapply
, but this should already be more efficient.