R Find the Distance between Two US Zipcode columns

谁说我不能喝 提交于 2020-04-30 07:24:26

问题


I was wondering what the most efficient method of calculating the distance in miles between two US zipcode columns would be using R.

I have heard of the geosphere package for computing the difference between zipcodes but do not fully understand it and was wondering if there were alternative methods as well.

For example say I have a data frame that looks like this.

 ZIP_START     ZIP_END
 95051         98053
 94534         94128
 60193         60666
 94591         73344
 94128         94128
 94015         73344
 94553         94128
 10994         7105
 95008         94128

I want to create a new data frame that looks like this.

 ZIP_START     ZIP_END     MILES_DIFFERENCE
 95051         98053       x
 94534         94128       x
 60193         60666       x
 94591         73344       x
 94128         94128       x
 94015         73344       x
 94553         94128       x
 10994         7105        x
 95008         94128       x

Where x is the difference in miles between the two zipcodes.

What is the best method of calculating this distance?

Here is the R code to create the example data frame.

df <- data.frame("ZIP_START" = c(95051, 94534, 60193, 94591, 94128, 94015, 94553, 10994, 95008), "ZIP_END" = c(98053, 94128, 60666, 73344, 94128, 73344, 94128, 7105, 94128))

Please let me know if you have any questions.

Any advice is appreciated.

Thank you for your help.


回答1:


There is a handy R package out there named "zipcode" which provides a table of zip code, city, state and the latitude and longitude. So once you have that information, the "geosphere" package can calculate the distance between points.

library(zipcode)
library(geosphere)

#dataframe need to be character arrays or the else the leading zeros will be dropped causing errors
df <- data.frame("ZIP_START" = c(95051, 94534, 60193, 94591, 94128, 94015, 94553, 10994, 95008), 
       "ZIP_END" = c(98053, 94128, 60666, 73344, 94128, 73344, 94128, "07105", 94128), 
       stringsAsFactors = FALSE)

data("zipcode")

df$distance_meters<-apply(df, 1, function(x){
  startindex<-which(x[["ZIP_START"]]==zipcode$zip)
  endindex<-which(x[["ZIP_END"]]==zipcode$zip)
  distGeo(p1=c(zipcode[startindex, "longitude"], zipcode[startindex, "latitude"]), p2=c(zipcode[endindex, "longitude"], zipcode[endindex, "latitude"]))
})

Warning about your column class for your input data frame. Zip codes should be a character and not numeric, otherwise leading zeros are dropped causing errors.

The return distance from distGeo is in meters, I will allow the reader to determine the proper unit conversion to miles.



来源:https://stackoverflow.com/questions/55408526/r-find-the-distance-between-two-us-zipcode-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!