Question
I have location data (latitude and longitude) for 1100 stations and 10000 houses. Is it possible to calculate, in R, the shortest distance between a station and each house? I also want to know which station gives that shortest distance for each house.
Answer 1:
Here's a toy example for finding mass distances between m points and n cities. It should translate directly to your station/house problem.
I brought up worldcities, spun the globe (so to speak), and stopped on four cities. I then spun again and stopped at two points. The two counts here are immaterial: if we have 4 and 2 or 1100 and 10000, it should not matter much.
worldcities <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "
lat,lon
39.7642548,-104.9951942
48.8588377,2.2770206
26.9840891,49.4080842
13.7245601,100.493026")
coords <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "
lat,lon
27.9519571,66.8681431
40.5351151,-108.4939948")
(A quick note: tools often give us coordinates as "latitude, longitude", at least in my experience. geosphere functions, however, assume "longitude, latitude". My coordinates above were copied straight from random views in Google Maps, and I didn't want to edit them; because of this, I reverse the columns below with [,2:1] column indexing. If you forget and pass coordinates that cannot possibly be valid, you'll get the error Error in .pointsToMatrix(p1) : latitude < -90, which should be a prod that you have likely reversed the order of your coordinates. At which point you scratch your head and wonder if all of your other projects have used the wrong coordinates, calling into question your conclusions. Not me, I've never been there. This year.)
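One way to catch a swapped column order before it reaches geosphere is a plain range check. The following is a minimal base-R sketch; check_lat_lon is a hypothetical helper name (not part of geosphere), and it can only detect a swap when some longitude falls outside [-90, 90].

```r
# Hypothetical helper (not from geosphere): fail fast if the columns look swapped.
check_lat_lon <- function(df) {
  stopifnot(
    is.numeric(df$lat), is.numeric(df$lon),
    all(abs(df$lat) <= 90),   # valid latitudes lie in [-90, 90]
    all(abs(df$lon) <= 180)   # valid longitudes lie in [-180, 180]
  )
  invisible(df)
}

coords <- data.frame(lat = c(27.9519571, 40.5351151),
                     lon = c(66.8681431, -108.4939948))
check_lat_lon(coords)   # passes silently

# Simulate the mistake: longitude values end up in the "lat" column.
swapped <- setNames(coords[, 2:1], c("lat", "lon"))
ok <- tryCatch({ check_lat_lon(swapped); TRUE }, error = function(e) FALSE)
ok   # FALSE: -108.49 is not a valid latitude
```

A swap where both values happen to lie within [-90, 90] would still pass, so this is a guard against the obvious case only.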
Let's find the distance in meters between each of coords (each row) and each city (each column):
dists <- outer(seq_len(nrow(coords)), seq_len(nrow(worldcities)),
function(i, j) geosphere::distHaversine(coords[i,2:1], worldcities[j,2:1]))
dists
# [,1] [,2] [,3] [,4]
# [1,] 12452329.0 5895577 1726433 3822220
# [2,] 309802.8 7994185 12181477 13296825
It should be straightforward to find which city is closest to each coordinate, with
apply(dists, 1, which.min)
# [1] 3 1
That is, the first point is closest to the third city, and the second point is closest to the first city.
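Since the question asks for both the nearest station and the distance to it, the same matrix can supply both. A minimal sketch follows, with dists retyped from the (rounded) output printed above so that the snippet runs on its own; nearest and min_dist are illustrative names.

```r
# Distance matrix copied from the printed output above (meters, rounded).
dists <- rbind(c(12452329.0, 5895577, 1726433, 3822220),
               c(309802.8, 7994185, 12181477, 13296825))

# Column index of the closest city for each point (row).
nearest <- apply(dists, 1, which.min)

# Indexing with a two-column (row, column) matrix pulls out the matching distances.
min_dist <- dists[cbind(seq_len(nrow(dists)), nearest)]

# One row per point: which city is nearest, and how far away it is.
result <- data.frame(point = seq_len(nrow(dists)),
                     nearest_city = nearest,
                     dist_m = min_dist)
result
```

In the station/house case, rows would be houses and columns stations, so nearest would index into the station table.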
Just to prove this is a tenable solution for a large number of pairs, here's the same problem scaled up a bit.
worldcities_big <- do.call(rbind, replicate(250, worldcities, simplify = FALSE))
nrow(worldcities_big)
# [1] 1000
coords_big <- do.call(rbind, replicate(5000, coords, simplify = FALSE))
nrow(coords_big)
# [1] 10000
system.time(
dists <- outer(seq_len(nrow(coords_big)), seq_len(nrow(worldcities_big)),
function(i, j) geosphere::distHaversine(coords_big[i,2:1], worldcities_big[j,2:1]))
)
# user system elapsed
# 67.62 2.22 70.03
So yes, it was not instantaneous, but 70 seconds is not horrible for 10,000,000 distance calculations. Could you make it faster? Perhaps, though I'm not sure precisely how to do so easily. I'd think some heuristics might reduce it from O(m*n) to O(m*log(n)) time, but I don't know if that's worth the coding complexity it would introduce.
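One option that keeps the O(m*n) work but avoids indexing data frames row by row is to apply the haversine formula directly to numeric vectors with outer. This is a sketch, not a benchmarked replacement: haversine_matrix is a hypothetical helper, and the radius of 6378137 m is assumed to match the default used by geosphere::distHaversine.

```r
# Hypothetical helper: m x n matrix of haversine distances in meters.
# Arguments are latitude/longitude vectors in degrees.
haversine_matrix <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  # Pairwise differences: rows follow the first set, columns the second.
  dphi <- outer(phi1, phi2, function(a, b) b - a)
  dlam <- outer(lon1 * to_rad, lon2 * to_rad, function(a, b) b - a)
  a <- sin(dphi / 2)^2 + outer(cos(phi1), cos(phi2)) * sin(dlam / 2)^2
  a <- pmin(a, 1)   # guard against rounding slightly above 1
  2 * r * asin(sqrt(a))
}

worldcities <- data.frame(
  lat = c(39.7642548, 48.8588377, 26.9840891, 13.7245601),
  lon = c(-104.9951942, 2.2770206, 49.4080842, 100.493026))
coords <- data.frame(lat = c(27.9519571, 40.5351151),
                     lon = c(66.8681431, -108.4939948))

d <- haversine_matrix(coords$lat, coords$lon, worldcities$lat, worldcities$lon)
apply(d, 1, which.min)   # 3 1, the same nearest cities as above
```

Because the function takes named latitude and longitude vectors, there is no column order to reverse. Whether it is actually faster on 10,000 by 1,000 points would need to be timed on your own machine.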
Source: https://stackoverflow.com/questions/60049868/how-can-i-calculate-distance-between-multiple-latitude-and-longitude-data