I know that to find the distance between two latitude/longitude points I need to use the haversine function:
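from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius used here
    return km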
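Written out, the function computes the following, where φ is latitude and λ is longitude (both converted to radians) and R = 6367 km is the Earth radius hard-coded above (the commonly quoted mean radius is about 6371 km, so results scale slightly with that choice):

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad c = 2 \arcsin\sqrt{a},
\qquad d = R\,c
```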
I have a DataFrame where one column is latitude and another column is longitude. I want to find out how far each of these points is from a fixed point at longitude -56.7213600, latitude 37.2175900. How do I take the values from the DataFrame and pass them into the function?
Example DataFrame:

      SEAZ         LAT         LON
1   296.40  58.7312210  28.3774110
2   274.72  56.8148320  31.2923240
3   192.25  52.0649880  35.8018640
4    34.34  68.8188750  67.1933670
5   271.05  56.6699880  31.6880620
6   131.88  48.5546220  49.7827730
7   350.71  64.7742720  31.3953780
8   214.44  53.5192920  33.8458560
9     1.46  67.9433740  38.4842520
10  273.55  53.3437310   4.4716664
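As a minimal sketch of the most direct answer to the question, the example data can be rebuilt as a DataFrame and the unchanged scalar function called once per row by zipping the two columns together. The names `ref_lon` and `ref_lat` are illustrative; this is still a Python-level loop, so it is not vectorised.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    # The question's scalar function, unchanged apart from layout.
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    return 6367 * c


# The example data from the question, with row labels 1..10 as the index.
df = pd.DataFrame(
    {
        "SEAZ": [296.40, 274.72, 192.25, 34.34, 271.05,
                 131.88, 350.71, 214.44, 1.46, 273.55],
        "LAT": [58.7312210, 56.8148320, 52.0649880, 68.8188750, 56.6699880,
                48.5546220, 64.7742720, 53.5192920, 67.9433740, 53.3437310],
        "LON": [28.3774110, 31.2923240, 35.8018640, 67.1933670, 31.6880620,
                49.7827730, 31.3953780, 33.8458560, 38.4842520, 4.4716664],
    },
    index=range(1, 11),
)

# Fixed reference point (longitude, latitude) from the question.
ref_lon, ref_lat = -56.7213600, 37.2175900

# zip pairs LON and LAT row by row; the list holds one distance per row, in df's order.
df["distance"] = [haversine(ref_lon, ref_lat, lon, lat)
                  for lon, lat in zip(df["LON"], df["LAT"])]
print(df)
```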
I can't confirm whether the calculations are correct, but the following works:
In [11]:
from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
    # fixed reference point
    lon1 = -56.7213600
    lat1 = 37.2175900
    # coordinates of the current row
    lon2 = row['LON']
    lat2 = row['LAT']
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a))
    km = 6367 * c
    return km

# axis=1 calls haversine once per row, passing the row as a Series
df['distance'] = df.apply(haversine, axis=1)
df
Out[11]:
SEAZ LAT LON distance
index
1 296.40 58.731221 28.377411 6275.791920
2 274.72 56.814832 31.292324 6509.727368
3 192.25 52.064988 35.801864 6990.144378
4 34.34 68.818875 67.193367 7357.221846
5 271.05 56.669988 31.688062 6538.047542
6 131.88 48.554622 49.782773 8036.968198
7 350.71 64.774272 31.395378 6229.733699
8 214.44 53.519292 33.845856 6801.670843
9 1.46 67.943374 38.484252 6418.754323
10 273.55 53.343731 4.471666 4935.394528
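The row function above hard-codes the reference point. A small variation, sketched here under the assumption that you want to reuse it for other points, passes the point through the `args` parameter of `DataFrame.apply`, which forwards extra positional arguments after the row. The name `haversine_row` is illustrative, and only two of the example rows are used.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine_row(row, lon1, lat1):
    """Distance in km from (lon1, lat1) to this row's LON/LAT."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, row["LON"], row["LAT"]])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


# Two rows from the example data, enough to demonstrate the call.
df = pd.DataFrame(
    {"LAT": [58.7312210, 53.3437310], "LON": [28.3774110, 4.4716664]},
    index=[1, 10],
)

# args= supplies (lon1, lat1) after the row, so the point is no longer hard-coded.
df["distance"] = df.apply(haversine_row, axis=1, args=(-56.7213600, 37.2175900))
print(df)
```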
The following vectorised code is actually slower than apply on a DataFrame this small, but I applied it to a 100,000-row DataFrame:
In [35]:
%%timeit
# assumes `import math` and `import numpy as np` have already been run
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LON'])
df['dLON'] = df['LON_rad'] - math.radians(-56.7213600)
df['dLAT'] = df['LAT_rad'] - math.radians(37.2175900)
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(
    np.sin(df['dLAT']/2)**2
    + math.cos(math.radians(37.2175900)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
1 loops, best of 3: 17.2 ms per loop
By comparison, the apply version took 4.3 s, so the vectorised code is nearly 250 times quicker; something worth remembering in future.
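To reproduce the apply vs vectorised comparison yourself, here is a rough sketch. It assumes synthetic random coordinates (the answer's 100,000-row data is not available) and uses 10,000 rows by default to keep the run short; `haversine_row` and `haversine_vec` are illustrative names. Timings depend on the machine and library versions, so no particular ratio is implied.

```python
import timeit
from math import radians, sin, cos, asin, sqrt

import numpy as np
import pandas as pd

REF_LON, REF_LAT = -56.7213600, 37.2175900
N = 10_000  # the answer timed 100,000 rows; raise N to match that scale

# Synthetic coordinates (an assumption, not the answer's data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"LAT": rng.uniform(-90, 90, N), "LON": rng.uniform(-180, 180, N)})


def haversine_row(row):
    # Scalar version, called once per row by DataFrame.apply.
    lon1, lat1, lon2, lat2 = map(radians, [REF_LON, REF_LAT, row["LON"], row["LAT"]])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


def haversine_vec(lat, lon):
    # Vectorised version: NumPy ufuncs process whole columns at once.
    lat1, lon1 = np.radians(REF_LAT), np.radians(REF_LON)
    lat2, lon2 = np.radians(lat), np.radians(lon)
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Both approaches should produce the same distances.
d_apply = df.apply(haversine_row, axis=1)
d_vec = haversine_vec(df["LAT"], df["LON"])

t_apply = timeit.timeit(lambda: df.apply(haversine_row, axis=1), number=1)
t_vec = timeit.timeit(lambda: haversine_vec(df["LAT"], df["LON"]), number=10) / 10
print(f"apply: {t_apply:.3f} s, vectorised: {t_vec * 1000:.2f} ms, "
      f"ratio ~{t_apply / t_vec:.0f}x")
```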
If we compress all of the above into a one-liner:
In [39]:
%timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['LAT']) - math.radians(37.2175900))/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(np.radians(df['LAT'])) * np.sin((np.radians(df['LON']) - math.radians(-56.7213600))/2)**2))
100 loops, best of 3: 12.6 ms per loop
This gives a further speed-up: roughly 341 times quicker than apply.
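The one-liner can also be wrapped in a reusable function. The sketch below is one way to do it, not the answer's own code; `haversine_np` and the `radius_km` parameter are illustrative names. Because it is built only from NumPy ufuncs, each argument may be a scalar, an array, or a pandas Series, and broadcasting pairs them element-wise.

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367.0):
    """Haversine distance in km; arguments may be scalars, arrays or Series."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))


# Two rows from the example data.
df = pd.DataFrame(
    {"LAT": [58.7312210, 53.3437310], "LON": [28.3774110, 4.4716664]},
    index=[1, 10],
)

# One call computes the whole column; the scalar reference point is broadcast.
df["distance"] = haversine_np(-56.7213600, 37.2175900, df["LON"], df["LAT"])
print(df)
```

Keeping the radius as a parameter makes it easy to switch from 6367 km to another Earth radius without editing the formula.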
Source: https://stackoverflow.com/questions/25767596/vectorised-haversine-formula-with-a-pandas-dataframe