Question
I have a pandas DataFrame my_df with the following columns:
id  lat1  lon1  lat2  lon2
 1    45     0    41     3
 2    40     1    42     4
 3    42     2    37     1
Basically, I'd like to do the following:
import haversine
haversine.haversine((45, 0), (41, 3)) # just to show syntax of haversine()
> 507.20410687342115
# what I'd like to do
my_df["dist"] = haversine.haversine((my_df["lat1"], my_df["lon1"]),(my_df["lat2"], my_df["lon2"]))
TypeError: cannot convert the series to <class 'float'>
Using this, I tried the following:
my_df['dist'] = haversine.haversine(
    list(zip(*[my_df[['lat1','lon1']][c].values.tolist() for c in my_df[['lat1','lon1']]])),
    list(zip(*[my_df[['lat2','lon2']][c].values.tolist() for c in my_df[['lat2','lon2']]]))
)
File "blabla\lib\site-packages\haversine\__init__.py", line 20, in haversine
    lat1, lng1 = point1
ValueError: too many values to unpack (expected 2)
Any idea what I'm doing wrong, or how I can achieve what I want?
Answer 1:
Use apply with axis=1:
my_df["dist"] = my_df.apply(lambda row : haversine.haversine((row["lat1"], row["lon1"]),(row["lat2"], row["lon2"])), axis=1)
This calls the haversine function once per row. The function understands scalar values, not array-like values, hence the error. By calling apply with axis=1, you iterate row-wise, so each column value can be read from the row and passed in the form the function expects.
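The row-wise idea can be sketched without installing the haversine package. In the snippet below, scalar_haversine is a hypothetical stand-in for haversine.haversine (same two-tuple calling convention, with an assumed Earth radius of 6371 km), and the zip-based loop is an alternative to apply that I am adding for comparison, not something the answer proposes.

```python
import math

import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius in kilometres


def scalar_haversine(point1, point2):
    """Hypothetical stand-in for haversine.haversine.

    Takes two (lat, lon) tuples in decimal degrees and returns kilometres.
    Like the real function, it only accepts scalars, not Series.
    """
    lat1, lon1 = point1
    lat2, lon2 = point2
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


my_df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42],
    "lon1": [0, 1, 2],
    "lat2": [41, 42, 37],
    "lon2": [3, 4, 1],
})

# Row-wise with apply: each row is passed to the lambda as a Series,
# so row["lat1"] and friends are plain scalars.
dist_apply = my_df.apply(
    lambda row: scalar_haversine((row["lat1"], row["lon1"]),
                                 (row["lat2"], row["lon2"])),
    axis=1,
)

# Same result by zipping the four columns and unpacking one row at a time.
dist_zip = [
    scalar_haversine((la1, lo1), (la2, lo2))
    for la1, lo1, la2, lo2 in zip(my_df["lat1"], my_df["lon1"],
                                  my_df["lat2"], my_df["lon2"])
]

my_df["dist"] = dist_apply
```

Both variants call the scalar function once per row, so they should scale similarly; the first distance should come out at roughly 507.2 km, matching the value shown in the question.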
Also, I don't know what the difference is, but there is a vectorised version of the haversine formula.
Answer 2:
What about using a vectorized approach:
import numpy as np
import pandas as pd

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians).

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    """
    if to_radians:
        # convert all four inputs from degrees to radians in one call
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    # haversine term, computed element-wise on the whole arrays
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))
Demo:
In [38]: df
Out[38]:
   id  lat1  lon1  lat2  lon2
0   1    45     0    41     3
1   2    40     1    42     4
2   3    42     2    37     1

In [39]: df['dist'] = haversine(df.lat1, df.lon1, df.lat2, df.lon2)

In [40]: df
Out[40]:
   id  lat1  lon1  lat2  lon2        dist
0   1    45     0    41     3  507.204107
1   2    40     1    42     4  335.876312
2   3    42     2    37     1  562.543582
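As a quick sanity check on the demo, the sketch below compares the vectorized call against a row-by-row call of the same formula. The name haversine_np is mine (chosen to avoid clashing with the haversine package), and the function is a trimmed restatement of the one above that uses plain numpy, not a different algorithm.

```python
import numpy as np
import pandas as pd


def haversine_np(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance in km; inputs in decimal degrees.

    Works on scalars, numpy arrays and pandas Series alike, because
    every operation used here is a numpy element-wise function.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42],
    "lon1": [0, 1, 2],
    "lat2": [41, 42, 37],
    "lon2": [3, 4, 1],
})

# One call over whole columns: no Python-level loop over rows.
vectorized = haversine_np(df["lat1"], df["lon1"], df["lat2"], df["lon2"])

# The same function called once per row, for comparison only.
row_wise = df.apply(
    lambda r: haversine_np(r["lat1"], r["lon1"], r["lat2"], r["lon2"]),
    axis=1,
)

# Both routes should agree up to floating-point noise.
results_match = bool(np.allclose(vectorized, row_wise))
```

Because the function accepts scalars as well as Series, it can be used row-wise or column-wise; the column-wise call is generally the one to prefer on large frames, since it avoids calling a Python function per row.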
Source: https://stackoverflow.com/questions/45054067/from-pandas-dataframe-to-tuples-for-haversine-module