I have 2 dataframes I'm working with. One has a bunch of locations and coordinates (longitude, latitude). The other is a weather data set with data from weather stations al
Let's say you have a distance function dist
that you want to minimize:
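import numpy as np
def dist(lat1, long1, lat2, long2):
    # Manhattan distance in degrees; abs() per axis so opposite offsets cannot cancel out
    return np.abs(lat1 - lat2) + np.abs(long1 - long2)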
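The sum of coordinate differences is only a rough proxy for distance. If actual distances matter, a great-circle (haversine) function could be swapped in for dist. This is a minimal sketch, assuming coordinates in decimal degrees and a mean Earth radius of 6371 km; the name haversine_km is hypothetical and not part of the original answer.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in kilometres between points given in decimal degrees."""
    # Convert all four inputs from degrees to radians
    lat1, long1, lat2, long2 = map(np.radians, (lat1, long1, lat2, long2))
    dlat = lat2 - lat1
    dlong = long2 - long1
    # Haversine formula
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlong / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

It takes the same four arguments as dist, so it should work as a drop-in replacement in the apply calls.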
For a given location, you can find the nearest station as follows:
lat = 39.463744
long = -76.119411
weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
    axis=1)
This will calculate the distance to all weather stations. Using idxmin
you can find the closest station name:
distances = weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
    axis=1)
weather.loc[distances.idxmin(), 'StationName']
Let's put all this in a function:
def find_station(lat, long):
    distances = weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']),
        axis=1)
    return weather.loc[distances.idxmin(), 'StationName']
You can now get all the nearest stations by applying it to the locations
dataframe:
locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude']),
    axis=1)
Output:
0 WALTHAM
1 WALTHAM
2 PORTST.LUCIE
3 WALTHAM
4 PORTST.LUCIE
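The nested apply calls evaluate the distance one pair at a time in Python, which can get slow for large tables. One possible alternative is to compute the whole distance matrix at once with NumPy broadcasting. This is a sketch, not the original answer's method: the function name nearest_stations and the sample coordinates are made up for illustration, and the column names are assumed to match those used above.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration only
weather = pd.DataFrame({
    "StationName": ["WALTHAM", "PORTST.LUCIE"],
    "Latitude": [42.38, 27.29],
    "Longitude": [-71.24, -80.35],
})
locations = pd.DataFrame({
    "Latitude": [39.463744, 26.0],
    "Longitude": [-76.119411, -80.0],
})

def nearest_stations(locations, weather):
    # (n_locations, 1) minus (n_stations,) broadcasts to (n_locations, n_stations)
    dlat = locations["Latitude"].to_numpy()[:, None] - weather["Latitude"].to_numpy()
    dlon = locations["Longitude"].to_numpy()[:, None] - weather["Longitude"].to_numpy()
    # Sum of absolute coordinate differences, in degrees
    distances = np.abs(dlat) + np.abs(dlon)
    # Position of the closest station for each location
    nearest = distances.argmin(axis=1)
    names = weather["StationName"].to_numpy()[nearest]
    return pd.Series(names, index=locations.index)

result = nearest_stations(locations, weather)
```

The full matrix needs memory proportional to the number of locations times the number of stations, so for very large inputs it may have to be processed in chunks.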
I appreciate that this is a bit messy, but I used something similar to match genetic data between tables. It relies on the longitude and latitude in the location file being within 5 degrees of those in the weather file; the tolerance can be changed if need be.
rows = range(location.shape[0])
weath_rows = range(weather.shape[0])
for r in rows:
    lat = location.iloc[r, 1]
    max_lat = lat + 5
    min_lat = lat - 5
    lon = location.iloc[r, 2]
    max_lon = lon + 5
    min_lon = lon - 5
    for w in weath_rows:
        if (min_lat <= weather.iloc[w, 2] <= max_lat) and (min_lon <= weather.iloc[w, 3] <= max_lon):
            # Set only this location's row; assigning to the whole column would overwrite every row
            location.loc[location.index[r], 'Station_Name'] = weather.iloc[w, 1]
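The inner loop could also be written as a boolean mask over the weather table. This is a rough sketch that assumes named columns (Latitude, Longitude, StationName) rather than the positional indexes used above; the helper first_station_in_box, its tol parameter, and the sample data are hypothetical. Unlike the loop, it keeps the first station found inside the box rather than the last.

```python
import pandas as pd

# Made-up sample data for illustration only
weather = pd.DataFrame({
    "StationName": ["WALTHAM", "PORTST.LUCIE"],
    "Latitude": [42.38, 27.29],
    "Longitude": [-71.24, -80.35],
})
location = pd.DataFrame({
    "Latitude": [39.463744, 0.0],
    "Longitude": [-76.119411, 0.0],
})

def first_station_in_box(lat, lon, weather, tol=5):
    # between() is inclusive on both ends, matching the <= comparisons in the loop
    in_box = (
        weather["Latitude"].between(lat - tol, lat + tol)
        & weather["Longitude"].between(lon - tol, lon + tol)
    )
    matches = weather.loc[in_box, "StationName"]
    # None when no station falls inside the box
    return matches.iloc[0] if not matches.empty else None

location["Station_Name"] = [
    first_station_in_box(lat, lon, weather)
    for lat, lon in zip(location["Latitude"], location["Longitude"])
]
```

Locations with no station inside the box end up with a missing value, which makes unmatched rows easy to spot.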