How to find the closest match based on 2 keys from one dataframe to another?

一向 2020-12-17 00:05

I have 2 dataframes I'm working with. One has a bunch of locations and coordinates (longitude, latitude). The other is a weather data set with data from weather stations al

2 answers
  • 2020-12-17 00:38

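    Let's say you have a distance function dist that you want to minimize. Sum the absolute difference of each coordinate. Taking the absolute value of the summed differences instead would let a latitude gap cancel out a longitude gap of the opposite sign:

    import numpy as np

    def dist(lat1, long1, lat2, long2):
        # Sum of absolute coordinate differences, in degrees (a rough proxy for distance).
        return np.abs(lat1 - lat2) + np.abs(long1 - long2)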
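    Differences in degrees are only a rough proxy for distance, because a degree of longitude shrinks toward the poles. If real distances matter, a great-circle (haversine) formula could stand in for dist. This is a minimal sketch, assuming coordinates in decimal degrees and a mean Earth radius of 6371 km. The name haversine_km is hypothetical:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in km between points given in decimal degrees.

    Built from NumPy ufuncs, so it works on scalars, arrays or pandas Series.
    """
    # Convert all four inputs from degrees to radians.
    lat1, long1, lat2, long2 = map(np.radians, (lat1, long1, lat2, long2))
    dlat = lat2 - lat1
    dlong = long2 - long1
    # Haversine formula: a is the squared half-chord length between the points.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlong / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

print(round(float(haversine_km(0, 0, 0, 1)), 2))  # 111.19
```

    It takes its arguments in the same order as dist, so it could be dropped into the apply calls unchanged.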
    

    For a given location, you can find the nearest station as follows:

    lat = 39.463744
    long = -76.119411
    weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
        axis=1)
    

    This calculates the distance to every weather station. idxmin then returns the index label of the smallest distance, which .loc uses to look up the closest station's name:

    distances = weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
        axis=1)
    weather.loc[distances.idxmin(), 'StationName']
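    apply(..., axis=1) calls the lambda once per station row in Python. Pandas arithmetic works on whole columns, so the same distances can be computed in a single vectorized expression before calling idxmin. This is a minimal sketch on a small made-up weather frame. The station names and coordinates are illustrative only, and the metric is the sum of absolute coordinate differences:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's weather dataframe.
weather = pd.DataFrame({
    'StationName': ['NORTH', 'SOUTH', 'EAST'],
    'Latitude': [42.0, 27.0, 39.0],
    'Longitude': [-71.0, -80.0, -75.0],
})

lat = 39.463744
long = -76.119411

# Whole-column arithmetic: one distance per station, no per-row lambda.
distances = np.abs(weather['Latitude'] - lat) + np.abs(weather['Longitude'] - long)

# idxmin gives the index label of the smallest distance; .loc turns it into a name.
closest = weather.loc[distances.idxmin(), 'StationName']
print(closest)  # EAST
```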
    

    Let's put all this in a function:

    def find_station(lat, long):
        distances = weather.apply(
            lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
            axis=1)
        return weather.loc[distances.idxmin(), 'StationName']
    

    You can now get all the nearest stations by applying it to the locations dataframe:

    locations.apply(
        lambda row: find_station(row['Latitude'], row['Longitude']), 
        axis=1)
    

    Output:

    0         WALTHAM
    1         WALTHAM
    2    PORTST.LUCIE
    3         WALTHAM
    4    PORTST.LUCIE
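    find_station still scans every station once per location. For moderately sized frames, NumPy broadcasting can build the full location-by-station distance matrix in one step and take argmin along the station axis. This sketch assumes the column names Latitude, Longitude and StationName. The helper nearest_stations and the toy data are hypothetical, and the metric is the sum of absolute coordinate differences:

```python
import numpy as np
import pandas as pd

def nearest_stations(locations, weather):
    """Return the name of the closest station for each row of locations."""
    # (n_locations, 1) minus (n_stations,) broadcasts to (n_locations, n_stations).
    lat_diff = locations['Latitude'].to_numpy()[:, None] - weather['Latitude'].to_numpy()
    long_diff = locations['Longitude'].to_numpy()[:, None] - weather['Longitude'].to_numpy()
    distances = np.abs(lat_diff) + np.abs(long_diff)
    # Position (not index label) of the closest station for each location.
    closest = distances.argmin(axis=1)
    return pd.Series(weather['StationName'].to_numpy()[closest], index=locations.index)

# Made-up example data.
weather = pd.DataFrame({
    'StationName': ['NORTH', 'SOUTH'],
    'Latitude': [42.0, 27.0],
    'Longitude': [-71.0, -80.0],
})
locations = pd.DataFrame({
    'Latitude': [41.5, 27.3, 40.0],
    'Longitude': [-71.5, -80.1, -72.0],
})
locations['StationName'] = nearest_stations(locations, weather)
print(locations['StationName'].tolist())  # ['NORTH', 'SOUTH', 'NORTH']
```

    Memory grows with (number of locations) x (number of stations). For very large inputs, a spatial index such as a k-d tree is usually a better fit.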
    
  • 2020-12-17 00:39

    I appreciate that this is a bit messy, but I used something similar to match genetic data between tables. It assumes each location's latitude and longitude are within 5 degrees of a station's coordinates in the weather file. That tolerance can be changed if need be.

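    rows = range(location.shape[0])
    weath_rows = range(weather.shape[0])
    # Column positions are assumed: location has latitude in column 1 and
    # longitude in column 2; weather has station name in 1, latitude in 2,
    # longitude in 3.
    location['Station_Name'] = None  # create the column once, filled per row below
    for r in rows:
        lat = location.iloc[r, 1]
        max_lat = lat + 5
        min_lat = lat - 5
        lon = location.iloc[r, 2]
        max_lon = lon + 5
        min_lon = lon - 5
        for w in weath_rows:
            # Accept any station inside the +/-5 degree box around this location.
            if (min_lat <= weather.iloc[w, 2] <= max_lat) and (min_lon <= weather.iloc[w, 3] <= max_lon):
                # Write to row r only; assigning to the whole column would give
                # every row the same station. If several match, the last one wins.
                location.loc[location.index[r], 'Station_Name'] = weather.iloc[w, 1]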
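    The loop keeps whichever matching station it sees last, which is not necessarily the nearest one. A vectorized variant can pick the closest station instead. This sketch makes three assumptions:

    - pandas 1.2 or later, for merge(how='cross')
    - the column names Latitude, Longitude and StationName rather than column positions
    - a hypothetical tolerance parameter in place of the hard-coded 5

    It pairs every location with every station, keeps the pairs inside the window, and picks the closest one per location:

```python
import pandas as pd

def match_within_window(location, weather, tolerance=5.0):
    """Closest station per location among stations within +/- tolerance degrees.

    Locations with no station inside the window get NaN.
    """
    # Carry each location's original index through the cross join as 'loc_id'.
    left = location[['Latitude', 'Longitude']].rename_axis('loc_id').reset_index()
    right = weather[['StationName', 'Latitude', 'Longitude']]
    pairs = left.merge(right, how='cross', suffixes=('_loc', '_stn'))
    lat_diff = (pairs['Latitude_loc'] - pairs['Latitude_stn']).abs()
    long_diff = (pairs['Longitude_loc'] - pairs['Longitude_stn']).abs()
    # Same box test as the loop, applied to every pair at once.
    in_window = (lat_diff <= tolerance) & (long_diff <= tolerance)
    pairs = pairs[in_window].assign(dist=lat_diff[in_window] + long_diff[in_window])
    # For each location, the row label of its smallest distance.
    best = pairs.loc[pairs.groupby('loc_id')['dist'].idxmin()]
    return best.set_index('loc_id')['StationName'].reindex(location.index)

# Made-up example data: FAR_NORTH is also inside the first location's window.
location = pd.DataFrame({'Latitude': [41.5, 27.3, 0.0], 'Longitude': [-71.5, -80.1, 0.0]})
weather = pd.DataFrame({
    'StationName': ['NORTH', 'SOUTH', 'FAR_NORTH'],
    'Latitude': [42.0, 27.0, 44.0],
    'Longitude': [-71.0, -80.0, -69.0],
})
location['Station_Name'] = match_within_window(location, weather)
```

    In this example, the first location gets NORTH even though FAR_NORTH also falls inside its window. The third location, which has no station within 5 degrees, is left as NaN. The cross join holds one row per location-station pair, so memory grows with the product of the two frame sizes.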
    