Euclidean Distance Matrix Using Pandas

温柔的废话 2021-02-09 01:48

I have a .csv file that contains city, latitude and longitude data in the following pipe-delimited format:

CITY|LATITUDE|LONGITUDE
A|40.745392|-73.978364
B|42.562786|-114.460503
C|         


        
3 Answers
  • 2021-02-09 02:11
    # Column j holds every city's distance to city j, so one pass over the cities is enough.
    for j in df["CITY"]:
        row = df.loc[df["CITY"] == j, ["LATITUDE", "LONGITUDE"]].iloc[0]
        df[j] = ((df["LATITUDE"] - row["LATITUDE"])**2 + (df["LONGITUDE"] - row["LONGITUDE"])**2)**0.5

    # Keep only the distance columns (rows stay in the original city order).
    df = df.drop(["CITY", "LATITUDE", "LONGITUDE"], axis=1)
    

    This works

  • You can use the pdist and squareform functions from scipy.spatial.distance:

    In [12]: df
    Out[12]:
      CITY   LATITUDE   LONGITUDE
    0    A  40.745392  -73.978364
    1    B  42.562786 -114.460503
    2    C  37.227928  -77.401924
    3    D  41.245708  -75.881241
    4    E  41.308273  -72.927887
    
    In [13]: from scipy.spatial.distance import squareform, pdist
    
    In [14]: pd.DataFrame(squareform(pdist(df.iloc[:, 1:])), columns=df.CITY.unique(), index=df.CITY.unique())
    Out[14]:
               A          B          C          D          E
    A   0.000000  40.522913   4.908494   1.967551   1.191779
    B  40.522913   0.000000  37.440606  38.601738  41.551558
    C   4.908494  37.440606   0.000000   4.295932   6.055264
    D   1.967551  38.601738   4.295932   0.000000   2.954017
    E   1.191779  41.551558   6.055264   2.954017   0.000000
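
The session above can also be written as a standalone script. This is a minimal sketch, assuming the five sample rows shown in Out[12]; the helper name euclidean_matrix is hypothetical and not part of pandas or SciPy.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Sample rows copied from the session above.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})


def euclidean_matrix(frame):
    """Hypothetical helper: square distance matrix labelled by city."""
    # Select the coordinate columns by name rather than by position.
    coords = frame[["LATITUDE", "LONGITUDE"]].to_numpy()
    # pdist returns the condensed upper triangle; squareform expands it
    # into a full symmetric matrix with zeros on the diagonal.
    dist = squareform(pdist(coords, metric="euclidean"))
    labels = frame["CITY"].to_numpy()
    return pd.DataFrame(dist, index=labels, columns=labels)


dist_df = euclidean_matrix(df)
print(dist_df.round(6))
```

Note that the values are plain Euclidean distances in degrees of latitude/longitude, not distances along the Earth's surface.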
    
  • 2021-02-09 02:13

    The matrix can be created directly with cdist from scipy.spatial.distance:

    import pandas as pd
    from scipy.spatial.distance import cdist

    # Pairwise Euclidean distances between every row and every other row.
    df_array = df[["LATITUDE", "LONGITUDE"]].to_numpy()
    dist_mat = cdist(df_array, df_array)
    pd.DataFrame(dist_mat, columns=df["CITY"], index=df["CITY"])
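
If SciPy is not available, the same matrix can probably be built with NumPy broadcasting alone. The following is a sketch under that assumption, not something taken from the answers above; the sample rows are the ones shown earlier in the thread, and it allocates an intermediate array of shape (n, n, 2), so it suits small to moderate numbers of cities.

```python
import numpy as np
import pandas as pd

# Sample rows as shown earlier in the thread.
df = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "LATITUDE": [40.745392, 42.562786, 37.227928, 41.245708, 41.308273],
    "LONGITUDE": [-73.978364, -114.460503, -77.401924, -75.881241, -72.927887],
})

coords = df[["LATITUDE", "LONGITUDE"]].to_numpy()

# coords[:, None, :] has shape (n, 1, 2) and coords[None, :, :] has shape
# (1, n, 2); subtracting them broadcasts to every pairwise difference, (n, n, 2).
diff = coords[:, None, :] - coords[None, :, :]

# Sum the squared differences over the coordinate axis and take the root.
dist_mat = np.sqrt((diff ** 2).sum(axis=-1))

labels = df["CITY"].to_numpy()
dist_df = pd.DataFrame(dist_mat, index=labels, columns=labels)
print(dist_df.round(6))
```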
    