How to apply a Euclidean distance function to a groupby object in a pandas DataFrame?

佛祖请我去吃肉 2021-01-20 06:03

I have a set of objects and their positions over time. I would like to get the average distance between objects for each time point. An example dataframe is as follows:

4 answers
  •  野的像风
    2021-01-20 06:46

    Building this up from first principles:

    For each point at index n, we need its distance to every point with index > n. This way each pair of points is counted exactly once.
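The rule "index > n" visits each unordered pair exactly once, so n points give n(n-1)/2 distances. As a small plain-Python sketch, `itertools.combinations` generates exactly these index pairs. They come out in the same order as a nested loop over i and then j > i. `later_pairs` is a hypothetical helper and not part of the answer.

```python
from itertools import combinations


def later_pairs(n):
    """Return every index pair (i, j) with j > i, i.e. each unordered pair once."""
    return list(combinations(range(n), 2))


pairs = later_pairs(4)
print(pairs)                     # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(pairs) == 4 * 3 // 2)  # True
```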

    If the distance between two points is given by the formula:

    np.sqrt((x0 - x1)**2 + (y0 - y1)**2)
    

    then, for all the points in a dataframe, we can collect every pairwise distance and take the mean:

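    import numpy as np

    distances = []
    for i in range(len(df) - 1):
        # distances from point i to every later point (.iloc = by position, so any index works)
        dx = df.x.iloc[i+1:] - df.x.iloc[i]
        dy = df.y.iloc[i+1:] - df.y.iloc[i]
        distances += np.sqrt(dx**2 + dy**2).tolist()
    
    np.mean(distances)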
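The loop above makes one pass through pandas per point. A vectorized alternative builds all coordinate differences at once with NumPy broadcasting. It then keeps only the strict upper triangle of the distance matrix, which is the same set of pairs with j > i. This is a swapped-in technique, not the answer's method, and `mean_pairwise_distance` is a hypothetical name.

```python
import numpy as np


def mean_pairwise_distance(x, y):
    """Mean Euclidean distance over all unordered pairs of points (x[k], y[k])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Broadcasting builds the full n x n matrices of coordinate differences.
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    dist = np.hypot(dx, dy)
    # Keep only the strict upper triangle (j > i), i.e. each pair once.
    i, j = np.triu_indices(len(x), k=1)
    return dist[i, j].mean()


print(mean_pairwise_distance([0, 3, 0], [0, 0, 4]))  # 4.0  (distances 3, 4, 5)
```

Because it takes plain arrays, it could be called per group as, say, `mean_pairwise_distance(g.x, g.y)`. The trade-off is O(n²) memory per call, which is usually acceptable for the number of objects at one time point. With fewer than two points, the mean of an empty array is NaN, and NumPy emits a warning.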
    

    The same logic can be expressed with pd.concat and a couple of helper functions:

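    import numpy as np
    import pandas as pd

    def diff_sq(x, i):
        # squared differences between element i and every later element
        return (x.iloc[i+1:] - x.iloc[i])**2
    
    def dist_df(x, y, i):
        # Euclidean distances from point i to every later point
        d_sq = diff_sq(x, i) + diff_sq(y, i)
        return np.sqrt(d_sq)
    
    def avg_dist(df):
        # mean over all pairs (i, j) with j > i within df
        return pd.concat([dist_df(df.x, df.y, i) for i in range(len(df)-1)]).mean()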
    

    Then avg_dist can be applied to each time group with groupby:

    df.groupby('time').apply(avg_dist)
    # outputs:
    time
    0     1.550094
    1    10.049876
    2    53.037722
    dtype: float64
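The question's own dataframe is not shown, so here is a self-contained sketch on hypothetical data to check the groupby step end to end. The answer's avg_dist is inlined so the snippet runs on its own. Each time point holds a right triangle, which makes the expected means easy to verify by hand.

```python
import numpy as np
import pandas as pd


def avg_dist(g):
    # Same computation as the answer's avg_dist, inlined: for each position i,
    # distances from point i to every later point, then the mean of all of them.
    parts = [
        np.sqrt((g.x.iloc[i+1:] - g.x.iloc[i])**2 + (g.y.iloc[i+1:] - g.y.iloc[i])**2)
        for i in range(len(g) - 1)
    ]
    return pd.concat(parts).mean()


# Hypothetical data: objects a, b, c observed at time 0 and time 1.
df = pd.DataFrame({
    'time':   [0, 0, 0, 1, 1, 1],
    'object': ['a', 'b', 'c', 'a', 'b', 'c'],
    'x':      [0.0, 3.0, 0.0, 0.0, 6.0, 0.0],
    'y':      [0.0, 0.0, 4.0, 0.0, 0.0, 8.0],
})

# Pass only the coordinate columns to apply(); one mean distance per time.
result = df.groupby('time')[['x', 'y']].apply(avg_dist)
print(result)  # time 0 -> 4.0 (3-4-5 triangle), time 1 -> 8.0 (6-8-10 triangle)
```

Selecting `[['x', 'y']]` before `apply` is a deliberate choice. Recent pandas versions (2.2 and later) warn when `DataFrameGroupBy.apply` also hands the grouping column to the function. Selecting the columns explicitly avoids that warning and gives the same result here.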
    
