Perform an operation on all pairs of rows in a column

Assume the following DataFrame:

I would like to make a calculation between all rows to all ot

2 Answers

梦谈多话

2021-01-21 13:15

If I understand correctly, you can use itertools.combinations:

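import itertools
import pandas as pd

# All unordered pairs of index labels: (0, 1), (0, 2), ...
s = list(itertools.combinations(df.index, 2))
# Later row minus earlier row, for each pair
pd.Series([df.A.loc[j] - df.A.loc[i] for i, j in s])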
Out[495]: 
0      10
1     200
2    3000
3     190
4    2990
5    2800
dtype: int64

Update

s = list(itertools.combinations(df.index, 2))

# Keep the pair (i, j) alongside its difference
pd.DataFrame([(i, j, df.A.loc[j] - df.A.loc[i]) for i, j in s])
Out[518]: 
   0  1     2
0  0  1    10
1  0  2   200
2  0  3  3000
3  1  2   190
4  1  3  2990
5  2  3  2800
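For a self-contained run, here is a minimal sketch of the same approach. The values in `A` are an assumption (the question's DataFrame is not shown here); they were picked only so that the differences match the output above. The helper name `pairwise` and the column names are hypothetical.

```python
import itertools

import pandas as pd

# Assumed sample data: the original DataFrame is not shown, so these values
# are hypothetical, chosen to reproduce the differences printed above.
df = pd.DataFrame({"A": [0, 10, 200, 3000]})


def pairwise(series, op):
    """Apply op(later, earlier) to every unordered pair of index labels."""
    rows = [
        (i, j, op(series.loc[j], series.loc[i]))
        for i, j in itertools.combinations(series.index, 2)
    ]
    return pd.DataFrame(rows, columns=["row_i", "row_j", "value"])


result = pairwise(df["A"], lambda later, earlier: later - earlier)
print(result)
```

Because `op` is an arbitrary callable, this also covers operations other than subtraction, at the cost of a Python-level loop over n(n-1)/2 pairs.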

悲&欢浪女

2021-01-21 13:17

Use broadcasted subtraction, then np.tril_indices_from to extract the strictly lower triangle (the positive values here).

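import numpy as np
import pandas as pd

# pandas <= 0.23:
# u = df['A'].values
# pandas 0.24+:
u = df['A'].to_numpy()
u2 = u[:, None] - u   # u2[i, j] = u[i] - u[j]

# k=-1 skips the main diagonal (the all-zero self-differences)
pd.Series(u2[np.tril_indices_from(u2, k=-1)])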

0      10
1     200
2     190
3    3000
4    2990
5    2800
dtype: int64
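To make the broadcasting step concrete, the sketch below spells out the shapes involved. The array values are hypothetical, chosen only to match the differences shown above.

```python
import numpy as np

# Hypothetical values, chosen to match the differences shown above.
u = np.array([0, 10, 200, 3000])

col = u[:, None]   # shape (4, 1): each value in its own row
u2 = col - u       # broadcasts to shape (4, 4); u2[i, j] == u[i] - u[j]

# Index pairs strictly below the main diagonal, i.e. i > j
rows, cols = np.tril_indices_from(u2, k=-1)
lower = u2[rows, cols]
print(lower)
```

The full matrix holds every ordered pair, so its upper triangle is just the negation of the lower one; selecting one triangle keeps each unordered pair once.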

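Or, use np.subtract.outer to build the same matrix in a single call. Recent pandas versions raise NotImplementedError when a ufunc's outer method is called directly on a Series, so pass arrays:

u = df['A'].to_numpy()
u2 = np.subtract.outer(u, u)
pd.Series(u2[np.tril_indices_from(u2, k=-1)])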

If you need the index as well, use

idx = np.tril_indices_from(u2, k=-1)
pd.DataFrame({
    'val': u2[idx],
    'row': idx[0],
    'col': idx[1]
})

    val  row  col
0    10    1    0
1   200    2    0
2   190    2    1
3  3000    3    0
4  2990    3    1
5  2800    3    2
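Note that the lower-triangle order above differs from the itertools.combinations order in the other answer. If that order matters, one possible variant is to index the array directly with np.triu_indices, which also avoids materialising the n-by-n matrix. This is a sketch under the assumption that `A` holds the hypothetical values below; the name `pairs` is illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data (the question's DataFrame is not shown here).
df = pd.DataFrame({"A": [0, 10, 200, 3000]})

u = df["A"].to_numpy()

# Pairs with i < j in row-major order: (0, 1), (0, 2), (0, 3), (1, 2), ...
i, j = np.triu_indices(len(u), k=1)

# Fancy indexing computes only the n(n-1)/2 differences that are needed.
pairs = pd.DataFrame({"row": i, "col": j, "val": u[j] - u[i]})
print(pairs)
```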
