pandas - apply function to current row against all other rows

Question

I am using pandas to create a DataFrame that looks like this:

import pandas

ratings = pandas.DataFrame({
    'article_a':[1,1,0,0],
    'article_b':[1,0,0,0],
    'article_c':[1,0,0,0],
    'article_d':[0,0,0,1],
    'article_e':[0,0,0,1]
},index=['Alice','Bob','Carol','Dave'])

I would like to compute another matrix from this input that compares each row against all the other rows. For example, if the computation were a function that finds the length of the intersection set, I'd like an output DataFrame with len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), len(intersection(Alice,Dave)) in the first row, and each following row in the same format against the other rows. For this example input, the output matrix would be 4x3:

len(intersection(Alice,Bob)),len(intersection(Alice,Carol)),len(intersection(Alice,Dave))
len(intersection(Bob,Alice)),len(intersection(Bob,Carol)),len(intersection(Bob,Dave))
len(intersection(Carol,Alice)),len(intersection(Carol,Bob)),len(intersection(Carol,Dave))
len(intersection(Dave,Alice)),len(intersection(Dave,Bob)),len(intersection(Dave,Carol))
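Taken literally, the computation above treats each row as the set of articles that user rated 1 and compares it against every other row. A minimal sketch in plain Python, assuming 0/1 ratings; `intersection_size` and `pairwise_against_others` are hypothetical helper names, not pandas API:

```python
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])


def intersection_size(row_a, row_b):
    """Hypothetical helper: number of articles rated 1 by both users."""
    set_a = set(row_a[row_a == 1].index)  # article names this user rated 1
    set_b = set(row_b[row_b == 1].index)
    return len(set_a & set_b)


def pairwise_against_others(df, func):
    """Hypothetical helper: apply func(row, other_row) for every *other* row.

    Returns one list per row, in index order, skipping the row itself,
    which gives the 4x3 layout described in the question.
    """
    result = {}
    for name, row in df.iterrows():
        result[name] = [func(row, other)
                        for other_name, other in df.iterrows()
                        if other_name != name]
    return result


pairs = pairwise_against_others(ratings, intersection_size)
for name, values in pairs.items():
    print(name, values)  # e.g. "Alice [1, 0, 0]"
```

This makes one Python-level call per ordered pair of rows, so it only scales to fairly small frames, but it works for any function of two rows.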

Is there a named method for this kind of function-based computation in pandas? What would be the most efficient way to accomplish this?
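For 0/1 data specifically, the intersection size has a vectorized form: entry (i, j) of `ratings.dot(ratings.T)` sums the products of the two users' ratings over all articles, which counts the articles both users rated 1. A sketch, assuming strictly binary ratings (with other values the product is no longer an intersection count); `overlap` and `others` are illustrative names:

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

# One matrix product: rows and columns are both users, and each entry is the
# number of articles the two users both rated 1 (assumes 0/1 ratings).
overlap = ratings.dot(ratings.T)

# The diagonal is each user compared with itself; mask it to NaN so only
# comparisons against *other* rows remain (the frame becomes float64).
overlap = overlap.mask(np.eye(len(ratings), dtype=bool))

# Dropping the NaN in each row yields the 4x3 layout from the question.
others = {name: row.dropna().astype(int).tolist()
          for name, row in overlap.iterrows()}
print(others)
```

Because the pairwise work happens inside a single matrix multiplication, this should be much faster than calling a Python function once per pair, at the cost of only working for computations that can be expressed as a product.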

Answer 1:

I am not aware of a named method, but I have a one-liner.

In [21]: import numpy as np
    ...: ratings.apply(lambda row: ratings.apply(
    ...:     lambda x: np.equal(row, x), axis=1).sum(axis=1), axis=1)
Out[21]: 
       Alice  Bob  Carol  Dave
Alice      5    3      2     0
Bob        3    5      4     2
Carol      2    4      5     3
Dave       0    2      3     5
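Note that `np.equal` counts the positions where two rows hold the same value, including articles neither user rated, so Alice vs Bob is 3 (article_a, article_d and article_e) rather than the intersection size of 1. The same agreement matrix can be sketched without the nested `apply` by using NumPy broadcasting; the variable names here are illustrative:

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

values = ratings.to_numpy()  # shape (4 users, 5 articles)

# values[:, None, :] has shape (4, 1, 5) and values[None, :, :] has shape
# (1, 4, 5); comparing them broadcasts to (4, 4, 5): one boolean per
# (user, other user, article). Summing over articles counts the matches.
matches = (values[:, None, :] == values[None, :, :]).sum(axis=2)

agreement = pd.DataFrame(matches, index=ratings.index, columns=ratings.index)
print(agreement)
```

The intermediate boolean array has rows x rows x articles elements, so for large frames memory may become the limiting factor and the row-by-row `apply` could be preferable.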

Answer 2:

@Dan Allan's solution is 'right'; here's a slightly different way of approaching the problem:

In [26]: ratings
Out[26]: 
       article_a  article_b  article_c  article_d  article_e
Alice          1          1          1          0          0
Bob            1          0          0          0          0
Carol          0          0          0          0          0
Dave           0          0          0          1          1

In [27]: ratings.apply(lambda x: ratings.T.sub(x, axis='index').sum(), axis=1)
Out[27]: 
       Alice  Bob  Carol  Dave
Alice      0   -2     -3    -1
Bob        2    0     -1     1
Carol      3    1      0     2
Dave       1   -1     -2     0
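This version compares row totals rather than individual articles: for a row x, `ratings.T.sub(x, axis='index').sum()` gives each user's total number of ratings minus x's total, which is why the diagonal is 0 and the matrix is antisymmetric. A broadcasting sketch that should produce the same numbers without `apply` (names are illustrative):

```python
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

totals = ratings.sum(axis=1).to_numpy()  # per-user totals: [3, 1, 0, 2]

# Entry [i, j] is totals[j] - totals[i]: the other user's total minus the
# current row's total, matching the apply-based version for row i.
diff = pd.DataFrame(totals[None, :] - totals[:, None],
                    index=ratings.index, columns=ratings.index)
print(diff)
```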

Source: https://stackoverflow.com/questions/16924421/pandas-apply-function-to-current-row-against-all-other-rows

Tags

matrix

pandas