Question
I am using pandas to create a DataFrame that looks like this:
import pandas

ratings = pandas.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1]
}, index=['Alice', 'Bob', 'Carol', 'Dave'])
I would like to compute another matrix from this input that compares each row against all other rows. Suppose, for example, that the computation is a function that returns the length of the intersection set. I'd like an output DataFrame with len(intersection(Alice,Bob)), len(intersection(Alice,Carol)), len(intersection(Alice,Dave)) in the first row, with each following row in the same format against the others. Using this example input, the output matrix would be 4x3:
len(intersection(Alice,Bob)),len(intersection(Alice,Carol)),len(intersection(Alice,Dave))
len(intersection(Bob,Alice)),len(intersection(Bob,Carol)),len(intersection(Bob,Dave))
len(intersection(Carol,Alice)),len(intersection(Carol,Bob)),len(intersection(Carol,Dave))
len(intersection(Dave,Alice)),len(intersection(Dave,Bob)),len(intersection(Dave,Carol))
Is there a named method for this kind of function-based computation in pandas? What would be the most efficient way to accomplish this?
Answer 1:
I am not aware of a named method, but I have a one-liner (it assumes numpy has been imported as np).
In [21]: ratings.apply(lambda row: ratings.apply(
... lambda x: np.equal(row, x), 1).sum(1), 1)
Out[21]:
       Alice  Bob  Carol  Dave
Alice      5    3      2     0
Bob        3    5      4     2
Carol      2    4      5     3
Dave       0    2      3     5
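The nested apply calls above count, for each pair of rows, the positions where the two rows hold the same value. The same counts can be computed in one vectorized step. This is a minimal sketch using NumPy broadcasting, assuming the ratings frame from the question. The names values, equal and matches are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

values = ratings.to_numpy()

# Broadcast shape (4, 1, 5) against (1, 4, 5) to compare every row with every row.
equal = values[:, None, :] == values[None, :, :]

# Summing over the last axis counts matching positions for each pair of rows.
matches = pd.DataFrame(equal.sum(axis=2),
                       index=ratings.index, columns=ratings.index)
print(matches)
```

The intermediate boolean array has one entry per (row, row, column) triple, so memory use grows with the square of the number of rows. For large frames a chunked approach may be preferable.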
Answer 2:
@Dan Allan's solution is 'right'; here is a slightly different way of approaching the problem.
In [26]: ratings
Out[26]:
       article_a  article_b  article_c  article_d  article_e
Alice          1          1          1          0          0
Bob            1          0          0          0          0
Carol          0          0          0          0          0
Dave           0          0          0          1          1
In [27]: ratings.apply(lambda x: (ratings.T.sub(x,'index')).sum(),1)
Out[27]:
       Alice  Bob  Carol  Dave
Alice      0   -2     -3    -1
Bob        2    0     -1     1
Carol      3    1      0     2
Dave       1   -1     -2     0
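For 0/1 ratings like these, the intersection length used as the example in the question can also be obtained as a matrix product, because the dot product of two binary rows counts the columns where both are 1. This is a minimal sketch, assuming the ratings frame from the question and strictly binary values. The name intersection is illustrative.

```python
import pandas as pd

ratings = pd.DataFrame({
    'article_a': [1, 1, 0, 0],
    'article_b': [1, 0, 0, 0],
    'article_c': [1, 0, 0, 0],
    'article_d': [0, 0, 0, 1],
    'article_e': [0, 0, 0, 1],
}, index=['Alice', 'Bob', 'Carol', 'Dave'])

# For 0/1 data, row_i . row_j is the number of articles both users rated.
# Assumption: every cell is 0 or 1; other values would change the meaning.
intersection = ratings.dot(ratings.T)
print(intersection)
```

The diagonal holds each user's own rating count. It would need to be dropped or masked to get the 4x3 layout described in the question.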
Source: https://stackoverflow.com/questions/16924421/pandas-apply-function-to-current-row-against-all-other-rows