How to efficiently rearrange pandas data as follows?

Posted by ε祈祈猫儿з on 2019-12-22 08:30:04

Question


I need some help with a concise and, above all, efficient way to express the following operation in pandas:

Given a data frame of the format

id    a   b    c   d
1     0   -1   1   1
42    0    1   0   0
128   1   -1   0   1

Construct a data frame of the format:

id     one_entries
1      "c d"
42     "b"
128    "a d"

That is, the column "one_entries" contains the concatenated names of the columns for which the entry in the original frame is 1.


Answer 1:


Here's one way, using a boolean mask and applying a lambda function to each row.

In [58]: df
Out[58]:
    id  a  b  c  d
0    1  0 -1  1  1
1   42  0  1  0  0
2  128  1 -1  0  1

In [59]: cols = list('abcd')

In [60]: (df[cols] > 0).apply(lambda x: ' '.join(x[x].index), axis=1)
Out[60]:
0    c d
1      b
2    a d
dtype: object

You can assign the result to a new column, df['one_entries'].
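As a minimal, self-contained sketch of the approach above: the frame is rebuilt from the sample data in the question, and the mask uses == 1 rather than > 0 because the question asks for entries equal to 1. That choice is an assumption on my part; for values limited to -1, 0 and 1 the two comparisons give the same result.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': [1, 42, 128],
    'a': [0, 0, 1],
    'b': [-1, 1, -1],
    'c': [1, 0, 0],
    'd': [1, 0, 1],
})
cols = list('abcd')

# Boolean mask of entries equal to 1
# (equivalent to > 0 when the values are only -1, 0 or 1)
mask = df[cols] == 1

# For each row, keep the True entries and join their column labels
df['one_entries'] = mask.apply(lambda x: ' '.join(x[x].index), axis=1)

print(df[['id', 'one_entries']])
```

Selecting df[['id', 'one_entries']] afterwards gives a frame in the shape the question asks for.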

Details of the function passed to apply:

Take the first row.

In [83]: x = df[cols].loc[0] > 0

In [84]: x
Out[84]:
a    False
b    False
c     True
d     True
Name: 0, dtype: bool

x holds a Boolean value for each column of the row: True where the entry is greater than zero. x[x] keeps only the True entries, leaving a Series whose index holds the matching column names.

In [85]: x[x]
Out[85]:
c    True
d    True
Name: 0, dtype: bool

x[x].index gives you the column names.

In [86]: x[x].index
Out[86]: Index([u'c', u'd'], dtype='object')
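Since the question stresses efficiency, here is a hedged alternative that is not part of the original answer. It applies the same boolean selection as x[x].index, but on a NumPy array instead of a row-wise apply. It may be faster on large frames because it avoids building a Series per row, though that was not measured here. The names mask, names and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': [1, 42, 128],
    'a': [0, 0, 1],
    'b': [-1, 1, -1],
    'c': [1, 0, 0],
    'd': [1, 0, 1],
})
cols = list('abcd')

# 2-D boolean array: True where the entry equals 1
mask = df[cols].to_numpy() == 1

# Column labels as an array, so a boolean row can index them directly
names = np.array(cols)

# For each row of the mask, pick the matching labels and join them
one_entries = [' '.join(names[row]) for row in mask]

result = pd.DataFrame({'id': df['id'], 'one_entries': one_entries})
print(result)
```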



Answer 2:


Same reasoning as John Galt's answer, but a bit shorter: construct a new DataFrame from a dict. As the output shows, test_df is the original frame with id as its index.

pd.DataFrame({
    'one_entries': (test_df > 0).apply(lambda x: ' '.join(x[x].index), axis=1)
})

#       one_entries
#   1           c d
#  42             b
# 128           a d
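A runnable sketch of this answer, assuming test_df is obtained from the question's frame with set_index('id'). The answer does not show how test_df was built, so that step is an assumption.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': [1, 42, 128],
    'a': [0, 0, 1],
    'b': [-1, 1, -1],
    'c': [1, 0, 0],
    'd': [1, 0, 1],
})

# Assumption: test_df is the frame with id moved into the index,
# so only the columns a..d take part in the comparison
test_df = df.set_index('id')

result = pd.DataFrame({
    'one_entries': (test_df > 0).apply(lambda x: ' '.join(x[x].index), axis=1)
})
print(result)
```

Calling result.reset_index() turns id back into an ordinary column if the layout from the question is needed.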


Source: https://stackoverflow.com/questions/40829103/how-to-efficiently-rearrange-pandas-data-as-follows
