Get column name based on condition in pandas

[亡魂溺海] posted on 2021-02-17 07:18:07

Question


I have a dataframe as below:

For a particular row, I want to get the name of every column that contains 1 in that row.

e.g.

For Row 1: Blanks,
For Row 2: Manufacturing,
For Row 3: Manufacturing,
For Row 4: Manufacturing,
For Row 5: Social, Finance, Analytics, Advertising,

Right now I am able to get the complete row only:

primary_sectors = lambda primary_sector: sectors[
    sectors["category_list"] == primary_sector
]

Please help me to get the name of the column in the above dataframe.

I tried this code:

primary_sectors("3D").filter(items=["0"])

It gives me 1 as output, but I need Manufacturing.


Answer 1:


Firstly

Your question is quite ambiguous, and I recommend reading the link in @sammywemmy's comment. If I understand your problem correctly, we'll talk about this mask first:

df.columns[
    (df == 1)        # boolean mask: True where a value equals 1
    .any(axis=0)     # reduce per column: True if any row is True
]

What's happening? Let's work our way outward, starting from inside df.columns[**HERE**]:

  1. (df == 1) builds a boolean mask of the df: True where a value equals 1, False everywhere else.
  2. .any(), as per the docs: "Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent". With axis=0 it returns one boolean per column, which gives us a handy Series to mask the column names with.
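As a quick check of the two steps above, here is a minimal sketch on a tiny made-up frame. The column names and values are illustrative assumptions, not the asker's data:

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"foo": [0, 0], "bar": [0, 1], "spam": [1, 1]})

mask = df == 1                # step 1: True where a cell equals 1
has_one = mask.any(axis=0)    # step 2: one boolean per column
cols = df.columns[has_one]    # keep only the column names marked True

print(cols.to_list())
```

Here foo never holds a 1, so only bar and spam survive the mask.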

We will use this mask to automate the solution below.


Next:

Automate this to get an output of (<row index>, [<col name>, <col name>, ...]) for every row that contains a 1. Although this will be slow on large datasets, it should do the trick:

import pandas as pd

data = {'foo':[0,0,0,0], 'bar':[0, 1, 0, 0], 'baz':[0,0,0,0], 'spam':[0,1,0,1]}
df = pd.DataFrame(data, index=['a','b','c','d'])

print(df)

   foo  bar  baz  spam
a    0    0    0     0
b    0    1    0     1
c    0    0    0     0
d    0    0    0     1

# group our df by index and create a dict that maps each index label to its one-row df
df_dict = dict(
    list(
        df.groupby(df.index)
    )
)

The next step is a for loop that iterates over each df in df_dict, checks it with the mask we created earlier, and prints the intended results:

for k, v in df_dict.items():               # k: index label, v: the one-row df for that label
    check = v.columns[(v == 1).any()]
    if len(check) > 0:
        print((k, check.to_list()))

('b', ['bar', 'spam'])
('d', ['spam'])
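
If the index labels are not unique, grouping by index would merge rows that share a label. A possible alternative, sketched here with DataFrame.iterrows instead of the groupby approach (it is a swapped-in technique, not the method used above), applies the same mask idea to each row separately:

```python
import pandas as pd

data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

result = {}
for label, row in df.iterrows():            # row is a Series indexed by column name
    names = row.index[row == 1].to_list()   # column names where this row holds a 1
    if names:                               # skip rows without any 1
        result[label] = names

print(result)
```

On the sample data this yields the same pairs as the loop above, collected in a dict. It is still a Python-level loop, so it will not be fast on large frames either.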

Side note:

You see how I generated sample data that can be easily reproduced? In the future, please try to ask questions with posted sample data that can be reproduced. It helps you understand your problem better and makes it easier for us to answer.




Answer 2:


Use DataFrame.dot:

df1 = df.dot(df.columns)

If there are multiple 1s per row:

df2 = df.dot(df.columns + ';').str.rstrip(';')
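
To see what the dot trick produces, here is a minimal sketch that reuses the sample frame from the first answer. It assumes the frame holds only 0 and 1: multiplying a string by 0 gives an empty string and by 1 gives the string itself, so the dot product concatenates the names of the columns that hold a 1. A larger integer would repeat the column name.

```python
import pandas as pd

data = {'foo': [0, 0, 0, 0], 'bar': [0, 1, 0, 0], 'baz': [0, 0, 0, 0], 'spam': [0, 1, 0, 1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# Append a separator to every column name, take the dot product,
# then strip the trailing separator left after the last name.
df2 = df.dot(df.columns + ';').str.rstrip(';')

print(df2.to_dict())
```

Rows without any 1 come back as empty strings, and calling .str.split(';') on the result would turn each joined string into a list of column names.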


Source: https://stackoverflow.com/questions/60472196/get-column-name-based-on-condition-in-pandas
