Question
Let's say I have a table of the frequencies of three variables, M1, M2 and M3, over different instances P1, ..., P4:
import pandas as pd
tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])
Now, for each row, I want to extract as a string the variables that occur (i.e. have a non-zero frequency), so that the final output would be something like:
output = pd.DataFrame([("M1+M2+M3"), ("M3"), ("M1+M2+M3"), ("M1+M2")], columns = ["label"], index = ["P1", "P2", "P3", "P4"])
I thought about using something like np.where(df_test != 0), but then how do I paste the matching column names into the output as a string?
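The np.where idea from the question can be completed along these lines. This is a minimal sketch, not taken from either answer below; it redefines df_test so it runs on its own, and the names rows, cols and labels are illustrative. On a 2-D boolean array, np.where with a single argument returns the row and column positions of the True cells, in row-major order, so each row's column names come out in their original order.

```python
import numpy as np
import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

# (row positions, column positions) of every non-zero cell, in row-major order
rows, cols = np.where(df_test.to_numpy() != 0)

# Gather the column names that belong to each row position
labels = [[] for _ in range(len(df_test))]
for r, c in zip(rows, cols):
    labels[r].append(df_test.columns[c])

# Join each row's names with "+" and keep the original row index
output = pd.DataFrame({"label": ["+".join(names) for names in labels]}, index=df_test.index)
print(output)
```

A row with no non-zero values would simply get an empty string as its label.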
Answer 1:
You can use np.where to replace every positive cell with its column label (and every other cell with None), then join the remaining labels of each row into a string.
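import numpy as np
import pandas as pd

(
    df_test.gt(0)  # True where the frequency is positive
    .apply(lambda x: np.where(x, x.name, None))  # True -> column name, False -> None
    .apply(lambda x: '+'.join(x.dropna()), axis=1)  # drop the Nones and join each row
    .to_frame('label')
)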
label
P1 M1+M2+M3
P2 M3
P3 M1+M2+M3
P4 M1+M2
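A commonly used vectorized alternative, which is not part of this answer, takes the dot product of the boolean mask with the column names. It relies on Python treating True * "M1+" as "M1+" and False * "M1+" as "", so each row's product concatenates the names of its positive columns; the trailing "+" is then stripped. This is a sketch assuming the same df_test; since it depends on NumPy's object-dtype dot product, it is worth checking on your pandas/NumPy versions.

```python
import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

# Boolean mask (rows x columns) dotted with ["M1+", "M2+", "M3+"]:
# each row becomes the concatenation of the names where the mask is True
label = df_test.gt(0).dot(df_test.columns + "+").str.rstrip("+")

output = label.to_frame("label")
print(output)
```

This avoids calling a Python lambda once per column and once per row, at the cost of being a little less obvious to read.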
Answer 2:
I did it with a loop over the rows; I hope it helps:
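import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

new = []
for row in df_test.itertuples():
    # Collect the names of the columns whose value is non-zero in this row
    aux = []
    if row.M1 != 0: aux.append('M1')
    if row.M2 != 0: aux.append('M2')
    if row.M3 != 0: aux.append('M3')
    new.append('+'.join(aux))  # one "+"-joined label per row
output = pd.DataFrame(new, columns=["label"], index=["P1", "P2", "P3", "P4"])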
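The loop above hard-codes M1, M2 and M3. A sketch of the same row-by-row idea that works for any set of columns, iterating over each row's (column, value) pairs instead, could look like this; it is an illustrative variant rather than the answerer's code, and the name labels is hypothetical.

```python
import pandas as pd

tupl = [(0.7, 0.2, 0.1), (0, 0, 1), (0.2, 0.6, 0.2), (0.6, 0.4, 0)]
df_test = pd.DataFrame(tupl, columns=["M1", "M2", "M3"], index=["P1", "P2", "P3", "P4"])

# For each row (a Series indexed by column name), keep the names of the
# non-zero entries and join them with "+"
labels = [
    "+".join(col for col, value in row.items() if value != 0)
    for _, row in df_test.iterrows()
]

output = pd.DataFrame({"label": labels}, index=df_test.index)
print(output)
```

Reusing df_test.index instead of retyping the labels also keeps the output aligned if rows are added or renamed.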
Source: https://stackoverflow.com/questions/61262719/extract-attributes-from-pandas-columns-that-satisfy-a-condition