Transform cell values into column headers and fill with 1 if matching in Python

余生分开走 2021-01-26 22:32

I have a dataframe:

df
ID  0   1   2    3   4  ....
1   10  20  5    1   2  ....
2   3   4   NaN  10  1  ....

I need to turn the cell values into column headers, filling each row with 1 where that value occurs in the row and 0 where it does not.
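To make the answers below reproducible, here is a minimal sketch that rebuilds the sample frame. It assumes only the columns shown in the question (0 to 4) with integer labels; the `....` columns are unknown and are left out.

```python
import numpy as np
import pandas as pd

# Sample data from the question; only the visible columns 0-4 are included
df = pd.DataFrame({
    'ID': [1, 2],
    0: [10, 3],
    1: [20, 4],
    2: [5, np.nan],   # the missing value makes this column float64
    3: [1, 10],
    4: [2, 1],
})
print(df)
```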

2 Answers
  • 2021-01-26 22:41

    Another way, using melt and pd.crosstab:

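    import pandas as pd

    df1 = df.melt('ID')  # long format: one row per (ID, column) cell
    # crosstab counts each value per ID; NaN cells are left out
    df_final = pd.crosstab(index=df1.ID, columns=df1.value).reset_index()
    print(df_final)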
    value  ID  1.0  2.0  3.0  4.0  5.0  10.0  20.0
    0      1   1    1    0    0    1    1     1
    1      2   1    0    1    1    0    1     0
    

    Note: by default pd.crosstab counts frequencies, so a value that occurs more than once in a row is counted more than once. If you want only a 1/0 indicator, chain ge(1) and astype(int) as follows:

    pd.crosstab(index=df1.ID, columns=df1.value).ge(1).astype(int).reset_index()
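A small sketch of the difference the note describes, assuming the data from the EDIT in the next answer, where 5 appears twice for ID 1. The names `counts` and `indicator` are illustrative.

```python
import numpy as np
import pandas as pd

# Data from the EDIT in the next answer: value 5 occurs twice for ID 1
df = pd.DataFrame({
    'ID': [1, 2],
    0: [10, 3],
    1: [20, 4],
    2: [5, np.nan],
    3: [1, 10],
    4: [2, 1],
    5: [5, 2],
})

df1 = df.melt('ID')  # long format; the value column becomes float64
counts = pd.crosstab(index=df1.ID, columns=df1.value)  # frequencies
indicator = counts.ge(1).astype(int)                   # True/False -> 1/0

print(counts.loc[1, 5.0])     # how often 5 occurs for ID 1
print(indicator.loc[1, 5.0])  # whether 5 occurs for ID 1
```

The raw crosstab reports 2 for value 5 in ID 1, while the chained version caps it at 1.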
    
  • 2021-01-26 22:42

    Use DataFrame.set_index with DataFrame.stack to remove missing values, then create indicator columns with get_dummies, reduce them to 1/0 per ID with max over the first index level, and finally convert the column labels to integers:

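    import pandas as pd

    # stack() puts every cell in one Series indexed by (ID, column);
    # dtype=int keeps 0/1 instead of booleans in newer pandas
    df1 = (pd.get_dummies(df.set_index('ID').stack(), dtype=int)
             .groupby(level=0).max()   # 1 if the value occurs in the row
             .rename(columns=int)      # 1.0, 2.0, ... -> 1, 2, ...
             .reset_index())
    print(df1)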
       ID  1  2  3  4  5  10  20
    0   1  1  1  0  0  1   1   1
    1   2  1  0  1  1  0   1   0
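To see what each step of the chain produces, a sketch that keeps the intermediate results under illustrative names (`stacked`, `dummies`, `result`) on the question's visible columns. Two assumptions for newer pandas: `dtype=int` is passed because get_dummies returns booleans by default there, and `.dropna()` is added explicitly because newer `stack` implementations may keep missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    0: [10, 3],
    1: [20, 4],
    2: [5, np.nan],
    3: [1, 10],
    4: [2, 1],
})

# One entry per non-missing cell, indexed by (ID, original column)
stacked = df.set_index('ID').stack().dropna()
# One indicator column per distinct value (column labels 1.0, 2.0, ...)
dummies = pd.get_dummies(stacked, dtype=int)
# Collapse back to one row per ID: 1 if the value occurs anywhere in the row
result = dummies.groupby(level=0).max().rename(columns=int).reset_index()

print(stacked)
print(result)
```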
    

    EDIT:

    print(df)
       ID   0   1    2   3  4  5
    0   1  10  20  5.0   1  2  5
    1   2   3   4  NaN  10  1  2
    

    With max, the output always contains only 0/1 values (check column 5):

    df1 = (pd.get_dummies(df.set_index('ID').stack(), dtype=int)
             .groupby(level=0).max()   # 0/1: does the value occur in the row?
             .rename(columns=int)
             .reset_index())
    print(df1)
       ID  1  2  3  4  5  10  20
    0   1  1  1  0  0  1   1   1
    1   2  1  1  1  1  0   1   0
    

    With sum instead, the values are counted (check column 5):

    df2 = (pd.get_dummies(df.set_index('ID').stack(), dtype=int)
             .groupby(level=0).sum()   # counts: how often the value occurs
             .rename(columns=int)
             .reset_index())
    print(df2)
       ID  1  2  3  4  5  10  20
    0   1  1  1  0  0  2   1   1
    1   2  1  1  1  1  0   1   0
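Because the counts from sum are at least 1 exactly where max gives 1, the 0/1 table can also be derived from the counts by capping them at 1. A sketch checking this on the EDIT data; `counts` and `flags` are illustrative names, and `dtype=int` and `.dropna()` are assumed for newer pandas as above.

```python
import numpy as np
import pandas as pd

# Data from the EDIT above (value 5 occurs twice for ID 1)
df = pd.DataFrame({
    'ID': [1, 2],
    0: [10, 3], 1: [20, 4], 2: [5, np.nan],
    3: [1, 10], 4: [2, 1], 5: [5, 2],
})

dummies = pd.get_dummies(df.set_index('ID').stack().dropna(), dtype=int)
counts = dummies.groupby(level=0).sum().rename(columns=int)  # like df2
flags = dummies.groupby(level=0).max().rename(columns=int)   # like df1

# Capping the counts at 1 reproduces the 0/1 indicator table
print((counts.clip(upper=1) == flags).to_numpy().all())  # True
print(counts.loc[1, 5], flags.loc[1, 5])                  # 2 1
```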
    