Convert pandas DataFrame column of comma separated strings to one-hot encoded

生来不讨喜 2020-12-09 11:45

I have a large dataframe (‘data’) made up of one column. Each row in the column is made of a string and each string is made up of comma separated categories. I wish to one h

2 Answers
  • 2020-12-09 12:24

    I think there is a simpler answer, one that avoids chaining multiple operations.

    1. Make sure each string in the column holds unique values separated by commas.

    2. Use the built-in sep parameter of str.get_dummies to specify a comma as the separator. The default separator is the pipe character (|).

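      import pandas as pd

      data = {"mesh": ["A, B, C", "C,B", ""]}
      sof_df = pd.DataFrame(data)
      # Remove spaces so that "A, B" and "A,B" yield the same categories
      sof_df.mesh = sof_df.mesh.str.replace(' ', '')
      # Split each string on commas and build one indicator column per category
      sof_df.mesh.str.get_dummies(sep=',')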
      

    OUTPUT:

        A   B   C
    0   1   1   1
    1   0   1   1
    2   0   0   0
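One caveat with the snippet above: replacing every space also removes spaces inside a category name. A minimal sketch of a variant that only strips whitespace around the commas follows. The helper name multi_hot and the extra "New York , A" row are made up for illustration and are not part of the original question.

```python
import pandas as pd


def multi_hot(series):
    """Hypothetical helper: indicator columns for a comma separated string column."""
    # Trim the ends, then collapse whitespace around each comma only,
    # so spaces inside a category name such as "New York" survive.
    cleaned = series.str.strip().str.replace(r"\s*,\s*", ",", regex=True)
    return cleaned.str.get_dummies(sep=",")


# The last row is invented to show a category that contains a space.
sof_df = pd.DataFrame({"mesh": ["A, B, C", "C,B", "", "New York , A"]})
result = multi_hot(sof_df["mesh"])
print(result)
```

With str.replace(' ', '') the last row would instead produce a category named NewYork.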
    
  • 2020-12-09 12:38

    Note that you're not strictly dealing with one-hot encodings (OHEs) here, because a row can belong to more than one category.
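To make the note above concrete: in a one-hot encoding every row contains exactly one 1, whereas the rows here can contain several or none. A small sketch on the question's sample data, where the variable names are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"mesh": ["A, B, C", "C,B", ""]})

# Build the indicator columns with Series.str.get_dummies.
indicators = (df["mesh"].str.replace(" ", "", regex=False)
                        .str.get_dummies(sep=","))

# A one-hot row would sum to exactly 1; these rows sum to 3, 2 and 0.
row_sums = indicators.sum(axis=1)
print(row_sums.tolist())
```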

    str.split + stack + get_dummies + sum

    df = pd.DataFrame(data)
    df
    
          mesh
    0  A, B, C
    1      C,B
    2         
    
    (df.mesh.str.split(r'\s*,\s*', expand=True)
       .stack()
       .str.get_dummies()
       # sum(level=0) was removed in pandas 2.0; group by the row level instead
       .groupby(level=0).sum())
    
       A  B  C
    0  1  1  1
    1  0  1  1
    2  0  0  0
    

    apply + value_counts

    (df.mesh.str.split(r'\s*,\s*', expand=True)
       .apply(pd.Series.value_counts, axis=1)
       .iloc[:, 1:]  # drop the first column, the empty-string category
       .fillna(0)
       .astype(int))
    
       A  B  C
    0  1  1  1
    1  0  1  1
    2  0  0  0
    

    pd.crosstab

    x = df.mesh.str.split(r'\s*,\s*', expand=True).stack()
    # Cross-tabulate row labels against categories; iloc drops the empty-string column
    pd.crosstab(x.index.get_level_values(0), x.values).iloc[:, 1:]
    
    col_0  A  B  C
    row_0         
    0      1  1  1
    1      0  1  1
    2      0  0  0
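As a quick sanity check, the crosstab result can be compared against Series.str.get_dummies with an explicit separator. This is only a sketch on the sample data; the names baseline, ct and same are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"mesh": ["A, B, C", "C,B", ""]})

# Baseline: strip spaces, then let str.get_dummies split on commas.
baseline = (df["mesh"].str.replace(" ", "", regex=False)
                      .str.get_dummies(sep=","))

# crosstab variant: one row per (original row, category) pair, then count.
x = df["mesh"].str.split(r"\s*,\s*", expand=True).stack()
ct = pd.crosstab(x.index.get_level_values(0), x.values).iloc[:, 1:]

# Compare column labels and cell values; the row_0/col_0 axis names
# that crosstab adds are ignored.
same = (list(ct.columns) == list(baseline.columns)
        and ct.values.tolist() == baseline.values.tolist())
print(same)
```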
    