Count occurrences of items in Series in each row of a DataFrame

一个人的身影 2020-12-06 06:02

I have a pandas.DataFrame that looks like this.

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3
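
To reproduce the frame above, here is a minimal sketch. It assumes the empty cells hold the literal string 'None' (the third answer compares against 'None') and that the cut-off last cell is 'C3', as the answers' output suggests; real data may hold actual None/NaN values instead.

```python
import pandas as pd

# Assumption: blanks are the string 'None', and the last cell is 'C3'
# (both inferred from the output shown in the answers).
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})
print(df)
```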


        
3 Answers
  • 2020-12-06 06:14

    You could apply value_counts:

    In [11]: df.apply(pd.Series.value_counts, axis=1)
    Out[11]: 
       C1  C2  C3  None
    0   1 NaN NaN     2
    1   1   1 NaN     1
    2   2 NaN NaN     1
    3   1   1   1   NaN
    

    So you can fill the NaN values and select just the columns you want:

    In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
    Out[12]: 
       C1  C2  C3
    0   1   0   0
    1   1   1   0
    2   2   0   0
    3   1   1   1
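
One caveat with selecting `[['C1', 'C2', 'C3']]`: it raises a KeyError if one of those values never occurs anywhere in the frame. A minimal sketch using `reindex` instead, assuming the empty cells are the string 'None'. The sample frame is rebuilt from the question, and the extra label 'C4' is hypothetical, added only to show the absent-value case:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: blanks are the string 'None').
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

# 'C4' is a hypothetical value that never appears in df.
wanted = ["C1", "C2", "C3", "C4"]

counts = (
    df.apply(pd.Series.value_counts, axis=1)
      .reindex(columns=wanted)   # labels not present become all-NaN columns
      .fillna(0)                 # a missing count means zero occurrences
      .astype(int)               # NaN forced floats; go back to integers
)
print(counts)
```

Casting back to int is optional; it only makes the counts print as whole numbers.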
    

    Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).

  • 2020-12-06 06:21

    Andy's answer is spot on.

    I'm adding this answer for the case where the list of values C1, C2, ..., Cn is huge and we only want to see counts for a subset of them.

    dff = df.copy()
    # Compare against the original df and count the matches in each row
    dff['C1'] = (df == 'C1').sum(axis=1)
    dff['C2'] = (df == 'C2').sum(axis=1)
    dff['C3'] = (df == 'C3').sum(axis=1)
    dff
      COL1  COL2  COL3  C1  C2  C3
    0   C1  None  None   1   0   0
    1   C1    C2  None   1   1   0
    2   C1    C1  None   2   0   0
    3   C1    C2    C3   1   1   1
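
When the subset is longer than a few values, the same idea can be written as a loop. This is a sketch only: the sample frame is rebuilt from the question (assuming the blanks are the string 'None'), and the `wanted` list is a hypothetical subset.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: blanks are the string 'None').
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

wanted = ["C1", "C3"]  # hypothetical subset of values to count

dff = df.copy()
for value in wanted:
    # Compare against the original df so the newly added count
    # columns are never scanned themselves.
    dff[value] = df.eq(value).sum(axis=1)
print(dff)
```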
    
  • 2020-12-06 06:33

    Applying a Series function row by row with apply usually slows things down on a large DataFrame. Additional reading: Link

    # sum(level=0) was removed in pandas 2.0; group by the row index instead
    df.mask(df.eq('None')).stack().str.get_dummies().groupby(level=0).sum()
    Out[165]: 
       C1  C2  C3
    0   1   0   0
    1   1   1   0
    2   2   0   0
    3   1   1   1
    

    Or you can use Counter:

    from collections import Counter
    
    # One Counter per row; drop the placeholder column by name
    pd.DataFrame([Counter(x) for x in df.values]).drop(columns='None')
    Out[170]: 
       C1   C2   C3
    0   1  NaN  NaN
    1   1  1.0  NaN
    2   2  NaN  NaN
    3   1  1.0  1.0
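
The Counter version leaves NaN wherever a value is absent from a row, which also turns those columns into floats. A minimal sketch that fills the gaps and restores integer counts, assuming the blanks are the string 'None' (the sample frame is rebuilt from the question):

```python
from collections import Counter

import pandas as pd

# Sample frame rebuilt from the question (assumption: blanks are the string 'None').
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

counts = (
    pd.DataFrame([Counter(row) for row in df.to_numpy()])
      .drop(columns="None")  # discard the placeholder counts
      .fillna(0)             # value absent from a row means zero
      .astype(int)
)
print(counts)
```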
    