How do I count the values from a pandas column which is a list of strings?

名媛妹妹 2021-01-19 21:53

I have a dataframe column which is a list of strings:

df['colors']

0              ['blue','green','brown']
1              []
2              ['green'


        
5 Answers
  •  失恋的感觉
    2021-01-19 22:18

    Use a Counter + chain, which is meant to do exactly this. Then construct the Series from the Counter object.

    import pandas as pd
    from collections import Counter
    from itertools import chain
    
    s = pd.Series([['blue','green','brown'], [], ['green','red','blue']])
    
    pd.Series(Counter(chain.from_iterable(s)))
    #blue     2
    #green    2
    #brown    1
    #red      1
    #dtype: int64
    
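    The same idea can be applied directly to a DataFrame column. This is a minimal sketch, assuming a DataFrame named df with a colors column of lists; the rows reuse the sample data above, because the question's own output is cut off. Sorting by frequency is an optional extra step, not part of the original snippet.

```python
import pandas as pd
from collections import Counter
from itertools import chain

# Assumed sample data, reused from the Series in the answer above.
df = pd.DataFrame({'colors': [['blue', 'green', 'brown'], [], ['green', 'red', 'blue']]})

# chain.from_iterable flattens the per-row lists into one stream of strings;
# Counter tallies how often each string occurs.
counts = pd.Series(Counter(chain.from_iterable(df['colors'])))

# Optional: order the result by frequency, most common first.
counts = counts.sort_values(ascending=False)
print(counts)
```

    Empty lists contribute nothing to the tally, so they need no special handling.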

    While explode + value_counts is the idiomatic pandas way to do this, it is slower for shorter inputs. The benchmark below compares the two approaches as len(s) grows.

    import perfplot
    import pandas as pd
    import numpy as np
    
    from collections import Counter
    from itertools import chain
    
    def counter(s):
        return pd.Series(Counter(chain.from_iterable(s)))
    
    def explode(s):
        return s.explode().value_counts()
    
    perfplot.show(
        setup=lambda n: pd.Series([['blue','green','brown'], [], ['green','red','blue']]*n), 
        kernels=[
            lambda s: counter(s),
            lambda s: explode(s),
        ],
        labels=['counter', 'explode'],
        n_range=[2 ** k for k in range(17)],
        equality_check=np.allclose,  
        xlabel='~len(s)'
    )
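
    For reference, the explode + value_counts approach timed above can be run on its own like this. This is a small sketch on the same sample Series; the intermediate variable exploded is only there to make the two steps visible.

```python
import pandas as pd

s = pd.Series([['blue', 'green', 'brown'], [], ['green', 'red', 'blue']])

# explode() gives every list element its own row; an empty list becomes a single NaN row.
exploded = s.explode()

# value_counts() drops NaN by default, so the empty list adds nothing to the counts.
counts = exploded.value_counts()
print(counts)
```

    Unlike the Counter version, the result comes back already sorted by frequency.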
    
