Count of elements in lists within pandas data frame

你的背包 2021-02-04 05:13

I need to get the frequency of each element across the lists stored in a pandas DataFrame column.

Input data:

din=pd.DataFrame({'x':[['a','b','c


        
3 Answers
  • 2021-02-04 05:51

    You can also do this as a one-liner:

    df = pd.Series(sum([item for item in din.x], [])).value_counts()
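
A self-contained sketch of the same one-liner follows. The question's sample frame is truncated, so the data here is an assumption borrowed from the lists used in the other answers. Note that sum(..., []) builds a new list on every step, so this is probably best kept to small inputs.

```python
import pandas as pd

# Assumed sample data: the question's snippet is cut off, so these lists
# are taken from the other answers in this thread.
din = pd.DataFrame({'x': [['a', 'b', 'c'], ['a', 'e', 'd', 'c']]})

# sum(list_of_lists, []) concatenates the inner lists into one flat list
flat = sum([item for item in din.x], [])

# value_counts() returns the frequency of each distinct element
counts = pd.Series(flat).value_counts()
print(counts)
```

The result is a Series indexed by element, so counts['a'] looks up a single frequency.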
    
  • 2021-02-04 05:56

    This is actually pretty easy with a flattened list and a Counter:

    import pandas as pd
    from matplotlib.cbook import flatten
    from collections import Counter
    
    din={'x':[['a','b','c'],['a','e','d', 'c']]}
    for a,i in din.items() :
        # flatten the nested lists, count each element, then build a two-column frame
        u=pd.DataFrame.from_dict(dict(Counter([*flatten(i)])), orient ='index').reset_index().rename(columns ={'index':a,0:str(a)+'_number'})
    

    output:

    However, if din has several keys, you will need a function that repeats the same trick for each key:

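    import pandas as pd
    from matplotlib.cbook import flatten
    from collections import Counter
    din={'x':[['a','b','c'],['a','e','d', 'c']], 'y': [['h','j'],['h','j','j']]}
    
    def foo(x):
        df = pd.DataFrame()
        for a,i in x.items() :
            # one frame per key: element column named after the key, plus a '<key>_number' count column
            u=pd.DataFrame.from_dict(dict(Counter([*flatten(i)])), orient ='index').reset_index().rename(columns ={'index':a,0:str(a)+'_number'})
            # stack the per-key frames; columns that belong to other keys are filled with NaN
            df=pd.concat([df,u])
        return df
    foo(din)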
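
Because foo stacks frames with different column names, each row has NaN in the other key's columns. As a possible alternative, here is a minimal sketch that returns one long table instead. The function name count_per_key and the column names key, value and count are hypothetical, chosen for illustration, and itertools.chain is swapped in for matplotlib's flatten, which is enough when the lists are nested only one level deep.

```python
from collections import Counter
from itertools import chain

import pandas as pd

din = {'x': [['a', 'b', 'c'], ['a', 'e', 'd', 'c']],
       'y': [['h', 'j'], ['h', 'j', 'j']]}


def count_per_key(data):
    """Return one row per (key, value) pair with its frequency."""
    rows = []
    for key, lists in data.items():
        # chain.from_iterable flattens exactly one level of nesting
        counts = Counter(chain.from_iterable(lists))
        rows.extend({'key': key, 'value': value, 'count': n}
                    for value, n in counts.items())
    return pd.DataFrame(rows, columns=['key', 'value', 'count'])


result = count_per_key(din)
print(result)
```

Filtering on the key column, for example result[result.key == 'y'], then recovers the counts for a single key without any NaN padding.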
    
  • 2021-02-04 05:57

    First flatten the lists into a single Series, then count with value_counts, size, or Counter:

    a = pd.Series([item for sublist in din.x for item in sublist])
    

    Or:

    import numpy as np
    a = pd.Series(np.concatenate(din.x))
    

    Then count the flattened values:

    df = a.value_counts().sort_index().rename_axis('x').reset_index(name='f')
    

    Or:

    df = a.groupby(a).size().rename_axis('x').reset_index(name='f')
    

    Or with Counter:

    from collections import Counter
    from itertools import chain
    
    df = pd.Series(Counter(chain(*din.x))).sort_index().rename_axis('x').reset_index(name='f')
    
    print (df)
       x  f
    0  a  2
    1  b  1
    2  c  2
    3  d  1
    4  e  1
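
On pandas 0.25 or later, Series.explode can probably replace the manual flattening step. This is a sketch of that variant, not part of the original answer, and the sample frame is again an assumption taken from the lists used elsewhere in this thread.

```python
import pandas as pd

# Assumed sample data, matching the lists used in the other answers
din = pd.DataFrame({'x': [['a', 'b', 'c'], ['a', 'e', 'd', 'c']]})

df = (
    din['x']
    .explode()             # one row per list element
    .value_counts()        # frequency of each distinct element
    .sort_index()          # order alphabetically by element
    .rename_axis('x')      # name the index so it becomes column 'x'
    .reset_index(name='f')
)
print(df)
```

With this data it should yield the same x and f columns as the table printed above.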
    