Count the frequency that a value occurs in a dataframe column

Asked by 耶瑟儿~ on 2020-11-22 03:29 · 13 answers · 1885 views

I have a dataset

|category|
cat a
cat b
cat a

I'd like to be able to return something like this (showing the unique values and their frequency):

|category|freq|
|cat a   |2   |
|cat b   |1   |
13 answers
  • 2020-11-22 04:14

    Using list comprehension and value_counts for multiple columns in a df

    [my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
    

    https://stackoverflow.com/a/28192263/786326
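
    A minimal runnable sketch of the comprehension above, using a small hypothetical frame with two string columns (the column names here are made up for illustration):

    ```python
    import pandas as pd

    # Hypothetical two-column frame standing in for the multi-column case.
    df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"],
                       "color": ["red", "red", "blue"]})

    # One value_counts() Series per object-dtype column, as in the snippet above.
    counts = [df[c].value_counts() for c in df.select_dtypes(include=["O"]).columns]
    ```

    Each element of `counts` is a Series indexed by the unique values of one column.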

  • 2020-11-22 04:15

    If you want to apply to all columns you can use:

    df.apply(pd.value_counts)
    

    This will apply a column based aggregation function (in this case value_counts) to each of the columns.
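
    A small sketch of what that looks like on a hypothetical two-column frame: the per-column results are aligned on the union of all values, so a value absent from a column shows up as NaN. (Note that `pd.value_counts` as a top-level function is deprecated in recent pandas, though it still works.)

    ```python
    import pandas as pd

    # Hypothetical frame; each column is handed to pd.value_counts and the
    # results are aligned on the union of values, so absent values become NaN.
    df = pd.DataFrame({"a": ["x", "y", "x"], "b": ["y", "y", "z"]})
    counts = df.apply(pd.value_counts)
    ```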

  • 2020-11-22 04:15
    your data:
    
    |category|
    cat a
    cat b
    cat a
    

    solution:

     df['freq'] = df.groupby('category')['category'].transform('count')
     df =  df.drop_duplicates()
    
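    Put together as a runnable sketch on the question's data:

    ```python
    import pandas as pd

    # The question's data rebuilt as a one-column frame.
    df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

    # Broadcast each group's size onto every row, then keep one row per value.
    df["freq"] = df.groupby("category")["category"].transform("count")
    df = df.drop_duplicates()
    ```

    This leaves one row per unique value with its frequency alongside it.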
  • 2020-11-22 04:16

    @metatoaster has already pointed this out. Go for Counter. It's blazing fast.

    import pandas as pd
    from collections import Counter
    import timeit
    import numpy as np
    
    df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])
    

    Timers

    %timeit -n 10000 df['NumA'].value_counts()
    # 10000 loops, best of 3: 715 µs per loop
    
    %timeit -n 10000 df['NumA'].value_counts().to_dict()
    # 10000 loops, best of 3: 796 µs per loop
    
    %timeit -n 10000 Counter(df['NumA'])
    # 10000 loops, best of 3: 74 µs per loop
    
    %timeit -n 10000 df.groupby(['NumA']).count()
    # 10000 loops, best of 3: 1.29 ms per loop
    

    Cheers!
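
    For completeness, a minimal sketch of the Counter approach on the question's data; `most_common()` is handy when you also want the results sorted by frequency:

    ```python
    from collections import Counter
    import pandas as pd

    df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

    # Counter consumes the column in one pass; most_common() sorts by frequency.
    freq = Counter(df["category"])
    ```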

  • 2020-11-22 04:18

    Use groupby and count:

    In [37]:
    df = pd.DataFrame({'a':list('abssbab')})
    df.groupby('a').count()
    
    Out[37]:
    
       a
    a   
    a  2
    b  3
    s  2
    
    [3 rows x 1 columns]
    

    See the online docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html

    As @DSM has commented, value_counts() works too; there are many ways to skin a cat here:

    In [38]:
    df['a'].value_counts()
    
    Out[38]:
    
    b    3
    a    2
    s    2
    dtype: int64
    

    If you wanted to add frequency back to the original dataframe use transform to return an aligned index:

    In [41]:
    df['freq'] = df.groupby('a')['a'].transform('count')
    df
    
    Out[41]:
    
       a freq
    0  a    2
    1  b    3
    2  s    2
    3  s    2
    4  b    3
    5  a    2
    6  b    3
    
    [7 rows x 2 columns]
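
    A common follow-up is turning the value_counts() Series back into a tidy two-column frame (value plus count); one hedged sketch, with the column names chosen here for illustration:

    ```python
    import pandas as pd

    df = pd.DataFrame({"a": list("abssbab")})

    # reset_index turns the counts Series into a two-column frame.
    out = df["a"].value_counts().reset_index()
    out.columns = ["a", "freq"]
    ```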
    
  • 2020-11-22 04:20
    df.category.value_counts()
    

    This short line of code will give you the output you want.

    If your column name contains spaces (or clashes with a DataFrame attribute), use bracket notation instead:

    df['category'].value_counts()
    
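    If relative frequencies are wanted instead of raw counts, value_counts also takes a normalize flag (a minimal sketch on the question's data):

    ```python
    import pandas as pd

    df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

    # normalize=True returns proportions instead of counts.
    props = df["category"].value_counts(normalize=True)
    ```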