Count the frequency that a value occurs in a dataframe column


I have a dataset

|category|
cat a
cat b
cat a

I'd like to return something like the following, showing each unique value and its frequency:

|category|freq|
cat a 2
cat b 1


13 answers

予麋鹿

2020-11-22 04:14
Use a list comprehension with value_counts to count the values in every object-dtype (string) column of a DataFrame df:
```
[df[c].value_counts() for c in df.select_dtypes(include=['O']).columns]
```
https://stackoverflow.com/a/28192263/786326
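As a rough, self-contained illustration of the snippet above, here is a sketch on a made-up frame. The column names `colour` and `weight` are hypothetical, and the string columns are built with an explicit object dtype so that `select_dtypes` picks them up regardless of the pandas version's default string type.

```python
import pandas as pd

# Hypothetical frame: two string (object) columns and one numeric column.
df = pd.DataFrame({
    "category": pd.Series(["cat a", "cat b", "cat a"], dtype=object),
    "colour": pd.Series(["red", "red", "blue"], dtype=object),
    "weight": [1.0, 2.5, 3.0],
})

# Keep only the object-dtype columns, then count the values in each one.
text_cols = df.select_dtypes(include=["object"]).columns
counts = {c: df[c].value_counts() for c in text_cols}

for name, series in counts.items():
    print(name, series.to_dict())
```

A dict keyed by column name is used here instead of a list only so each result can be looked up by name.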
自闭症患者

2020-11-22 04:15
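If you want to count values in every column, you can use:
```
df.apply(pd.Series.value_counts)
```
This applies a column-based aggregation function (in this case value_counts) to each of the columns. Recent pandas versions deprecate the top-level pd.value_counts, so pd.Series.value_counts is used here.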
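A small sketch of what that call returns, on a made-up two-column frame (the names `x` and `y` are hypothetical). A value that never occurs in a given column shows up as NaN, so this version fills those cells with 0.

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "b", "a"], "y": ["b", "b", "c"]})

# One row per distinct value, one column per original column.
# Values missing from a column come back as NaN, so replace them with 0.
table = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(table)
```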

梦谈多话

2020-11-22 04:15

Your data:

|category|
cat a
cat b
cat a

Solution: add each category's count as a new column, then drop the repeated rows:

    df['freq'] = df.groupby('category')['category'].transform('count')
    df = df.drop_duplicates()
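Run end to end on the question's three rows, the two steps above might look like this. The `reset_index(drop=True)` call is an addition here, only to renumber the surviving rows.

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# Attach each row's group size, aligned with the original index.
df["freq"] = df.groupby("category")["category"].transform("count")

# Rows with the same category now have identical content, so duplicates collapse.
summary = df.drop_duplicates().reset_index(drop=True)
print(summary)
```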


陌清茗

2020-11-22 04:16

@metatoaster has already pointed this out: go for Counter. In the timings below, on a 100-row frame, it is roughly ten times faster than value_counts().

import pandas as pd
from collections import Counter
import numpy as np

df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])

Timings:

%timeit -n 10000 df['NumA'].value_counts()
# 10000 loops, best of 3: 715 µs per loop

%timeit -n 10000 df['NumA'].value_counts().to_dict()
# 10000 loops, best of 3: 796 µs per loop

%timeit -n 10000 Counter(df['NumA'])
# 10000 loops, best of 3: 74 µs per loop

%timeit -n 10000 df.groupby(['NumA']).count()
# 10000 loops, best of 3: 1.29 ms per loop

Cheers!
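For reference, a minimal sketch of Counter applied to the question's data. It makes no speed claim; the timings above were taken on a 100-row frame and may not carry over to other sizes. Note that the result is a plain dict subclass, not a pandas Series.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# Iterating a Series yields its values, which Counter tallies.
counts = Counter(df["category"])

# most_common() returns (value, count) pairs, highest count first.
print(counts.most_common())
```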


星月不相逢

2020-11-22 04:18

Use groupby and count:

In [37]:
df = pd.DataFrame({'a':list('abssbab')})
df.groupby('a').count()

Out[37]:

   a
a   
a  2
b  3
s  2

[3 rows x 1 columns]

See the online docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
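One caveat, offered as a hedged note: the output above comes from an older pandas release. On recent versions, count() leaves out the grouping column, so on a frame whose only column is the grouping key the result may have no columns at all. A sketch using size(), which counts rows per group and does not depend on other columns:

```python
import pandas as pd

df = pd.DataFrame({"a": list("abssbab")})

# size() returns the number of rows in each group as a Series.
sizes = df.groupby("a").size()
print(sizes)
```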

You can also use value_counts(), as @DSM has commented. There are many ways to skin a cat here:

In [38]:
df['a'].value_counts()

Out[38]:

b    3
a    2
s    2
dtype: int64

If you want to add the frequency back to the original dataframe, use transform, which returns a result aligned with the original index:

In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df

Out[41]:

   a freq
0  a    2
1  b    3
2  s    2
3  s    2
4  b    3
5  a    2
6  b    3

[7 rows x 2 columns]
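An alternative sketch for the same per-row frequency column, not taken from the answer: look each value up in the value_counts() result with map. It should produce the same column as the transform version on this data.

```python
import pandas as pd

df = pd.DataFrame({"a": list("abssbab")})

# value_counts() is indexed by value, so map() can use it as a lookup table.
df["freq"] = df["a"].map(df["a"].value_counts())
print(df)
```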


时光说笑

2020-11-22 04:20
```
df.category.value_counts()
```
This one short line gives you the output you want.

Attribute access only works when the column name is a valid Python identifier. If your column name contains spaces, use bracket notation instead:
```
df['category'].value_counts()
```
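value_counts() returns a Series indexed by value. If a two-column table like the one in the question is wanted, a sketch along these lines should work; the column name `freq` is an assumption, and `rename_axis` is used so the result does not depend on how a given pandas version names the index.

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

counts = df["category"].value_counts()

# Name the index explicitly, then turn it into a regular column.
freq = counts.rename_axis("category").reset_index(name="freq")
print(freq)

# normalize=True gives relative frequencies instead of raw counts.
share = df["category"].value_counts(normalize=True)
print(share)
```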