Count the frequency with which a value occurs in a dataframe column


I have a dataset

```
category
cat a
cat b
cat a
```

I'd like to be able to return something that shows each unique value and its frequency.


13 answers

独厮守ぢ

2020-11-22 03:57
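```
import pandas as pd

# Count each unique value in every column; values missing from a column become NaN, so fill them with 0
df.apply(pd.Series.value_counts).fillna(0)
```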
value_counts - returns an object containing the counts of unique values

apply - counts frequencies in every column. If you set axis=1, you get the frequencies in every row

fillna(0) - makes the output tidier by replacing NaN (a value that never appears in that column) with 0
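A minimal, self-contained run of the one-liner above, assuming a small two-column DataFrame made up for illustration (the column names `x` and `y` are hypothetical). It uses `pd.Series.value_counts` rather than the top-level `pd.value_counts`, which recent pandas versions deprecate; the counts are the same.

```python
import pandas as pd

# Hypothetical example data: two columns that share some values
df = pd.DataFrame({"x": ["a", "b", "a"], "y": ["b", "b", "c"]})

# value_counts runs on each column; apply aligns the results on the union of all values,
# so a value that never occurs in a column shows up as NaN there until fillna(0)
counts = df.apply(pd.Series.value_counts).fillna(0)
print(counts)
```

Each row of `counts` is one distinct value and each column is one original column.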
花落未央

2020-11-22 03:58
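In pandas 0.18.1, `groupby` together with `count` does not give the frequency of unique values, because `count()` counts the non-null entries in the remaining columns and this DataFrame has no other columns: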
```
>>> df
   a
0  a
1  b
2  s
3  s
4  b
5  a
6  b

>>> df.groupby('a').count()
Empty DataFrame
Columns: []
Index: [a, b, s]
```
However, the unique values and their frequencies are easily determined using size:
```
>>> df.groupby('a').size()
a
a    2
b    3
s    2
```
`df.a.value_counts()` returns the same counts, but sorted in descending order (most frequent value first) by default.
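To compare the two approaches side by side, here is the example DataFrame above rebuilt in code: `groupby(...).size()` orders the result by the group keys, while `value_counts()` orders the same counts from most to least frequent.

```python
import pandas as pd

# Same values as the example DataFrame above
df = pd.DataFrame({"a": ["a", "b", "s", "s", "b", "a", "b"]})

# One entry per unique value, ordered by the value itself (groupby sorts keys by default)
by_group = df.groupby("a").size()

# Same counts, ordered by frequency, largest first
by_count = df["a"].value_counts()

print(by_group)
print(by_count)
```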

长情又很酷

2020-11-22 04:04

```
n_values = data.income.value_counts()

# First unique value count (value_counts() sorts by count, largest first)
n_at_most_50k = n_values.iloc[0]

# Second unique value count
n_greater_50k = n_values.iloc[1]

n_values
```

Output:

```
<=50K    34014
>50K     11208
Name: income, dtype: int64
```

Output of `n_greater_50k, n_at_most_50k`:

```
(11208, 34014)
```
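Because `value_counts()` orders its result by count, positional access such as `n_values[0]` or `n_values.iloc[0]` picks whichever value is most frequent, not a particular label. A minimal sketch of looking the counts up by label instead, using a tiny made-up `income` column (the values are hypothetical, not the dataset above):

```python
import pandas as pd

# Hypothetical stand-in for data.income
data = pd.DataFrame({"income": ["<=50K", ">50K", "<=50K", "<=50K"]})

n_values = data["income"].value_counts()

# Look the counts up by value rather than by position
n_at_most_50k = n_values["<=50K"]
n_greater_50k = n_values[">50K"]

print(n_greater_50k, n_at_most_50k)
```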


失恋的感觉

2020-11-22 04:07

Without any libraries, you could do this instead:

```
def to_frequency_table(data):
    # Map each distinct item to the number of times it appears
    frequencytable = {}
    for key in data:
        if key in frequencytable:
            frequencytable[key] += 1
        else:
            frequencytable[key] = 1
    return frequencytable
```

Example:

```
>>> to_frequency_table([1,1,1,1,2,3,4,4])
{1: 4, 2: 1, 3: 1, 4: 2}
```
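The standard library already provides this mapping: `collections.Counter` counts the items of any iterable, so it also works directly on a DataFrame column (a pandas `Series` is iterable). A minimal sketch reproducing the example above:

```python
from collections import Counter

# Counter is a dict subclass mapping each item to the number of times it occurs
freq = Counter([1, 1, 1, 1, 2, 3, 4, 4])

print(dict(freq))           # {1: 4, 2: 1, 3: 1, 4: 2}
print(freq.most_common(1))  # [(1, 4)], the most frequent item and its count
```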


渐次进展

2020-11-22 04:07

I believe this should work for any DataFrame, whatever its columns.

```
import pandas as pd

def column_list(x):
    # Collect (column name, number of distinct values) pairs
    column_list_df = []
    for col_name in x.columns:
        y = col_name, len(x[col_name].unique())
        column_list_df.append(y)
    return pd.DataFrame(column_list_df)

column_list(df).rename(columns={0: "Feature", 1: "Value_count"})
```

The function `column_list` loops over the column names and counts the unique values in each column. Note that this gives the number of distinct values per column, not how often each value occurs.
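pandas also has a built-in for counting distinct values per column, `DataFrame.nunique()`. A sketch of producing the same two-column table with it, assuming a small hypothetical DataFrame. One difference: `nunique()` skips NaN by default, whereas `len(x[col_name].unique())` counts NaN as a value.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": ["x", "y", "x"], "b": [1, 1, 1]})

# nunique() returns a Series indexed by column name; turn it into a two-column table
result = (
    df.nunique()
    .rename_axis("Feature")
    .reset_index(name="Value_count")
)
print(result)
```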


一向

2020-11-22 04:12

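You can also do this with pandas by first converting your columns to categories (`dtype="category"`), e.g.:

```
cats = ['client', 'hotel', 'currency', 'ota', 'user_country']

df[cats] = df[cats].astype('category')
```

and then calling describe:

```
df[cats].describe()
```

This gives you a nice summary table. For each column it shows the number of non-null values (`count`), the number of distinct values (`unique`), the most frequent value (`top`) and how many times it occurs (`freq`):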

```
        client  hotel   currency  ota     user_country
count   852845  852845  852845    852845  852845
unique  2554    17477   132       14      219
top     2198    13202   USD       Hades   US
freq    102562  8847    516500    242734  340992
```
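A minimal runnable version of this approach on a tiny made-up DataFrame (the column names echo the ones above; the values are hypothetical). Keep in mind that `describe()` reports only the single most frequent value (`top`) and its count (`freq`) per column; the full per-value counts still come from `value_counts()`.

```python
import pandas as pd

# Hypothetical data with two columns to treat as categorical
df = pd.DataFrame({
    "currency": ["USD", "EUR", "USD"],
    "ota": ["Hades", "Hades", "Zeus"],
})

cats = ["currency", "ota"]
df[cats] = df[cats].astype("category")

# For categorical-only input, describe() reports count, unique, top and freq
summary = df[cats].describe()
print(summary)
```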

