pandas Series.value_counts returns inconsistent order for equal count strings

 3  2018

When I run the code below:

```
import pandas

s = pandas.Series(['c', 'a', 'b', 'a', 'b'])
print(s.value_counts())
```

Sometimes I get this:

3 Answers

不思量自难忘°

2021-01-12 15:02
You have a few options to sort consistently given a series:
```
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()
```
sort by index

Use pd.Series.sort_index:
```
res = c.sort_index()

a    2
b    1
c    2
dtype: int64
```
sort by count (arbitrary for ties)

For descending counts, do nothing, as this is the default. For ascending counts, use pd.Series.sort_values, which defaults to ascending=True. In either case, make no assumptions about how ties are ordered.
```
res = c.sort_values()

b    1
c    2
a    2
dtype: int64
```
Since c is already sorted in descending order, c.iloc[::-1] reverses it more efficiently than sorting again.
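A small sketch of the reversal, using the same series as above: because `value_counts()` already returns counts in descending order, reversing it positionally gives ascending counts without a second sort. The order among tied labels is whatever `value_counts()` happened to produce, so only the counts themselves are predictable here.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()  # descending counts; order among ties is not guaranteed

# Reverse positionally instead of re-sorting
ascending = c.iloc[::-1]

# Counts are now non-decreasing, whatever order the tied labels ended up in
print(ascending)
print(ascending.is_monotonic_increasing)
```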

sort by count and then by index

You can use numpy.lexsort to sort by count and then by index. Note the key order: np.lexsort uses the last key as the primary sort key, so -c.values (descending count) decides the order and c.index only breaks ties.
```
res = c.iloc[np.lexsort((c.index, -c.values))]

a    2
c    2
b    1
dtype: int64
```
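To make the key order concrete, here is a sketch on the same series. `np.lexsort` sorts by the last key first, so the negated counts decide the order and the labels only break ties. Converting the labels with `to_numpy(dtype=str)` is a choice made for this sketch (not part of the answer), so that the tie-break key is a plain NumPy string array rather than an object array.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'c'])
c = s.value_counts()

# Last key = primary key: descending count.
# First key = secondary key: label, used only when counts are equal.
labels = c.index.to_numpy(dtype=str)
order = np.lexsort((labels, -c.to_numpy()))

res = c.iloc[order]
print(res)  # a 2, c 2, b 1 regardless of how value_counts ordered the ties
```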

半阙折子戏

2021-01-12 15:13

Adding a reindex after value_counts

```
df.value_counts().reindex(df.unique())
```
Output:
```
a    1
b    1
dtype: int64
```
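A sketch of what the `reindex` approach does on the question's series (called `s` here rather than `df`): `unique()` returns labels in order of first appearance, so the rows follow the order in which labels first occur in the data, not their counts. The result is reproducible, but it is not sorted by count.

```python
import pandas as pd

s = pd.Series(['c', 'a', 'b', 'a', 'b'])  # the series from the question

# unique() yields labels in first-appearance order: 'c', 'a', 'b'
res = s.value_counts().reindex(s.unique())
print(res)  # c 1, a 2, b 2
```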

Update

```
s.value_counts().sort_index().sort_values(kind='mergesort')
```
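A chained sort only yields a fixed order if the second sort keeps the order produced by the first. In pandas, `sort_values` guarantees that only for a stable algorithm (`kind='mergesort'` or `kind='stable'`); the default `kind='quicksort'` is not stable. A sketch, assuming descending counts with alphabetical tie-breaking is the order you want:

```python
import pandas as pd

s = pd.Series(['c', 'a', 'b', 'a', 'b'])  # the series from the question

# Step 1: order labels alphabetically.
# Step 2: stable sort by count, descending; tied labels keep the
#         alphabetical order from step 1 because mergesort is stable.
res = (
    s.value_counts()
    .sort_index()
    .sort_values(ascending=False, kind='mergesort')
)
print(res)  # a 2, b 2, c 1
```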


傲寒

2021-01-12 15:20
You could use sort_index:
```
print(df.value_counts().sort_index())
```
Output:
```
a    1
b    1
dtype: int64
```
Please see the documentation if you need optional parameters such as ascending.

sort_index and reindex(df.unique()) (as suggested by @Wen) seem to perform quite similarly:
```
df.value_counts().sort_index():         1000 loops, best of 3: 636 µs per loop
df.value_counts().reindex(df.unique()): 1000 loops, best of 3: 880 µs per loop
```
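A sketch for re-running the comparison yourself with the standard `timeit` module. The series being timed is hypothetical, since the answer does not say what data produced its numbers, and timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical data: the original benchmark's input is not given
s = pd.Series(list('abcab') * 1000)


def by_index():
    return s.value_counts().sort_index()


def by_first_appearance():
    return s.value_counts().reindex(s.unique())


for name, fn in [('sort_index', by_index), ('reindex(unique)', by_first_appearance)]:
    # Best of 3 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f'{name}: {best * 1e6:.0f} µs per loop')
```

Both functions return the same counts; they differ only in row order.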