pandas Series.value_counts returns inconsistent order for equal count strings

野性不改 2021-01-12 14:38

When I run the code below:

import pandas

s = pandas.Series(['c', 'a', 'b', 'a', 'b'])
print(s.value_counts())

Sometimes I get this:

3 answers
  • 2021-01-12 15:02

    You have a few options to sort consistently given a series:

    import numpy as np
    import pandas as pd

    s = pd.Series(['a', 'b', 'a', 'c', 'c'])
    c = s.value_counts()
    

    sort by index

    Use pd.Series.sort_index:

    res = c.sort_index()
    
    a    2
    b    1
    c    2
    dtype: int64
    

    sort by count (arbitrary for ties)

    For descending counts, do nothing, as this is the default. For ascending counts, you can use pd.Series.sort_values, which defaults to ascending=True. In either case, make no assumptions about how ties are ordered.

    res = c.sort_values()
    
    b    1
    c    2
    a    2
    dtype: int64
    

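    Because value_counts already returns the counts in descending order, you can get ascending order more cheaply by reversing the result with c.iloc[::-1] instead of sorting again.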
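
    As a small illustration of the reversal trick, the sketch below reuses the example series from this answer; the name rev is only for illustration. The order among tied labels is still arbitrary, so only the counts are guaranteed to be ascending.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "c"])
c = s.value_counts()   # counts in descending order (the default)
rev = c.iloc[::-1]     # same rows reversed, so counts are ascending

print(rev)
```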

    sort by count and then by index

    You can use numpy.lexsort to sort by count and then by index. Note that np.lexsort treats the last key as the primary one, so -c.values (descending count) is listed last and takes precedence, while c.index only breaks ties.

    res = c.iloc[np.lexsort((c.index, -c.values))]
    
    a    2
    c    2
    b    1
    dtype: int64
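
    If the reversed key order of np.lexsort feels error-prone, the same "count descending, then label ascending" ordering can be sketched with a two-key DataFrame.sort_values. The column names label and count are assumptions chosen for this example, not anything pandas requires.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "c"])
c = s.value_counts()

# Move the labels into a column so both sort keys can be named explicitly
frame = pd.DataFrame({"label": c.index, "count": c.to_numpy()})
frame = frame.sort_values(["count", "label"], ascending=[False, True])

# Rebuild a Series indexed by label, now in a deterministic order
res = pd.Series(frame["count"].to_numpy(), index=frame["label"].to_numpy())
print(res)
```

    This prints a, c, b with counts 2, 2, 1, matching the lexsort result above.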
    
  • 2021-01-12 15:13

    Add a reindex after value_counts (df here is the Series) to get the labels in order of first appearance:

    df.value_counts().reindex(df.unique())
    Out[353]: 
    a    1
    b    1
    dtype: int64
    

    Update

    s.value_counts().sort_index().sort_values()
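
    One caveat with chaining the two sorts: sort_values uses kind='quicksort' by default, which is not guaranteed to be stable, so the label order set by sort_index may not survive among ties. A minimal sketch, assuming ascending counts are wanted as in the line above, passes kind='mergesort' to make the second sort stable:

```python
import pandas as pd

s = pd.Series(["c", "a", "b", "a", "b"])

res = (
    s.value_counts()
     .sort_index()                    # labels in alphabetical order
     .sort_values(kind="mergesort")   # stable: tied counts keep label order
)
print(res)
```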
    
  • 2021-01-12 15:20

    You could use sort_index:

    print(df.value_counts().sort_index())
    

    Output:

    a    1
    b    1
    dtype: int64
    

    Please see the documentation if you want to use parameters (like ascending=True etc.)

    sort_index and reindex(df.unique()) (as suggested by @Wen) seem to perform quite similarly:

    df.value_counts().sort_index():         1000 loops, best of 3: 636 µs per loop
    df.value_counts().reindex(df.unique()): 1000 loops, best of 3: 880 µs per loop
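
    Speed aside, the two calls are not interchangeable: both give a reproducible order, but sort_index orders the labels alphabetically while reindex with unique() follows the order of first appearance. The sketch below uses the series from the question; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(["c", "a", "b", "a", "b"])
counts = s.value_counts()

by_label = counts.sort_index()               # alphabetical: a, b, c
by_first_seen = counts.reindex(s.unique())   # first appearance: c, a, b

print(by_label)
print(by_first_seen)
```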
    