Are there functions to retrieve the histogram counts of a Series in pandas?

故里飘歌 2021-01-01 14:16

There is a method to plot Series histograms, but is there a function to retrieve the histogram counts to do further calculations on top of it?

I ke

3 Answers
  • 2021-01-01 14:43

    If you know the number of bins you want, you can use pandas' cut function, which is now accessible through the bins argument of value_counts. Using a random example:

    import numpy as np
    import pandas as pd

    s = pd.Series(np.random.randn(100))
    s.value_counts(bins=5)
    
    Out[55]: 
    (-0.512, 0.311]     40
    (0.311, 1.133]      25
    (-1.335, -0.512]    14
    (1.133, 1.956]      13
    (-2.161, -1.335]     8
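
To make the link to cut explicit, here is a minimal sketch that bins the same data both ways and lists the counts in bin order rather than by frequency. The seed and the variable names are illustrative assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Fixed seed chosen only so the example is reproducible.
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# value_counts(bins=5) orders by count; sort_index() restores bin order.
by_bins = s.value_counts(bins=5).sort_index()

# The same binning done explicitly with pd.cut, then counted per interval.
by_cut = pd.cut(s, bins=5, include_lowest=True).value_counts().sort_index()

print(by_bins)
print(by_cut)
```

Both results are indexed by interval, so the counts can be used directly in further calculations, for example `by_bins / by_bins.sum()` for relative frequencies.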
    
  • 2021-01-01 14:56

    If your Series is discrete, you can use value_counts:

    In [11]: s = pd.Series([1, 1, 2, 1, 2, 2, 3])
    
    In [12]: s.value_counts()
    Out[12]:
    2    3
    1    3
    3    1
    dtype: int64
    

    For discrete data like this, s.hist() shows essentially the same information as a bar plot of the counts in value order, s.value_counts().sort_index().plot(kind='bar').

    If the Series holds floats, an admittedly hacky solution is to group by each value rounded down to the nearest 0.5 with groupby:

    s.groupby(lambda i: np.floor(2*s[i]) / 2).count()
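
The same floor-based grouping can be sketched without the per-label lambda by passing the computed keys to groupby directly. The sample values and the `width` name below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small hand-picked sample so the grouping is easy to follow.
s = pd.Series([0.1, 0.4, 0.6, 0.9, 1.2, 1.3, -0.2])

width = 0.5  # bin width; 0.5 matches the floor(2 * x) / 2 trick

# Left edge of the bin each value falls into, computed for the whole Series at once.
left_edges = np.floor(s / width) * width

# Group the values by their left edge and count the members of each group.
counts = s.groupby(left_edges).count()
print(counts)
```

The result is indexed by the left edge of each bin, and bins that contain no values do not appear at all, which is one reason this approach is less robust than np.histogram or value_counts(bins=...).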
    
  • 2021-01-01 14:57

    Since hist and value_counts don't use the Series' index, you may as well treat the Series like an ordinary array and use np.histogram directly. Then build a Series from the result.

    In [4]: s = pd.Series(np.random.randn(100))
    
    In [5]: counts, bins = np.histogram(s)
    
    In [6]: pd.Series(counts, index=bins[:-1])
    Out[6]: 
    -2.968575     1
    -2.355032     4
    -1.741488     5
    -1.127944    26
    -0.514401    23
     0.099143    23
     0.712686    12
     1.326230     5
     1.939773     0
     2.553317     1
    dtype: int32
    

    This is a really convenient way to organize the result of a histogram for subsequent computation.

    To index by the center of each bin instead of the left edge, you could use bins[:-1] + np.diff(bins)/2.
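
As a rough sketch of the "subsequent computation" this enables, the following builds a Series indexed by bin centers and derives a few quantities from it. The seed, the bin count, and names such as `hist` and `rel_freq` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Fixed seed chosen only so the example is reproducible.
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# np.histogram returns the counts and the bin edges (one more edge than counts).
counts, edges = np.histogram(s, bins=10)

# Index by the center of each bin instead of the left edge.
centers = edges[:-1] + np.diff(edges) / 2
hist = pd.Series(counts, index=centers, name="count")

# Further calculations on top of the counts.
rel_freq = hist / hist.sum()      # fraction of samples per bin
cum_freq = rel_freq.cumsum()      # empirical cumulative distribution over bins
mode_center = hist.idxmax()       # center of the most populated bin

print(hist)
print(cum_freq)
print(mode_center)
```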
