Python: Checking to which bin a value belongs

盖世英雄少女心 2020-12-03 14:17

I have a list of values and a list of bin edges. Now I need to check, for every value, which bin it belongs to. Is there a more pythonic way than iterating over the values a

3 Answers
  • 2020-12-03 14:57

    First of all, your code is going to fail on cases when the value is equal to a bin boundary --

    change

    if bins[j] < i < bins[j+1]:
    

    to have an <= sign somewhere.
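
    A minimal sketch of the loop with the boundary fixed, assuming half-open bins where the lower edge is inclusive. The helper name `find_bin` is hypothetical and does not come from the question:

```python
def find_bin(value, bins):
    """Return j such that bins[j] <= value < bins[j + 1], or None if outside."""
    for j in range(len(bins) - 1):
        if bins[j] <= value < bins[j + 1]:
            return j
    return None

bins = [0, 20, 40, 60, 80, 100]
print(find_bin(20, bins))   # 1: a boundary value now lands in a bin
print(find_bin(100, bins))  # None: the last edge is outside every half-open bin
```

    Which side gets the `<=` is a design choice; the only requirement is that each edge belongs to exactly one bin.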

    After that, use the bisect module

    import bisect
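    bisect.bisect(bins, x)
    

    or bisect.bisect_left

    depending on whether you'd prefer to take the higher or lower bin when a value is on the bin boundary. Here bins is the sorted list of edges and x is a single value; bisect.bisect is an alias for bisect.bisect_right, which picks the higher bin.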
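
    As a rough illustration of the difference on boundary values, here is a short sketch. The sample `values` are made up for the example, and the edges are assumed to be sorted:

```python
import bisect

bins = [0, 20, 40, 60, 80, 100]   # sorted bin edges
values = [3, 20, 56, 100]         # 20 and 100 sit exactly on an edge

# bisect_right (alias: bisect.bisect) puts a boundary value in the higher bin
right = [bisect.bisect_right(bins, v) for v in values]
# bisect_left puts a boundary value in the lower bin
left = [bisect.bisect_left(bins, v) for v in values]

print(right)  # [1, 2, 3, 6]
print(left)   # [1, 1, 3, 5]
```

    The results are insertion points, so subtract 1 from the bisect_right result if you want the 0-based index of the bin's lower edge.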

  • 2020-12-03 15:01

    Probably too late, but for future reference, numpy has a function that does just that:

    http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html

    >>> import numpy as np
    >>> my_list = [3,2,56,4,32,4,7,88,4,3,4]
    >>> bins = [0,20,40,60,80,100]
    >>> np.digitize(my_list,bins)
    array([1, 1, 3, 1, 2, 1, 1, 5, 1, 1, 1])
    

    The result is an array of indexes, one per element of my_list, giving the bin from bins that the element belongs to. Note that the function will also bin values that fall outside of your first and last bin edges:

    >>> np.digitize([-5, 200], bins)
    array([0, 6])
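
    Values that sit exactly on an edge are controlled by the `right` argument of np.digitize. A small sketch, with `edge_values` chosen only for illustration:

```python
import numpy as np

bins = [0, 20, 40, 60, 80, 100]
edge_values = [0, 20, 100]  # each one lies exactly on a bin edge

# Default right=False: index i satisfies bins[i-1] <= x < bins[i]
lower_closed = np.digitize(edge_values, bins)
# right=True: index i satisfies bins[i-1] < x <= bins[i]
upper_closed = np.digitize(edge_values, bins, right=True)

print(lower_closed.tolist())  # [1, 2, 6]
print(upper_closed.tolist())  # [0, 1, 5]
```

    With the default, a value equal to the last edge is reported as out of range (index 6 here), so right=True may be the better fit if the top edge should be inclusive.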
    

    And Pandas has something like it too:

    http://pandas.pydata.org/pandas-docs/dev/basics.html#discretization-and-quantiling

    >>> import pandas as pd
    >>> pd.cut(my_list, bins)
    Categorical: 
    array(['(0, 20]', '(0, 20]', '(40, 60]', '(0, 20]', '(20, 40]', '(0, 20]',
           '(0, 20]', '(80, 100]', '(0, 20]', '(0, 20]', '(0, 20]'], dtype=object)
    Levels (5): Index(['(0, 20]', '(20, 40]', '(40, 60]', '(60, 80]',
                       '(80, 100]'], dtype=object)
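
    If you want integer bin positions rather than interval labels, pd.cut accepts labels=False. A minimal sketch using the same data as the first example:

```python
import pandas as pd

my_list = [3, 2, 56, 4, 32, 4, 7, 88, 4, 3, 4]
bins = [0, 20, 40, 60, 80, 100]

# labels=False returns 0-based bin positions instead of interval labels.
# Intervals are right-closed by default, e.g. (0, 20].
codes = pd.cut(my_list, bins, labels=False)
print(codes.tolist())  # [0, 0, 2, 0, 1, 0, 0, 4, 0, 0, 0]
```

    Unlike np.digitize, the positions start at 0, and values outside the edges come back as NaN rather than as an extra bin.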
    
  • 2020-12-03 15:22

    Maybe this will help get you on the right track:

    >>> import itertools
    >>> my_list = [3,2,56,4,32,4,7,88,4,3,4]
    >>> for k, g in itertools.groupby(sorted(my_list), lambda x: x // 20 * 20):
    ...     print(k, list(g))
    ... 
    0 [2, 3, 3, 4, 4, 4, 4, 7]
    20 [32]
    40 [56]
    80 [88]
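
    The `x // 20 * 20` key only works for equal-width bins. For arbitrary edges, one possible variant keys the groupby on a bisect lookup instead. This is a sketch that assumes every value lies within the edges; `bin_index` is a hypothetical helper name:

```python
import bisect
import itertools

my_list = [3, 2, 56, 4, 32, 4, 7, 88, 4, 3, 4]
bins = [0, 20, 40, 60, 80, 100]  # sorted edges; widths need not be equal

def bin_index(x):
    # 0-based index of the bin whose lower edge is <= x
    return bisect.bisect_right(bins, x) - 1

# groupby needs its input sorted so that equal keys are adjacent
grouped = {
    bins[k]: list(g)
    for k, g in itertools.groupby(sorted(my_list), key=bin_index)
}
print(grouped)
# {0: [2, 3, 3, 4, 4, 4, 4, 7], 20: [32], 40: [56], 80: [88]}
```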
    