Divide a list into multiple lists based on a bin size

悲哀的现实 2021-01-18 03:02

I have a list containing more than 100,000 values in it.

I need to divide the list into multiple smaller lists based on a specific bin width, say 0.1. Can anyone help?

6 Answers
  • 2021-01-18 03:13

    Here is a simple and nice way using NumPy's digitize:

    >>> import numpy as np
    >>> mylist = np.array([-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
                           -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252])
    >>> bins = np.arange(0,-1,-0.1)
    >>> for i in range(1,10):
    ...     mylist[np.digitize(mylist,bins)==i]
    ... 
    array([-0.04325, -0.0252 ])
    array([-0.124])
    array([-0.234, -0.245])
    array([-0.315])
    array([-0.43134])
    array([-0.5325, -0.5214, -0.531 ])
    array([-0.6322, -0.6341])
    array([], dtype=float64)
    array([], dtype=float64)
    

    digitize returns, for each element, the index of the bin that the element falls into.
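
    To make the index returned by digitize more concrete, here is a rough pure-Python sketch of the same lookup. It is not NumPy's implementation: it swaps in the standard-library bisect module and uses ascending edges (the example above uses descending ones). The names values, edges, bin_index and groups are made up for illustration.

```python
import bisect
from collections import defaultdict

values = [-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
          -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]

# Ascending bin edges -1.0, -0.9, ..., 0.0 (an assumed choice that covers this data).
edges = [i / 10 for i in range(-10, 1)]

def bin_index(x, edges):
    """Return i such that edges[i-1] <= x < edges[i]."""
    return bisect.bisect_right(edges, x)

# Collect the values of each bin in one pass over the data.
groups = defaultdict(list)
for v in values:
    groups[bin_index(v, edges)].append(v)

for i in sorted(groups, reverse=True):
    print(i, groups[i])
```

    With ascending edges the bin numbers run the other way (the values closest to 0 get the highest index), but for this data the grouping is the same as in the digitize output.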

  • 2021-01-18 03:13

    This will create a dict where each value is a list of elements that fit in a bin.

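    import collections

    vals = [-0.234, -0.04325, -0.43134, -0.315]  # sample input; use your own list here
    bins = collections.defaultdict(list)
    binId = lambda x: int(x*10)  # bin index for a width of 0.1; int() truncates toward zero
    for val in vals:
        bins[binId(val)].append(val)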
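
    One caveat: int() truncates toward zero, so with this key every value in (-0.1, 0.1) shares bin 0. If the data can be both positive and negative, a floor-based key is one possible alternative. A minimal sketch, assuming a hypothetical helper bin_values and a bin_width parameter that are not part of the answer above:

```python
import math
from collections import defaultdict

def bin_values(vals, bin_width=0.1):
    """Group vals into bins of the given width, keyed by floor(val / bin_width)."""
    bins = defaultdict(list)
    for val in vals:
        # floor() always rounds down, so -0.04 and 0.04 end up in different bins.
        bins[math.floor(val / bin_width)].append(val)
    return dict(bins)

result = bin_values([-0.04325, 0.04, 0.25, -0.234])
print(result)
```

    Values that sit exactly on a bin edge may land in the neighbouring bin because of floating-point rounding in the division, so treat the edges as approximate.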
    
  • 2021-01-18 03:27

    We can make bins with more_itertools, a third-party library.

    Given

    iterable = (
        "-0.234 -0.04325 -0.43134 -0.315 -0.6322 -0.245 "
        "-0.5325 -0.6341 -0.5214 -0.531 -0.124 -0.0252"
    ).split()
    
    iterable
    # ['-0.234', '-0.04325', '-0.43134', '-0.315', '-0.6322', '-0.245', '-0.5325', '-0.6341', '-0.5214', '-0.531', '-0.124', '-0.0252']
    

    Code

    import more_itertools as mit
    
    
    keyfunc = lambda x: float("{:.1f}".format(float(x)))
    bins = mit.bucket(iterable, key=keyfunc)
    
    keys = [-0.0,-0.1,-0.2, -0.3,-0.4,-0.5,-0.6]
    a,b,c,d,e,f,g = [list(bins[k]) for k in keys]
    c
    # ['-0.234', '-0.245']
    

    Details

    We can bin by the key function, which we define to format numbers to one decimal place, e.g. -0.213 to -0.2. Note that formatting rounds to the nearest tenth rather than truncating.

    keyfunc = lambda x: float("{:.1f}".format(float(x)))
    bins = mit.bucket(iterable, key=keyfunc)
    

    These bins are accessed by the keys defined by the key function:

    c = list(bins[-0.2])
    c
    # ['-0.234', '-0.245']
    

    Access all bins by iterating keys:

    keyfunc = lambda x: float("{:.1f}".format(float(x)))
    bins = mit.bucket(iterable, key=keyfunc)
    
    keys = [-0.0,-0.1,-0.2, -0.3,-0.4,-0.5,-0.6]
    for k in keys:
        print("{} --> {}".format(k, list(bins[k])))
    

    Output

    -0.0 --> ['-0.04325', '-0.0252']
    -0.1 --> ['-0.124']
    -0.2 --> ['-0.234', '-0.245']
    -0.3 --> ['-0.315']
    -0.4 --> ['-0.43134']
    -0.5 --> ['-0.5325', '-0.5214', '-0.531']
    -0.6 --> ['-0.6322', '-0.6341']
    

    List comprehension and unpacking are another option (see the Code example).

    See also more_itertools docs for more details.
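
    A small standard-library check of how this key function behaves at the boundaries. Formatting with "{:.1f}" rounds, so a value such as -0.26 is keyed as -0.3, not -0.2. The trunc_key helper below is a hypothetical alternative for comparison and is not part of more_itertools:

```python
import math

# Key function from the answer: formats to one decimal place, which rounds.
keyfunc = lambda x: float("{:.1f}".format(float(x)))

def trunc_key(x):
    """Hypothetical alternative: drop everything after the first decimal."""
    return math.trunc(float(x) * 10) / 10

for s in ["-0.234", "-0.26"]:
    print(s, keyfunc(s), trunc_key(s))
```

    For the sample data both keys agree, because no value has a second decimal digit of 5 or more.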

  • 2021-01-18 03:32

    Is this what you want? (Sample output would have been helpful :)

    f = [-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245, 
         -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]
    
    import numpy as np
    data = np.array(f)
    hist, edges = np.histogram(data, bins=10)
    print(hist)
    

    yields:

     [2 3 0 1 0 1 2 0 1 2]
    

    This SO question assigning points to bins might be helpful.
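
    Note that bins=10 asks for ten equal-width bins between the smallest and largest value, not bins of width 0.1, and that histogram returns counts rather than the values themselves. A rough pure-Python sketch of how such counts come about (an illustration only, not NumPy's actual code; equal_width_counts is a hypothetical helper and assumes the data is not all one value):

```python
def equal_width_counts(data, nbins=10):
    """Count how many values fall into each of nbins equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        # The last bin is closed on the right, so the maximum is clamped into it.
        idx = min(int((x - lo) / width), nbins - 1)
        counts[idx] += 1
    return counts

f = [-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
     -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]
counts = equal_width_counts(f)
print(counts)
```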

  • 2021-01-18 03:33

    Binning can be done with itertools.groupby:

    import itertools as it
    
    
    iterable = ['-0.234', '-0.04325', '-0.43134', '-0.315', '-0.6322', '-0.245',
                '-0.5325', '-0.6341', '-0.5214', '-0.531', '-0.124', '-0.0252']
    
    a,b,c,d,e,f,g = [list(g) for k, g in it.groupby(sorted(iterable), key=lambda x: x[:4])]
    c
    # ['-0.234', '-0.245']
    

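    Note: this simple key function slices the string, so it assumes every value is written with a leading minus sign and a single digit before the decimal point, i.e. lies between -0.0 and -10.0 (exclusive). Consider lambda x: "{:.1f}".format(float(x)) for general cases.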

    See also this post for details on how groupby works.
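
    A sketch of the general-case key mentioned in the note. The assumptions here are mine: the input is sorted numerically (key=float) so that equal keys end up adjacent, which groupby requires, and an extra value '-12.34' is added to show a number outside the -10.0 to -0.0 range.

```python
import itertools as it

iterable = ['-0.234', '-0.04325', '-0.43134', '-0.315', '-0.6322', '-0.245',
            '-0.5325', '-0.6341', '-0.5214', '-0.531', '-0.124', '-0.0252',
            '-12.34']  # last value added for illustration

# Format-based key: rounds each number to one decimal place.
key = lambda x: "{:.1f}".format(float(x))

# groupby only merges adjacent items, so sort numerically first.
ordered = sorted(iterable, key=float)
groups = {k: list(g) for k, g in it.groupby(ordered, key=key)}

for k, members in groups.items():
    print(k, members)
```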

  • 2021-01-18 03:34

    This works:

    l=[-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
    -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]
    
    d={}
    for k,v in zip([int(i*10) for i in l],l):
       d.setdefault(k,[]).append(v)
    
    LoL=[d[e] for e in sorted(d.keys(), reverse=True)]
    
    for i,l in enumerate(LoL,1):
        print('list',i,l)    
    

    Prints:

    list 1 [-0.04325, -0.0252]
    list 2 [-0.124]
    list 3 [-0.234, -0.245]
    list 4 [-0.315]
    list 5 [-0.43134]
    list 6 [-0.5325, -0.5214, -0.531]
    list 7 [-0.6322, -0.6341]
    

    How it works:

    1: The list
    >>> l=[-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
    ... -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]
    
    2: Produce the keys:
    >>> [int(i*10) for i in l]
    [-2, 0, -4, -3, -6, -2, -5, -6, -5, -5, -1, 0]
    
    3: Produce tuples to put in the dict:
    >>> list(zip([int(i*10) for i in l],l))
    [(-2, -0.234), (0, -0.04325), (-4, -0.43134), (-3, -0.315), (-6, -0.6322), 
     (-2, -0.245), (-5, -0.5325), (-6, -0.6341), (-5, -0.5214), (-5, -0.531), 
     (-1, -0.124), (0, -0.0252)]
    
    4: unpack the tuples into k,v and loop over the list
    >>> for k,v in zip([int(i*10) for i in l],l):
    
    5: add k key to a dict (if not there) and append the float value to a list associated 
       with that key:
        d.setdefault(k,[]).append(v)
    

    I suggest a Python tutorial on these statements.
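
    Step 5 relies on how dict.setdefault behaves: it returns the list already stored under the key, or stores and returns the given default if the key is missing. A tiny sketch with made-up variable names to show that both calls hand back the same list:

```python
d = {}

# Key -2 is missing, so an empty list is stored under it and returned.
first = d.setdefault(-2, [])
first.append(-0.234)

# Key -2 now exists, so the stored list is returned and the new [] is ignored.
second = d.setdefault(-2, [])
second.append(-0.245)

print(d)
print(first is second)
```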
