Python - Count occurrences of certain ranges in a list

一个人的身影 2021-01-01 05:00

So basically I want to count how many times floating-point values in certain ranges appear in a given list. For example: a list of grades (all scores out of 100) is inputted by the u

4 Answers
  •  执笔经年
    2021-01-01 05:42

    This method uses bisect, which can be more efficient, but it requires that you sort the scores first.

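    from bisect import bisect
    import random
    
    # 100 random integer scores between 0 and 100 (inclusive)
    scores = [random.randint(0, 100) for _ in range(100)]
    # Upper limit of each range: (..., 20], (20, 40], ..., (80, 100]
    bins = [20, 40, 60, 80, 100]
    
    scores.sort()  # bisect requires sorted input
    counts = []
    last = 0
    for range_max in bins:
        # Index just past the last score <= range_max, searching from `last`
        i = bisect(scores, range_max, last)
        counts.append(i - last)
        last = i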
    

    I wouldn't expect you to install numpy just for this, but if you already have numpy you can use numpy.histogram.

    UPDATE

    First, using bisect is more flexible. Using [i//n for i in scores] requires that all the bins are the same size, whereas bisect allows the bins to have arbitrary limits. Also, i//n means the ranges are [lo, hi). With bisect the ranges are (lo, hi], but you can use bisect_left if you want [lo, hi).
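    To make the boundary behavior concrete, here is a minimal sketch. The helper name count_in_bins and the sample scores are made up for illustration. It counts the same scores over unevenly sized bins, once with bisect_right (ranges (lo, hi]) and once with bisect_left (ranges [lo, hi)).

```python
from bisect import bisect_left, bisect_right

def count_in_bins(scores, upper_limits, find=bisect_right):
    """Count scores per bin; `upper_limits` must be ascending.

    Hypothetical helper for illustration. With bisect_right each bin is
    (lo, hi]; with bisect_left each bin is [lo, hi).
    """
    ordered = sorted(scores)
    counts = []
    last = 0
    for limit in upper_limits:
        # Position of the bin boundary in the sorted list, searching from `last`
        i = find(ordered, limit, last)
        counts.append(i - last)
        last = i
    return counts

# Made-up sample data; the bins have unequal widths on purpose.
sample_scores = [50, 55.5, 70, 70, 89.9, 90, 100]
limits = [50, 70, 90, 100]

right_counts = count_in_bins(sample_scores, limits, bisect_right)
left_counts = count_in_bins(sample_scores, limits, bisect_left)
print(right_counts)  # [1, 3, 2, 1]
print(left_counts)   # [0, 2, 3, 1]
```

    Note that with bisect_left a score equal to the last limit (here 100) falls outside every bin, so add one more limit above the maximum if it should be counted.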

    Second, bisect is faster; see the timings below. I've replaced scores.sort() with the slower sorted(scores) because sorting is the slowest step and I didn't want to bias the times with a pre-sorted array, but the OP says his/her array is already sorted, so bisect could make even more sense in that case.

    setup = """
    from bisect import bisect_left
    import random
    from collections import Counter
    
    # Equal-width bins via floor division, counted with a Counter
    def histogram(iterable, low, high, bins):
        step = (high - low) / bins
        dist = Counter(((x - low + 0.) // step for x in iterable))
        return [dist[b] for b in range(bins)]
    
    # Arbitrary bin limits via binary search on a sorted copy
    def histogram_bisect(scores, groups):
        scores = sorted(scores)
        counts = []
        last = 0
        for range_max in groups:
            i = bisect_left(scores, range_max, last)
            counts.append(i - last)
            last = i
        return counts
    
    # Equal-width bins via floor division, counted with list.count
    def histogram_simple(scores, bin_size):
        scores = [i//bin_size for i in scores]
        return [scores.count(i) for i in range(max(scores)+1)]
    
    scores = [random.randint(0,100) for _ in range(100)]
    bins = range(10, 101, 10)
    """
    from timeit import repeat
    t = repeat('C = histogram(scores, 0, 100, 10)', setup=setup, number=10000)
    print(min(t))
    #.95
    t = repeat('C = histogram_bisect(scores, bins)', setup=setup, number=10000)
    print(min(t))
    #.22
    t = repeat('histogram_simple(scores, 10)', setup=setup, number=10000)
    print(min(t))
    #.36
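    Before reading much into the timings, it can help to check that the three functions return the same counts. The following is a minimal sketch, assuming Python 3 and using made-up scores; the function bodies mirror the ones in the setup string. On scores below 100 all three agree, while a score of exactly 100 is dropped by histogram and histogram_bisect but gets an extra 11th bin in histogram_simple.

```python
from bisect import bisect_left
from collections import Counter

def histogram(iterable, low, high, bins):
    step = (high - low) / bins
    dist = Counter((x - low) // step for x in iterable)
    return [dist[b] for b in range(bins)]

def histogram_bisect(scores, groups):
    scores = sorted(scores)
    counts = []
    last = 0
    for range_max in groups:
        i = bisect_left(scores, range_max, last)
        counts.append(i - last)
        last = i
    return counts

def histogram_simple(scores, bin_size):
    binned = [s // bin_size for s in scores]
    return [binned.count(b) for b in range(max(binned) + 1)]

# Made-up scores, all below 100 and reaching the top bin.
scores = [0, 9, 10, 35, 35, 60, 99]
bins = range(10, 101, 10)

a = histogram(scores, 0, 100, 10)
b = histogram_bisect(scores, bins)
c = histogram_simple(scores, 10)
print(a == b == c)  # True

# Adding a score of exactly 100 breaks the agreement.
with_100 = scores + [100]
print(histogram_bisect(with_100, bins) == b)  # True: 100 is not counted
print(len(histogram_simple(with_100, 10)))    # 11: one extra bin for 100
```

    If scores of exactly 100 matter, one option is to add a limit above 100 to the bins passed to histogram_bisect so that the top score is counted.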
    
