How to bin a series of float values into a histogram in Python?

梦谈多话 2020-12-30 08:40

I have a set of float values (always less than 0) that I want to bin into a histogram, i.e. each bar in the histogram covers a range of values such as [0, 0.150).

The data I have lo

3 Answers
  • 2020-12-30 09:04

    When possible, don't reinvent the wheel. NumPy has everything you need:

    #!/usr/bin/env python
    import numpy as np
    
    # Read one float per line; loadtxt opens and closes the file itself.
    a = np.loadtxt('file')
    # [ 0.     0.005  0.124  0.     0.004  0.     0.111  0.112]
    
    # You can set arbitrary bin edges:
    bins = [0, 0.150]
    hist, bin_edges = np.histogram(a, bins=bins)
    # hist: [8]
    # bin_edges: [ 0.    0.15]
    
    # Or, if bins is an integer, it sets the number of equal-width bins:
    bins = 4
    hist, bin_edges = np.histogram(a, bins=bins)
    # hist: [5 0 0 3]
    # bin_edges: [ 0.     0.031  0.062  0.093  0.124]
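
To get the fixed-width bins the question asks for ([0, 0.150), [0.150, 0.300), and so on), one option is to build the edges explicitly and pass them to `np.histogram`. This is a minimal sketch, not the answerer's code: the first three sample values come from the answer above, the remaining three are made up for illustration, and the names `width` and `n_bins` are assumptions.

```python
import numpy as np

# First three values are from the answer's sample; the rest are made up.
a = np.array([0.0, 0.005, 0.124, 0.2, 0.31, 0.44])

width = 0.150  # bin width requested in the question

# Use enough bins that the maximum falls inside the last one.
n_bins = int(np.floor(a.max() / width)) + 1

# Edges at integer multiples of the width: 0, 0.15, 0.30, 0.45
bin_edges = np.arange(n_bins + 1) * width

hist, bin_edges = np.histogram(a, bins=bin_edges)

for lo, hi, count in zip(bin_edges[:-1], bin_edges[1:], hist.tolist()):
    print("[%.3f, %.3f)\t%d" % (lo, hi, count))
```

Note that `np.histogram` treats every bin as half-open except the last one, which also includes its right edge.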
    
  • 2020-12-30 09:06
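    import matplotlib.pyplot as plt

    # Read one float per line.
    with open('pulse_data.txt') as inf:
        data = [float(line) for line in inf]

    # Binning: B equal-width bins over [minv, maxv]. The extra bin at
    # index B only ever receives values equal to maxv.
    # Assumes maxv > minv; otherwise the division below fails.
    B = 50
    minv = min(data)
    maxv = max(data)
    bincounts = [0] * (B + 1)
    for d in data:
        b = int((d - minv) / (maxv - minv) * B)
        bincounts[b] += 1

    # Plot the bin counts as points.
    plt.plot(bincounts, 'o')
    plt.show()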
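
The manual binning loop in this answer can also be wrapped in a small function. The sketch below is one possible variant, not the answerer's code: the function name `bin_counts` is hypothetical, it clamps the maximum value into the last bin so that exactly `num_bins` bins come back, and it guards against the case where all values are equal. The sample values are the ones shown in the NumPy answer.

```python
def bin_counts(values, num_bins):
    """Count values into num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * num_bins
    if hi == lo:
        # All values identical: avoid dividing by zero, use the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        index = int((v - lo) / (hi - lo) * num_bins)
        # The maximum maps to num_bins, so clamp it into the last bin.
        counts[min(index, num_bins - 1)] += 1
    return counts


sample = [0.0, 0.005, 0.124, 0.0, 0.004, 0.0, 0.111, 0.112]
counts = bin_counts(sample, 4)
print(counts)
```

Clamping mirrors how `np.histogram` closes its last bin on the right, so for this sample the counts agree with the `bins = 4` result shown earlier.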
    
  • 2020-12-30 09:09

    The first error is:

    Traceback (most recent call last):
      File "C:\foo\foo.py", line 17, in <module>
        diffCounts[ str(getBin(diff)) ] += 1
    TypeError: list indices must be integers
    

    Why are you converting an int to a str when an int is needed? Fix that, and then we get:

    Traceback (most recent call last):
      File "C:\foo\foo.py", line 17, in <module>
        diffCounts[ getBin(diff) ] += 1
    IndexError: list index out of range
    

    because you've only made 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:

    6
    Traceback (most recent call last):
      File "C:\foo\foo.py", line 21, in <module>
        maxBin = max(maxdiff)
    TypeError: 'int' object is not iterable
    

    maxdiff is a single value from your list of ints, so calling max on it makes no sense. Remove the call, and now we get:

    6
    Traceback (most recent call last):
      File "C:\foo\foo.py", line 28, in <module>
        print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
    TypeError: argument 2 to map() must support iteration
    

    Sure enough, you're using a single value as the second argument to map. Let's simplify the last two lines from this:

     binStr = '[' + str(lo) + ',' + str(hi) + ')'
     print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
    

    to this:

     print("[%f, %f)\t%r" % (lo, hi, diffCounts[i]))
    

    Now it prints:

    6
    [0.000000, 1.000000)    3
    [1.000000, 3.000000)    0
    [3.000000, 7.000000)    2
    [7.000000, 15.000000)   0
    [15.000000, 31.000000)  0
    [31.000000, 63.000000)  0
    [63.000000, 127.000000) 3
    

    I'm not sure what else to do here, since I don't really understand the bucketing you are hoping to use. It seems to involve binary powers, but isn't making sense to me...
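
If the intended buckets really are the ones printed above, with edges at 0, 1, 3, 7, ..., 127 (that is, 2**k - 1), then `np.histogram` with explicit edges would avoid the hand-written `getBin` entirely. This is only a sketch under that assumption: the asker's data is not shown, so the `diffs` values below are made up for illustration.

```python
import numpy as np

# Made-up sample differences, for illustration only.
diffs = np.array([0, 0, 0, 3, 5, 64, 100, 126])

# Assumed bucket edges: 2**k - 1 for k = 0..7, i.e. 0, 1, 3, 7, ..., 127.
edges = 2 ** np.arange(8) - 1

counts, _ = np.histogram(diffs, bins=edges)

# Print each bucket in the same "[lo, hi)<TAB>count" layout as above.
for lo, hi, count in zip(edges[:-1], edges[1:], counts.tolist()):
    print("[%f, %f)\t%r" % (lo, hi, count))
```

One difference from the printed labels: `np.histogram` includes the right edge in the last bin, so a value of exactly 127 would be counted there.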
