Question
I have code that generates a value from -10 to 10 for each input taken from the range [0, 1). The code appends each generated value to a list a number of times that reflects its probability. For example, -10 would be put in the list 0 times, since it corresponds to the value 0, and 10 would be put in 100 times (as a normalization), since it corresponds to 1 in the range.
Here is the code:
#!/usr/bin/env python
import math
import numpy as np
import matplotlib.pyplot as plt

pos = []
ceilingValue = 0.82
# np.linspace needs an integer sample count; 100*ceilingValue is a float
n = int(round(100 * ceilingValue))
pValues = np.linspace(0.00, ceilingValue, num=n)
for i in range(n):  # range replaces Python 2's xrange
    p = pValues[i]
    y = -11.63 * math.log(-2.36279 * (p - 1))
    # append y i times, so its frequency grows with p
    for j in range(i):
        pos.append(y)

avg = np.average(pos)
std = np.std(pos)
hist, bins = np.histogram(pos, bins=100)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
plt.show()
The problem is that the histogram is mostly accurate, but certain values break the trend. For example, -5.88, which should have a frequency count of about 30, shows up at about 70. I think Python sees two nearby values and simply lumps them together, but I'm not sure how to fix it. If it is only the histogram that is doing something wrong, then it doesn't matter, since I don't really need it. I just need the list, provided it is right in the first place.
Answer 1:
I think the underlying issue is that your bin size is uniform, whereas the spacing between the unique values in pos is not: it grows as p increases, because y is logarithmic in 1 - p. Because of that you'll always end up either with weird 'spikes' where two nearby unique values fall within the same bin, or with lots of empty bins (especially if you just increase the bin count to get rid of the 'spikes').
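One way to see this lumping directly is to histogram only the distinct values, so that each bin's count is the number of different y values it holds. The following is a minimal sketch, assuming the same formula and ceilingValue as in the question; the helper names unique_values and max_distinct_per_bin are hypothetical and exist only for illustration.

```python
import math
import numpy as np

def unique_values(ceiling=0.82):
    """Distinct y values that end up in pos (i = 0 is appended zero times)."""
    n = int(round(100 * ceiling))
    p_values = np.linspace(0.0, ceiling, num=n)
    ys = [-11.63 * math.log(-2.36279 * (p - 1)) for p in p_values]
    return np.array(ys[1:])

def max_distinct_per_bin(values, n_bins):
    """Largest number of distinct values sharing one uniform bin."""
    counts, _ = np.histogram(values, bins=n_bins)
    return int(counts.max())

vals = unique_values()
print(max_distinct_per_bin(vals, 100))
print(max_distinct_per_bin(vals, 200))
```

A result above 1 means that at least one bar is the sum of the counts of two different values, which is what shows up as a spike.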
You could try setting your bins according to the actual unique values in pos, so that their widths are non-uniform:
# the " + [10,]" forces the rightmost bin edge to == 10
uvals = np.unique(pos + [10,])
hist, bins = np.histogram(pos, bins=uvals)
# align='edge' puts each bar's left side at its bin's left edge
plt.bar(bins[:-1], hist, width=np.diff(bins), align='edge')
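As a check on this idea, the sketch below rebuilds pos and confirms that bins placed on the unique values hold exactly one distinct value each, so the counts should come out as 1, 2, 3, ... with no spikes. It assumes the formula and ceilingValue from the question; build_pos is a hypothetical helper name, and plotting is left out so the snippet runs without a display.

```python
import math
import numpy as np

def build_pos(ceiling=0.82):
    """Rebuild the question's list: the i-th value is appended i times."""
    n = int(round(100 * ceiling))
    p_values = np.linspace(0.0, ceiling, num=n)
    pos = []
    for i, p in enumerate(p_values):
        y = -11.63 * math.log(-2.36279 * (p - 1))
        pos.extend([y] * i)
    return pos

pos = build_pos()
# Bin edges sit on the unique values; the extra 10 closes the last bin.
uvals = np.unique(pos + [10])
hist, bins = np.histogram(pos, bins=uvals)

# Each bin now holds one distinct value, so hist[k] is that value's count.
print(hist[:5], hist[-1])
```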
Answer 2:
I believe you're fine. I reran your code using bins = 200 instead of bins = 100 and the spikes disappeared. I think you had values that got caught on the boundaries between bins.
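Since only the list matters, it can also be checked without any histogram. This is a minimal sketch, assuming the list is built as in the question (rebuilt here in condensed form so the snippet runs on its own). np.unique with return_counts=True tallies exact values, so no binning choice is involved.

```python
import math
import numpy as np

ceiling = 0.82
n = int(round(100 * ceiling))
p_values = np.linspace(0.0, ceiling, num=n)

pos = []
for i, p in enumerate(p_values):
    pos.extend([-11.63 * math.log(-2.36279 * (p - 1))] * i)

values, counts = np.unique(pos, return_counts=True)

# If the list is right, the k-th smallest value appears k times.
list_is_consistent = np.array_equal(counts, np.arange(1, len(values) + 1))
print(len(pos), len(values), list_is_consistent)
```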
Source: https://stackoverflow.com/questions/17753501/numpy-histogram-representing-floats-with-approximate-values-as-the-same