So I wrote a short Python function to plot the distribution of outcomes of dice experiments. It's working fine, but when I run, for example, dice(1,5000) or dice(10
Your plot is only showing 5 bars: each bar sits to the right of its number, so I believe the results for 5 and 6 are being combined into the last bar. If you change the bins to range(1,8), you see something much closer to what you expect.
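A quick way to see the merging is to call np.histogram, which plt.hist uses for its binning, on a single throw of each face. This sketch assumes your original code passed bins=range(1,7); that value is inferred from the suggested fix, not taken from your code.

```python
import numpy as np

faces = np.arange(1, 7)  # one throw of each face: 1, 2, 3, 4, 5, 6

# Edges 1..6 give only five bins; the last bin [5, 6] is closed, so 5 and 6 share it
counts_5, edges_5 = np.histogram(faces, bins=list(range(1, 7)))
# Edges 1..7 give six bins, one per face
counts_6, edges_6 = np.histogram(faces, bins=list(range(1, 8)))

print(counts_5)  # [1 1 1 1 2]
print(counts_6)  # [1 1 1 1 1 1]
```

The doubled last count is exactly the "last bar twice as tall" symptom.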
If you are lazy (like me), you can also use numpy to generate the whole matrix of throws directly and let seaborn deal with the bins for you:
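import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

dices = 1000   # number of dice rolled per throw
throws = 5000  # number of throws
# randint(6) draws 0..5, so add 1 to get faces 1..6
x = np.random.randint(6, size=(dices, throws)) + 1
# flatten so seaborn sees one 1-D sample rather than 5000 separate columns
# (distplot is deprecated since seaborn 0.11; histplot/displot replace it)
sns.distplot(x.ravel())
plt.show()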
Seaborn usually makes good choices, which can save a bit of time on configuration, so it's worth a try at least. You can also pass kde=False to the seaborn plot to get rid of the density estimate.
Just for the sake of it, and to show how seaborn behaves, here is the same with the sum over 100 dice:
dices = 100
throws = 5000
x = np.random.randint(6, size=(dices, throws)) + 1
# sum over the dice (axis 0) to get one total per throw; plot only the histogram
sns.distplot(x.sum(axis=0), kde=False)
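Since each column of x holds one throw of 100 dice, the sums should look roughly bell-shaped around 100 × 3.5 = 350, with a standard deviation of √(100 × 35/12) ≈ 17.08 (a fair die has mean 3.5 and variance 35/12). The sketch below checks this numerically; the newer default_rng generator and the fixed seed are my choices for reproducibility, not part of the code above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the check is reproducible
dices, throws = 100, 5000
x = rng.integers(1, 7, size=(dices, throws))  # faces 1..6 (upper bound exclusive)
sums = x.sum(axis=0)  # one total of 100 dice per throw

# Theoretical moments of a single fair die
face_mean = 3.5
face_var = 35 / 12

print(sums.mean(), dices * face_mean)        # sample mean vs. theoretical 350
print(sums.std(), np.sqrt(dices * face_var))  # sample std vs. theoretical ~17.08
```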
Running a sample of your code shows that the issue is a plotting problem, not a computational one, which is why you are seeing the correct mean. The plot shows only five bars, the last one being twice the height of the others.
Notice also that each bar's label sits at its left edge, so there is no "6" bar. This has to do with what plt.hist means by bins:
If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All but the last (righthand-most) bin is half-open.
So to center each bar on a single face, you probably want bin edges halfway between the integers, something like
plt.hist(np.ravel(result), bins=np.arange(0.5, 7.5, 1))
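To check that these edges put each face in the middle of its own bin, you can compute the bin midpoints and the counts directly. The sizes n=10 and N=1000 and the seeded generator are hypothetical, chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
result = rng.integers(1, 7, size=(10, 1000))  # hypothetical n=10 dice, N=1000 throws

edges = np.arange(0.5, 7.5, 1)          # 0.5, 1.5, ..., 6.5 (7 edges, 6 bins)
centers = (edges[:-1] + edges[1:]) / 2  # bin midpoints: 1.0, 2.0, ..., 6.0
counts, _ = np.histogram(np.ravel(result), bins=edges)

print(edges)
print(centers)
print(counts)
```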
Unasked Questions
If you want to simulate N * n data points, you can use numpy directly. Replace your original initialization of result and the for loop with any of the following lines:
result = (np.random.uniform(size=(n, N)) * 6 + 1).astype(int)
result = np.random.uniform(1.0, 7.0, size=(n, N)).astype(int)
result = np.random.randint(1, 7, size=(n, N))
The last line is preferable in terms of efficiency and accuracy, since it draws integers directly instead of generating floats and truncating them.
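A small sanity check, with hypothetical sizes n=10 and N=1000 and a fixed seed, that all three lines produce arrays of the right shape containing only the faces 1 to 6:

```python
import numpy as np

n, N = 10, 1000  # hypothetical sizes: n dice, N throws
np.random.seed(0)  # reproducible draws for this check

a = (np.random.uniform(size=(n, N)) * 6 + 1).astype(int)
b = np.random.uniform(1.0, 7.0, size=(n, N)).astype(int)
c = np.random.randint(1, 7, size=(n, N))

for name, arr in [("scaled uniform", a), ("uniform(1, 7)", b), ("randint", c)]:
    # each version should give only faces 1..6, with roughly equal counts
    print(name, arr.shape, np.unique(arr), np.bincount(arr.ravel(), minlength=7)[1:])
```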
Another possible improvement is in how you compute the histogram. Right now, you are using plt.hist, which calls np.histogram and plt.bar. For small integers like yours, np.bincount is arguably a much better binning technique:
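# minlength=7 keeps all six faces even if the highest one never comes up
count = np.bincount(result.ravel(), minlength=7)[1:]  # counts for faces 1..6
plt.bar(np.arange(1, 7), count)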
Notice that this also simplifies the plotting, since you specify the centers of the bars directly instead of having plt.hist guess them for you.
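As a quick consistency check, again with hypothetical sizes and a seeded generator, np.bincount should give exactly the counts that np.histogram returns with the half-integer edges 0.5, 1.5, ..., 6.5:

```python
import numpy as np

rng = np.random.default_rng(0)
result = rng.integers(1, 7, size=(10, 1000))  # hypothetical n=10 dice, N=1000 throws

# bincount indexes by value; minlength=7 keeps slots 0..6, and [1:] drops the unused 0
count = np.bincount(result.ravel(), minlength=7)[1:]

# The same counts via np.histogram with edges halfway between the faces
hist, _ = np.histogram(result.ravel(), bins=np.arange(0.5, 7.5, 1))

print(count)
print(np.array_equal(count, hist))  # True
```

bincount skips the edge bookkeeping entirely, which is why it fits small non-negative integers like dice faces so well.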