Implementing specific distribution in python

Question

I want to return an integer l with 1 < l < 10, with probability 1/2^(l-1).

How should I do this, rather than writing

    from random import random

    def sample():
        x = random()
        if x < 0.5:
            return 2

and so on for every other value of l?

Thank you.

Answer 1:

This is going to be fun... I am a bit rusty with these things, so a good mathematician may be able to correct my reasoning.

To generate a distribution from a formula, you first need to do some integrals and compute the cumulative distribution function (CDF) over the specified interval. In particular, we need to start by calculating the normalization constant.
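Assuming the density is taken as C·2^(1-x) on [1, 10] (a continuous stand-in for 1/2^(l-1)), with C the normalization constant, the indefinite integral would be:

$$
F(x) = \int C\,2^{1-x}\,dx = k - \frac{C\,2^{1-x}}{\ln 2}
$$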

This integral gives, for the constant k:
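A likely reconstruction, assuming the density C·2^(1-x) on [1, 10] and its antiderivative k - C·2^(1-x)/ln 2: no probability can lie below the lower end of the interval, so F(1) = 0 fixes k:

$$
F(1) = k - \frac{C}{\ln 2} = 0 \quad\Longrightarrow\quad k = \frac{C}{\ln 2}
$$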

The "meaning" of the cumulative distribution function is "what is the probability of obtaining a number that belongs to the interval I need?" This question can be seen another way: "the probability of drawing a number that is less than or equal to 10 must be 1". This leads to the following equation, which helps to find the parameter C. Note that the first term is k, and the second term is the general integral of 2^(1-x) with x replaced by 10.
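Reconstructed under the same assumptions (density C·2^(1-x) on [1, 10], k = C/ln 2), the condition F(10) = 1, with k as the first term and the antiderivative evaluated at x = 10 as the second, solves for C as:

$$
F(10) = k - \frac{C\,2^{-9}}{\ln 2} = \frac{C}{\ln 2}\left(1 - 2^{-9}\right) = 1 \quad\Longrightarrow\quad C = \frac{\ln 2}{1 - 2^{-9}}
$$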

Solving this, we finally reach the CDF (again, there may be an easier way to find it):
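Substituting C = ln 2/(1 - 2^(-9)) and k = C/ln 2 into k - C·2^(1-x)/ln 2 gives a CDF consistent with the Python code further down (a reconstruction, assuming the density C·2^(1-x) on [1, 10]):

$$
F(x) = \frac{1 - 2^{1-x}}{1 - 2^{-9}}, \qquad 1 \le x \le 10
$$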

At this point we need to invert the CDF: set it equal to X, where X is now a uniform random number between 0 and 1, and solve for x. The formula is:
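Assuming the CDF F(x) = (1 - 2^(1-x))/(1 - 2^(-9)) on [1, 10], setting F(x) = X and solving for x gives the expression used in the code that follows:

$$
X\left(1 - 2^{-9}\right) = 1 - 2^{1-x} \quad\Longrightarrow\quad x = 1 - \log_2\!\left(1 - \left(1 - 2^{-9}\right) X\right)
$$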

In Python, I tried the following:

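    import numpy as np
    import matplotlib.pyplot as plt

    # Inverse-CDF sampling: map uniform X in [0, 1) to x = 1 - log2(1 - (1 - 2**-9) * X),
    # which lies in [1, 10] and has density proportional to 2**(1 - x)
    X = np.random.rand(10000)
    a = 1 - np.log2(1 - (1 - 2**-9) * X)

    # `normed` has been removed from Matplotlib; `density=True` normalizes the histogram
    plt.hist(a, density=True)
    plt.show()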

Does it make sense?
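This answer samples a continuous x on [1, 10], while the question asks for integers 2 ≤ l ≤ 9. A minimal sketch of the same inverse-CDF idea applied directly to those integers, generalizing the if-chain from the question: build the cumulative weights once, then pick the first one that exceeds a uniform draw. The function name `sample_l` is hypothetical, and the weights are effectively renormalized because 1/2 + 1/4 + ... + 1/256 = 255/256, not 1.

```python
import bisect
import itertools
import random

# Values 1 < l < 10 and their unnormalized weights 1 / 2**(l - 1)
VALUES = list(range(2, 10))
WEIGHTS = [1 / 2 ** (l - 1) for l in VALUES]

# Cumulative weights: [0.5, 0.75, 0.875, ..., 255/256]
CUMULATIVE = list(itertools.accumulate(WEIGHTS))
TOTAL = CUMULATIVE[-1]


def sample_l(rng=random):
    """Hypothetical helper: draw one l via the discrete inverse CDF."""
    # Scale the uniform draw by the total weight so the probabilities sum to 1
    x = rng.random() * TOTAL
    # Index of the first cumulative weight strictly greater than x;
    # this plays the role of the hand-written `if x < 0.5: return 2` chain
    return VALUES[bisect.bisect_right(CUMULATIVE, x)]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_l(rng) for _ in range(100_000)]
    for l in VALUES:
        print(l, draws.count(l) / len(draws))
```

In practice, the standard library's `random.choices(VALUES, weights=WEIGHTS, k=n)` accepts unnormalized weights and does this cumulative-weight bookkeeping for you.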

Answer 2:

While @Fabrizio's answer is probably correct, there is a much simpler way to get the job done: what you want is a truncated exponential distribution, because your PDF looks like

PDF(x) ∝ 2^(-x) = e^(-x ln 2).
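One way to check that this is the same distribution as in the previous answer: inverting the CDF of an exponential with rate λ truncated to [a, b] gives x = a - ln(1 - U(1 - e^(-λ(b - a))))/λ, and with λ = ln 2, a = 1, b = 10 this reduces to 1 - log2(1 - (1 - 2^(-9))U). In SciPy's parameterization `scale` is 1/λ, so `scale=1.0/np.log(2.0)` corresponds to λ = ln 2. A small standard-library sketch comparing the two formulas numerically; `truncexp_inverse` and `answer1_inverse` are hypothetical names.

```python
import math


def truncexp_inverse(u, a=1.0, b=10.0, rate=math.log(2.0)):
    """Inverse CDF of an exponential with the given rate, truncated to [a, b]."""
    # CDF: F(x) = (1 - exp(-rate*(x - a))) / (1 - exp(-rate*(b - a)))
    tail = 1.0 - math.exp(-rate * (b - a))
    return a - math.log(1.0 - u * tail) / rate


def answer1_inverse(u):
    """Formula from the first answer: x = 1 - log2(1 - (1 - 2**-9) * u)."""
    return 1.0 - math.log2(1.0 - (1.0 - 2.0 ** -9) * u)


# Both columns agree up to floating-point rounding
for u in (0.0, 0.25, 0.5, 0.9, 0.999):
    print(u, truncexp_inverse(u), answer1_inverse(u))
```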

SciPy already provides a truncated exponential distribution, scipy.stats.truncexpon.

Just set the proper scale and location, and the job is done. Code:

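    import numpy as np
    from scipy.stats import truncexpon
    import matplotlib.pyplot as plt

    vmin = 1.0
    vmax = 10.0
    scale = 1.0 / np.log(2.0)  # rate ln 2, so the PDF is proportional to 2**(-x)

    # truncexpon's standard support is [0, b]; after loc/scale it becomes [vmin, vmax]
    r = truncexpon.rvs(b=(vmax - vmin) / scale, loc=vmin, scale=scale, size=100000)

    print(np.min(r))
    print(np.max(r))

    plt.hist(r, bins=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], density=True)
    plt.show()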

[Figure: histogram of the sampled values]

And if you only need to sample integer values, NumPy has a good helper function for that as well, np.random.choice. The code is below; the resulting graph is quite similar.

    import numpy as np
    import matplotlib.pyplot as plt

    vmin = 1
    vmax = 10

    # Integer values 2..9, i.e. vmin < l < vmax
    v = np.arange(vmin + 1, vmax, dtype=np.int64)
    p = np.asarray([1.0 / 2**(l - 1) for l in range(vmin + 1, vmax)])  # probabilities
    p /= np.sum(p)  # normalization: the raw weights sum to 255/256, not 1

    r = np.random.choice(v, size=100000, replace=True, p=p)

    print(np.min(r))
    print(np.max(r))

    # Bin edges at half-integers center one bar on each value 2..9
    plt.hist(r, bins=[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5], density=True)
    plt.show()
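Another way to see the target distribution: P(l) = 1/2^(l-1) is exactly the chance that a fair coin first comes up heads on flip l - 1, i.e. l - 1 is geometric with p = 1/2. A minimal standard-library sketch (the name `sample_l_coin` is hypothetical) flips coins until heads and rejects any l ≥ 10; conditioning on l < 10 divides each probability by 255/256, so this should match the normalized `p` used with `np.random.choice`.

```python
import random


def sample_l_coin(rng=random, vmax=10):
    """Hypothetical helper: l - 1 is the number of fair-coin flips up to the first heads."""
    while True:
        l = 1
        while True:
            l += 1
            if rng.random() < 0.5:  # heads with probability 1/2
                break
        # P(l) = 1 / 2**(l - 1) for l >= 2; reject values outside 1 < l < vmax
        if l < vmax:
            return l


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_l_coin(rng) for _ in range(100_000)]
    for l in range(2, 10):
        print(l, draws.count(l) / len(draws))
```

The rejection step wastes only about 1 draw in 256, so this stays cheap while avoiding any explicit table of probabilities.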

Source: https://stackoverflow.com/questions/58625669/implementing-specific-distribution-in-python

Tags

python

random

probability