问题
I am using a custom function f(x)
to define a custom distribution using copy
's rv_continuous
class. My code is
class my_pdf_gen(rv_continuous):
def _pdf(self, x, integral):
return f(x)/integral
where integral
ensure the normalisation. I am able to create an instance of it with
my_pdf = my_pdf_gen(my_int,a = a, b = b, name = 'my pdf')
with a,b
the upper and lower limit of the value's range, and my_int= scipy.integrate.quad(f, a, b)[0]
.
I am also able to create a random sample of data using my_pdf.rvs(my_int, size = 5)
, but this is very slow. (Up to 6 seconds when size=9
).
I read that one should also overwrite some other methods in the class (like _ppf
), but from the examples I found it isn't clear to me how to achieve it in my case.
Thanks a lot!
回答1:
It's expected to be slow since the generic implementation does root-solving for cdf, which itself uses numerical integration.
So your best bet is to provide a _ppf
or _rvs
implementation. How to do this greatly depends on the details of f(x)
. If you cannot solve f(x) = r
analytically, consider tabulating / inverse interpolation or rejection sampling.
回答2:
I solved the problem by changing approach and using Monte Carlo's rejection sampler method
def rejection_sampler(p,xbounds,pmax):
while True:
x = np.random.rand(1)*(xbounds[1]-xbounds[0])+xbounds[0]
y = np.random.rand(1)*pmax
if y<=p(x):
return x
where p
is the probability density function, xbounds
is a tuple containing the upper and lower limits of of the pdf and pmax
is the maximum value of the pdf on the domain.
Monte Carlo's rejection sampler was suggested here: python: random sampling from self-defined probability function
来源:https://stackoverflow.com/questions/62498793/scipy-rv-continuous-very-slow