Fast arbitrary distribution random sampling (inverse transform sampling)

前端 未结 5 600
名媛妹妹
名媛妹妹 2020-11-28 07:00

The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss

相关标签:
5条回答
  • 2020-11-28 07:10

    I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).

    def metropolis_hastings(target_density, size=500000):
        burnin_size = 10000
        size += burnin_size
        x0 = np.array([[0, 0]])
        xt = x0
        samples = []
        for i in range(size):
            xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
            accept_prob = (target_density(xt_candidate))/(target_density(xt))
            if np.random.uniform(0, 1) < accept_prob:
                xt = xt_candidate
            samples.append(xt)
        samples = np.array(samples[burnin_size:])
        samples = np.reshape(samples, [samples.shape[0], 2])
        return samples
    

    This function requires a function target_density which takes in a data-point and computes its probability.

    For details check-out this detailed answer of mine.

    0 讨论(0)
  • 2020-11-28 07:18

    You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function to random numbers having standard uniform distribution in the interval [0,1].

    After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:

    [inverted_function(random.random()) for x in range(1000)]
    

    More on Inverse Transform Sampling:

    • http://en.wikipedia.org/wiki/Inverse_transform_sampling

    Also, there is a good question on StackOverflow related to the topic:

    • Pythonic way to select list elements with different probability
    0 讨论(0)
  • 2020-11-28 07:18
    import numpy as np
    import scipy.interpolate as interpolate
    
    def inverse_transform_sampling(data, n_bins, n_samples):
        hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
        cum_values = np.zeros(bin_edges.shape)
        cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
        inv_cdf = interpolate.interp1d(cum_values, bin_edges)
        r = np.random.rand(n_samples)
        return inv_cdf(r)
    

    So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.

    0 讨论(0)
  • 2020-11-28 07:25

    This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.

    class Distribution(object):
        """
        draws samples from a one dimensional probability distribution,
        by means of inversion of a discrete inverstion of a cumulative density function
    
        the pdf can be sorted first to prevent numerical error in the cumulative sum
        this is set as default; for big density functions with high contrast,
        it is absolutely necessary, and for small density functions,
        the overhead is minimal
    
        a call to this distibution object returns indices into density array
        """
        def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
            self.shape          = pdf.shape
            self.pdf            = pdf.ravel()
            self.sort           = sort
            self.interpolation  = interpolation
            self.transform      = transform
    
            #a pdf can not be negative
            assert(np.all(pdf>=0))
    
            #sort the pdf by magnitude
            if self.sort:
                self.sortindex = np.argsort(self.pdf, axis=None)
                self.pdf = self.pdf[self.sortindex]
            #construct the cumulative distribution function
            self.cdf = np.cumsum(self.pdf)
        @property
        def ndim(self):
            return len(self.shape)
        @property
        def sum(self):
            """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
            return self.cdf[-1]
        def __call__(self, N):
            """draw """
            #pick numbers which are uniformly random over the cumulative distribution function
            choice = np.random.uniform(high = self.sum, size = N)
            #find the indices corresponding to this point on the CDF
            index = np.searchsorted(self.cdf, choice)
            #if necessary, map the indices back to their original ordering
            if self.sort:
                index = self.sortindex[index]
            #map back to multi-dimensional indexing
            index = np.unravel_index(index, self.shape)
            index = np.vstack(index)
            #is this a discrete or piecewise continuous distribution?
            if self.interpolation:
                index = index + np.random.uniform(size=index.shape)
            return self.transform(index)
    
    
    if __name__=='__main__':
        shape = 3,3
        pdf = np.ones(shape)
        pdf[1]=0
        dist = Distribution(pdf, transform=lambda i:i-1.5)
        print dist(10)
        import matplotlib.pyplot as pp
        pp.scatter(*dist(1000))
        pp.show()
    

    And as a more real-world relevant example:

    x = np.linspace(-100, 100, 512)
    p = np.exp(-x**2)
    pdf = p[:,None]*p[None,:]     #2d gaussian
    dist = Distribution(pdf, transform=lambda i:i-256)
    print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()
    
    0 讨论(0)
  • 2020-11-28 07:33

    Here is a rather nice way of performing inverse transform sampling with a decorator.

    import numpy as np
    from scipy.interpolate import interp1d
    
    def inverse_sample_decorator(dist):
        
        def wrapper(pnts, x_min=-100, x_max=100, n=1e5, **kwargs):
            
            x = np.linspace(x_min, x_max, int(n))
            cumulative = np.cumsum(dist(x, **kwargs))
            cumulative -= cumulative.min()
            f = interp1d(cumulative/cumulative.max(), x)
            return f(np.random.random(pnts))
        
        return wrapper
    

    Using this decorator on a Gaussian distribution, for example:

    @inverse_sample_decorator
    def gauss(x, amp=1.0, mean=0.0, std=0.2):
        return amp*np.exp(-(x-mean)**2/std**2/2.0)
    

    You can then generate sample points from the distribution by calling the function. The keyword arguments x_min and x_max are the limits of the original distribution and can be passed as arguments to gauss along with the other key word arguments that parameterise the distribution.

    samples = gauss(5000, mean=20, std=0.8, x_min=19, x_max=21)
    

    Alternatively, this can be done as a function that takes the distribution as an argument (as in your original question),

    def inverse_sample_function(dist, pnts, x_min=-100, x_max=100, n=1e5, 
                                **kwargs):
            
        x = np.linspace(x_min, x_max, int(n))
        cumulative = np.cumsum(dist(x, **kwargs))
        cumulative -= cumulative.min()
        f = interp1d(cumulative/cumulative.max(), x)
            
        return f(np.random.random(pnts))
    
    0 讨论(0)
提交回复
热议问题