Sample from custom distribution in R

Question

I have implemented an alternate parameterization of the negative binomial distribution in R, like so (also see here):

nb = function(n, l, a){
  first = choose((n + a - 1), a-1)
  second = (l/(l+a))^n
  third = (a/(l+a))^a
  return(first*second*third)
}

Here n is the count, l (lambda) is the mean, and a is the overdispersion term.
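As a sanity check, this parameterization appears to coincide with base R's dnbinom() when size = a and mu = l, at least for integer a (choose() rounds a non-integer second argument, so the match would not hold for fractional a). A minimal sketch, with illustrative parameter values that are assumptions and not from the question:

```r
# PMF from the question, repeated so this block runs on its own.
nb <- function(n, l, a) {
  choose(n + a - 1, a - 1) * (l / (l + a))^n * (a / (l + a))^a
}

counts <- 0:50
l <- 4  # illustrative mean (assumed value)
a <- 3  # illustrative overdispersion (assumed integer value)

# Largest absolute difference from the built-in mean/size parameterization.
max_diff <- max(abs(nb(counts, l, a) - dnbinom(counts, size = a, mu = l)))
print(max_diff)
```

If the difference is at the level of floating-point noise, the built-in d/p/q/r functions for the negative binomial can stand in for the custom PMF.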

I would like to draw random samples from this distribution to validate my implementation of a negative binomial mixture model, but I am not sure how to go about it. The CDF of this function isn't easily defined, so I tried rejection sampling as discussed here, but that didn't work either. I'm not sure why: the article says to first draw from a uniform distribution between 0 and 1, but I want my NB distribution to model integer counts, so I may not fully understand the approach.

Thank you for your help.

Answer 1:

It seems like you could:

1) Draw a uniform random number between zero and one.

2) Numerically integrate the probability mass function (this is really just a running sum, since the distribution is discrete and lower-bounded at zero).

3) The first count at which the running sum (the CDF) reaches or passes your random number is your random draw.

So all together, do something like the following:

r <- runif(1,0,1)
cdf <- 0
i <- -1
while(cdf < r){
  i <- i+1
  p <- PMF(i)
  cdf <- cdf + p
}

Here PMF(i) is the probability mass at a count of i, as specified by the parameters of the distribution (for the function in the question, nb(i, l, a)). The value of i when this while-loop finishes is your sample.
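The loop above can be wrapped into a reusable function. The following is only a sketch: r_nb_one and its max_count guard are hypothetical names introduced here, and the parameter values are illustrative.

```r
# PMF from the question, repeated so this block runs on its own.
nb <- function(n, l, a) {
  choose(n + a - 1, a - 1) * (l / (l + a))^n * (a / (l + a))^a
}

# Draw one sample by inversion: accumulate the PMF until it reaches u.
# r_nb_one is a hypothetical helper, not part of any package.
r_nb_one <- function(l, a, max_count = 1e6) {
  u <- runif(1)
  cdf <- 0
  i <- -1
  # max_count guards against an endless loop if rounding keeps cdf below u.
  while (cdf < u && i < max_count) {
    i <- i + 1
    cdf <- cdf + nb(i, l, a)
  }
  i
}

set.seed(1)
draws <- replicate(5000, r_nb_one(l = 4, a = 3))  # assumed example parameters
print(mean(draws))  # should land near the mean l
```

Calling the function once per draw is slow for large samples, but for validating a model on a few thousand draws it is usually adequate.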

Answer 2:

I recommend you look up the uniform distribution as well as the universality of the uniform. You can do exactly what you want by passing a uniformly distributed variable to the inverse CDF of the negative binomial; what you get is a set of points sampled from your negative binomial distribution.

EDIT: I see that the negative binomial CDF has no closed-form inverse. My second recommendation would be to scrap your function and use a built-in:

library(MASS)
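# Signature: rnegbin(n, mu = n, theta = stop("'theta' must be specified"))
# mu is the mean (l above) and theta the overdispersion (a above); values are illustrative.
rnegbin(1000, mu = 4, theta = 3)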
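Base R also provides rnbinom(), which avoids the MASS dependency. Assuming the mapping size = a and mu = l (an assumption that holds for integer a, as far as the PMF in the question goes), a rough way to check the built-in sampler against the custom PMF is to compare empirical frequencies with nb(). The parameter values below are illustrative:

```r
# PMF from the question, repeated so this block runs on its own.
nb <- function(n, l, a) {
  choose(n + a - 1, a - 1) * (l / (l + a))^n * (a / (l + a))^a
}

set.seed(42)
l <- 4  # illustrative mean (assumed value)
a <- 3  # illustrative overdispersion (assumed value)
draws <- rnbinom(1e5, size = a, mu = l)

# Empirical frequency of each count 0..10 over all draws.
counts <- 0:10
empirical <- as.numeric(table(factor(draws, levels = counts))) / length(draws)

# Largest gap between empirical frequencies and the custom PMF.
max_gap <- max(abs(empirical - nb(counts, l, a)))
print(max_gap)
```

A small gap suggests the custom PMF and the built-in sampler describe the same distribution, so the built-in draws can be used to validate the mixture model.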

Answer 3:

If you only need this for testing, so that speed is not an issue, the inversion method mentioned by others is probably the way to go.

For a discrete random variable, it requires a simple while loop. See Non-Uniform Random Variate Generation by L. Devroye, chapter 3, p. 85.
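When many draws are needed, the while loop can be swapped for a table lookup: tabulate the CDF once over a truncated support and locate each uniform draw with findInterval(). This is a variant of inversion, not the loop described in the reference, and it assumes the truncation point (200 here, chosen arbitrarily) captures essentially all of the probability mass:

```r
# PMF from the question, repeated so this block runs on its own.
nb <- function(n, l, a) {
  choose(n + a - 1, a - 1) * (l / (l + a))^n * (a / (l + a))^a
}

l <- 4  # illustrative mean (assumed value)
a <- 3  # illustrative overdispersion (assumed value)

support <- 0:200                  # truncated support (assumed wide enough)
cdf <- cumsum(nb(support, l, a))  # tabulated CDF, computed once

set.seed(7)
u <- runif(10000)

# findInterval() counts the CDF entries <= u; adding 1 gives the index of the
# first count whose CDF exceeds u. pmin() clamps draws that fall past the table.
idx <- pmin(findInterval(u, cdf) + 1, length(support))
draws <- support[idx]
print(mean(draws))  # should land near the mean l
```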

Source: https://stackoverflow.com/questions/42941091/sample-from-custom-distribution-in-r

Tags

statistics

distribution

sampling