How to generate distributions given mean, SD, skew and kurtosis in R?

既然无缘 2020-11-28 04:03

Is it possible to generate distributions in R for which the Mean, SD, skew and kurtosis are known? So far it appears the best route would be to create random numbers and tra

8 answers
  • 2020-11-28 04:30

    The entropy method is a good idea, but if you have the data samples, use them: they carry more information than the moments alone, so a pure moment fit is often less stable. If you have no further information about what the distribution looks like, then entropy is a good concept; but if you do have more information, e.g. about the support, then use it! If your data are skewed and positive, a lognormal model is a good idea. If you also know that the upper tail is finite, do not use the lognormal; consider the 4-parameter Beta distribution instead. If nothing is known about the support or tail characteristics, a scaled and shifted lognormal model may be fine. If you need more flexibility regarding kurtosis, a logT with scaling and shifting is often fine. It also helps to know whether the fit should be near-normal: if so, use a model that includes the normal distribution as a special case (often the case anyway); otherwise you may use, e.g., a generalized secant-hyperbolic distribution. If you want to do all of this, at some point the model will have several different cases, and you should make sure there are no gaps or bad transition effects between them.
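    A minimal base-R sketch of the "scaled and shifted lognormal" suggestion, assuming a positive target skewness. The lognormal skewness depends only on w = exp(sdlog^2), so w is found numerically from the target skewness; meanlog is then chosen to hit the target SD, and a shift fixes the mean. The helper names are hypothetical. With only three parameters the kurtosis is not free: it is implied by the skewness, and the sketch simply reports it.

```r
# Hypothetical helper: X = shift + LogNormal(meanlog, sdlog) matched to a
# target mean, SD and skewness (assumes skew > 0).
shifted_lnorm_params <- function(mean, sd, skew) {
  stopifnot(sd > 0, skew > 0)
  # Lognormal skewness is (w + 2) * sqrt(w - 1) with w = exp(sdlog^2);
  # it increases in w, so bracket the root between 1 and 2 + skew^2.
  f <- function(w) (w + 2) * sqrt(w - 1) - skew
  w <- uniroot(f, lower = 1, upper = 2 + skew^2, tol = 1e-12)$root
  # Lognormal variance is (w - 1) * w * exp(2 * meanlog); pick meanlog to hit sd.
  meanlog <- log(sd) - 0.5 * log((w - 1) * w)
  # Lognormal mean is exp(meanlog) * sqrt(w); shift it onto the target mean.
  shift <- mean - exp(meanlog) * sqrt(w)
  list(meanlog = meanlog, sdlog = sqrt(log(w)), shift = shift, w = w,
       kurtosis = w^4 + 2 * w^3 + 3 * w^2 - 3)  # implied (non-excess) kurtosis
}

# Draw n values from the matched shifted lognormal.
rshifted_lnorm <- function(n, mean, sd, skew) {
  p <- shifted_lnorm_params(mean, sd, skew)
  p$shift + rlnorm(n, meanlog = p$meanlog, sdlog = p$sdlog)
}
```

    For a negative target skewness one could mirror the draws around the mean; that case is left out of the sketch.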

  • 2020-11-28 04:32

    I agree you need density estimation to replicate an arbitrary distribution. However, if you have hundreds of variables, as is typical in a Monte Carlo simulation, you need to make a compromise.

    One suggested approach is as follows:

    1. Use the Fleishman transform to get the coefficients for the given skew and kurtosis: Fleishman's method takes the skew and (excess) kurtosis and returns the coefficients a, b, c, d of the cubic Y = a + bZ + cZ^2 + dZ^3, with a = -c.
    2. Generate N normal variables (mean = 0, std = 1)
    3. Transform the data in (2) with the Fleishman coefficients to transform the normal data to the given skew and kurtosis
    4. Transform the data from step (3) to the desired mean and standard deviation (std) using new_data = desired_mean + (data from step 3) * desired_std.

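    The resulting data from step 4 will have the desired mean, std, skewness and kurtosis (exactly in the population sense; the moments of any finite sample will fluctuate around these targets).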

    Caveats:

    1. Fleishman will not work for all combinations of skewness and kurtosis.
    2. The above steps assume uncorrelated variables. If you want to generate correlated data, you will need an additional step before the Fleishman transform.
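    The four steps can be sketched in base R as follows. This is an illustration under stated assumptions, not a reference implementation: the coefficients are found by minimising the squared residuals of Fleishman's (1978) three moment equations with `optim()` starting from Y = Z (b = 1, c = d = 0), and `ekurt` is the excess kurtosis (0 for a normal). The function names are hypothetical. If the pair is infeasible, or the optimiser stalls away from a root, the residual check raises an error. Correlated data (caveat 2) are not handled.

```r
# Residuals of Fleishman's equations for Y = a + b*Z + c*Z^2 + d*Z^3,
# Z ~ N(0, 1), a = -c. Zero residuals mean Var(Y) = 1, skewness = skew
# and excess kurtosis = ekurt.
fleishman_resid <- function(p, skew, ekurt) {
  bb <- p[1]; cc <- p[2]; dd <- p[3]
  c(bb^2 + 6 * bb * dd + 2 * cc^2 + 15 * dd^2 - 1,
    2 * cc * (bb^2 + 24 * bb * dd + 105 * dd^2 + 2) - skew,
    24 * (bb * dd + cc^2 * (1 + bb^2 + 28 * bb * dd) +
            dd^2 * (12 + 48 * bb * dd + 141 * cc^2 + 225 * dd^2)) - ekurt)
}

# Step 1: solve for b, c, d by least squares, starting from Y = Z.
fleishman_coef <- function(skew, ekurt) {
  obj <- function(p) sum(fleishman_resid(p, skew, ekurt)^2)
  fit <- optim(c(1, 0, 0), obj, method = "BFGS",
               control = list(reltol = 1e-14, maxit = 500, ndeps = rep(1e-6, 3)))
  if (max(abs(fleishman_resid(fit$par, skew, ekurt))) > 1e-6)
    stop("no Fleishman solution found for this skew/kurtosis pair")
  setNames(c(-fit$par[2], fit$par), c("a", "b", "c", "d"))
}

# Steps 2-4: standard normal draws -> Fleishman cubic -> rescale to mean and sd.
rfleishman <- function(n, mean, sd, skew, ekurt) {
  k <- fleishman_coef(skew, ekurt)
  z <- rnorm(n)
  y <- k[["a"]] + k[["b"]] * z + k[["c"]] * z^2 + k[["d"]] * z^3
  mean + sd * y
}
```

    Note that (-b, c, -d) solves the same equations and gives the same distribution, so which root the optimiser lands on does not matter.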
  • 2020-11-28 04:32

    As @David and @Carl wrote above, there are several packages dedicated to generating different distributions; see e.g. the Probability Distributions Task View on CRAN.

    If you are interested in the theory (how to draw a sample of numbers fitting a specific distribution with given parameters), just look up the appropriate formulas, e.g. for the gamma distribution on Wikipedia, and set up a simple system of equations from the provided parameters to compute the scale and shape.

    See a concrete example here, where I computed the alpha and beta parameters of a required beta distribution based on mean and standard deviation.
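    A small base-R sketch of that moment-matching idea. For the gamma distribution, mean = shape * scale and variance = shape * scale^2, which gives shape and scale in closed form; the beta distribution on [0, 1] can be solved the same way from mean = a / (a + b) and variance = mean * (1 - mean) / (a + b + 1). The helper names are hypothetical, and only the mean and SD are matched: skewness and kurtosis then follow from the chosen family.

```r
# Hypothetical helper: gamma shape/scale from mean and SD.
gamma_from_moments <- function(mean, sd) {
  stopifnot(mean > 0, sd > 0)
  c(shape = mean^2 / sd^2, scale = sd^2 / mean)
}

# Hypothetical helper: beta shape1/shape2 on [0, 1] from mean and SD.
# Requires variance < mean * (1 - mean).
beta_from_moments <- function(mean, sd) {
  v <- sd^2
  stopifnot(mean > 0, mean < 1, v < mean * (1 - mean))
  k <- mean * (1 - mean) / v - 1   # k = shape1 + shape2
  c(shape1 = mean * k, shape2 = (1 - mean) * k)
}

g <- gamma_from_moments(mean = 4, sd = 2)
x_gamma <- rgamma(1000, shape = g[["shape"]], scale = g[["scale"]])
b <- beta_from_moments(mean = 0.5, sd = sqrt(0.05))
x_beta <- rbeta(1000, shape1 = b[["shape1"]], shape2 = b[["shape2"]])
```

    Here `g` gives shape 4 and scale 1, and `b` gives shape1 = shape2 = 2 up to floating-point rounding.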

  • 2020-11-28 04:33

    Those parameters don't actually fully define a distribution. For that you need a density or equivalently a distribution function.
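    A worked example of this point (an illustration, not from the original answer): the three-point distribution that puts probability 1/6 on -sqrt(3) and +sqrt(3) and 2/3 on 0 has the same mean, SD, skewness and kurtosis as the standard normal, yet it is discrete. The helper name `discrete_moments` is hypothetical.

```r
# Three-point distribution (the nodes and weights of the 3-point
# Gauss-Hermite rule for N(0, 1)).
x <- c(-sqrt(3), 0, sqrt(3))
p <- c(1, 4, 1) / 6

# Exact moments of a discrete distribution with support x and probabilities p.
discrete_moments <- function(x, p) {
  m <- sum(p * x)
  v <- sum(p * (x - m)^2)
  c(mean = m, sd = sqrt(v),
    skewness = sum(p * (x - m)^3) / v^1.5,
    kurtosis = sum(p * (x - m)^4) / v^2)   # non-excess: 3 for a normal
}

# Same four moments as rnorm(), but sample(x, n, replace = TRUE, prob = p)
# only ever returns three distinct values.
discrete_moments(x, p)
```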

  • 2020-11-28 04:37

    This is an interesting question, which doesn't really have a good solution. I presume that even though you don't know the other moments, you have an idea of what the distribution should look like. For example, it's unimodal.

    There are a few different ways of tackling this problem:

    1. Assume an underlying distribution and match moments. There are many standard R packages for doing this. One downside is that the multivariate generalisation may be unclear.

    2. Saddlepoint approximations. In this paper:

      Gillespie, C.S. and Renshaw, E. An improved saddlepoint approximation. Mathematical Biosciences, 2007.

      We look at recovering a pdf/pmf when given only the first few moments. We found that this approach works when the skewness isn't too large.

    3. Laguerre expansions:

      Mustapha, H. and Dimitrakopoulos, R. Generalized Laguerre expansions of multivariate probability densities with moments. Computers & Mathematics with Applications, 2010.

      The results in this paper seem more promising, but I haven't coded them up.
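    Neither paper's method is coded here. As a much simpler relative of the expansion idea, swapped in purely for illustration, the Gram-Charlier A series multiplies a normal density by a Hermite-polynomial correction so that, whenever the result is non-negative, it has exactly the requested mean, SD, skewness and excess kurtosis. It can go negative for larger skewness or kurtosis. The sampler is a hypothetical grid-based inverse-CDF approximation truncated at mean +/- 6 SD.

```r
# Gram-Charlier A series density (not the saddlepoint or Laguerre method):
# normal density times 1 + skew/6 * He3(z) + ekurt/24 * He4(z).
dgram_charlier <- function(x, mean = 0, sd = 1, skew = 0, ekurt = 0) {
  z <- (x - mean) / sd
  he3 <- z^3 - 3 * z          # probabilists' Hermite polynomial He3
  he4 <- z^4 - 6 * z^2 + 3    # probabilists' Hermite polynomial He4
  dnorm(z) * (1 + skew / 6 * he3 + ekurt / 24 * he4) / sd
}

# Hypothetical sampler: inverse CDF by linear interpolation on a grid.
rgram_charlier <- function(n, mean = 0, sd = 1, skew = 0, ekurt = 0,
                           ngrid = 20001) {
  grid <- seq(mean - 6 * sd, mean + 6 * sd, length.out = ngrid)
  dens <- dgram_charlier(grid, mean, sd, skew, ekurt)
  if (any(dens < 0)) stop("density is negative on the grid for these values")
  # Trapezoid-rule CDF on the grid, normalised to end at 1.
  cdf <- c(0, cumsum((dens[-1] + dens[-ngrid]) / 2))
  cdf <- cdf / cdf[ngrid]
  approx(cdf, grid, xout = runif(n))$y
}
```

    Non-negativity is only checked at the grid points, which is adequate for a sketch but not a proof that the density is valid.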

  • 2020-11-28 04:37

    One solution for you might be the PearsonDS package. It lets you specify any combination of the first four moments, subject to the restriction kurtosis > skewness^2 + 1.

    To generate 10 random values from that distribution try:

    library("PearsonDS")
    # target mean, variance, skewness and (non-excess) kurtosis; a normal has kurtosis 3
    moments <- c(mean = 0, variance = 1, skewness = 1.5, kurtosis = 4)
    rpearson(10, moments = moments)
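    To check a generated sample against the target, a small base-R helper can compute its moments. This is a hypothetical helper, not a PearsonDS function; it uses population (1/n) moments and ordinary (non-excess) kurtosis, matching the convention of the moments vector above, and it also encodes the restriction quoted in this answer.

```r
# Hypothetical helper (not part of PearsonDS): population moments of a sample,
# reported as mean, variance, skewness and non-excess kurtosis.
sample_moments <- function(x) {
  m <- mean(x)
  d <- x - m
  v <- mean(d^2)                      # 1/n variance
  c(mean = m, variance = v,
    skewness = mean(d^3) / v^1.5,
    kurtosis = mean(d^4) / v^2)       # equals 3 for a normal distribution
}

# The restriction quoted above for the Pearson system.
pearson_feasible <- function(skewness, kurtosis) kurtosis > skewness^2 + 1

pearson_feasible(1.5, 4)   # TRUE: 4 > 1.5^2 + 1 = 3.25
```

    For example, `sample_moments(rpearson(1e5, moments = moments))` should come out close to, but not exactly equal to, the target moments because of sampling noise.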
    