How to generate distributions given, mean, SD, skew and kurtosis in R?

前端 未结 8 1819
既然无缘
既然无缘 2020-11-28 04:03

Is it possible to generate distributions in R for which the Mean, SD, skew and kurtosis are known? So far it appears the best route would be to create random numbers and tra

相关标签:
8条回答
  • 2020-11-28 04:42

    There is a Johnson distribution in the SuppDists package. Johnson will give you a distribution that matches either moments or quantiles. Others comments are correct that 4 moments does not a distribution make. But Johnson will certainly try.

    Here's an example of fitting a Johnson to some sample data:

    require(SuppDists)
    
    ## make a weird dist with Kurtosis and Skew
    a <- rnorm( 5000, 0, 2 )
    b <- rnorm( 1000, -2, 4 )
    c <- rnorm( 3000,  4, 4 )
    babyGotKurtosis <- c( a, b, c )
    hist( babyGotKurtosis , freq=FALSE)
    
    ## Fit a Johnson distribution to the data
    ## TODO: Insert Johnson joke here
    parms<-JohnsonFit(babyGotKurtosis, moment="find")
    
    ## Print out the parameters 
    sJohnson(parms)
    
    ## add the Johnson function to the histogram
    plot(function(x)dJohnson(x,parms), -20, 20, add=TRUE, col="red")
    

    The final plot looks like this:

    enter image description here

    You can see a bit of the issue that others point out about how 4 moments do not fully capture a distribution.

    Good luck!

    EDIT As Hadley pointed out in the comments, the Johnson fit looks off. I did a quick test and fit the Johnson distribution using moment="quant" which fits the Johnson distribution using 5 quantiles instead of the 4 moments. The results look much better:

    parms<-JohnsonFit(babyGotKurtosis, moment="quant")
    plot(function(x)dJohnson(x,parms), -20, 20, add=TRUE, col="red")
    

    Which produces the following:

    enter image description here

    Anyone have any ideas why Johnson seems biased when fit using moments?

    0 讨论(0)
  • 2020-11-28 04:47

    This question was asked more than 3 years ago, so I hope my answer doesn't come too late.

    There is a way to uniquely identify a distribution when knowing some of the moments. That way is the method of Maximum Entropy. The distribution that results from this method is the distribution that maximizes your ignorance about the structure of the distribution, given what you know. Any other distribution that also has the moments that you specified but is not the MaxEnt distribution is implicitly assuming more structure than what you input. The functional to maximize is Shannon's Information Entropy, $S[p(x)] = - \int p(x)log p(x) dx$. Knowing the mean, sd, skewness and kurtosis, translate as constraints on the first, second, third, and fourth moments of the distribution, respectively.

    The problem is then to maximize S subject to the constraints: 1) $\int x p(x) dx = "first moment"$, 2) $\int x^2 p(x) dx = "second moment"$, 3) ... and so on

    I recommend the book "Harte, J., Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics (Oxford University Press, New York, 2011)."

    Here is a link that tries to implement this in R: https://stats.stackexchange.com/questions/21173/max-entropy-solver-in-r

    0 讨论(0)
提交回复
热议问题