R - random distribution with predefined min, max, mean, and sd values

扶醉桌前 提交于 2019-12-05 18:05:34

Discussion:

Hi. It is very interesting problem. It needs quite an effort to be solved properly and not always solution can be found.

First thing is that when you truncate a distribution (set a min and max for it) standard deviation is limited (has a maximum depending on min and max values). If you want too big value of it - you can not get it.

Second restriction limits mean. It is obvious that if you want mean below minimum and above maximum it will not work, but you may want something too close to limits and still it can not be satisfied.

Third restriction limits a combination of this parameters. Im not sure how does it work, but i am pretty sure not all the combinations may be satisfied.

But there are some combinations that may work and may be found.

Solution:

The problem is: what are the parameters: mean and sd of truncated (cut) distribution with defined limits a and b, so in the end the mean will be equal to desired_mean and standard deviation will be equal to desired_sd.

It is important that values of parameters: mean and sd are used before truncation. So that is why in the end mean and deviation are diffrent.

Below is the code that solves the problem using function optim(). It may not be the best solution for this problem, but it generally works:

require(truncnorm)

eval_function <- function(mean_sd){
    mean <- mean_sd[1]
    sd <- mean_sd[2]
    sample <- rtruncnorm(n = n, a = a, b = b, mean = mean, sd = sd)
    mean_diff <-abs((desired_mean - mean(sample))/desired_mean)
    sd_diff <- abs((desired_sd - sd(sample))/desired_sd)
    mean_diff + sd_diff
}

n = 1000
a <- 1
b <- 6
desired_mean <- 3
desired_sd <- 1

set.seed(1)
o <- optim(c(desired_mean, desired_sd), eval_function)

new_n <- 10000
your_sample <- rtruncnorm(n = new_n, a = a, b = b, mean = o$par[1], sd = o$par[2])
mean(your_sample)
sd(your_sample)
min(your_sample)
max(your_sample)
eval_function(c(o$par[1], o$par[2]))

I am very interested if there are other solutions to that problem, so please post them if you find other answers.

EDIT:

@Mikko Marttila: Thanks to your comment and link: Wikipedia I implemented formulas to calculate mean and sd of truncated distribution. Now the solution is WAY more elegant and it should calculate quite accurately mean and sd of the desired distribution if they exist. It works much faster also.

I implemented eval_function2 which should be used in the optim() function instead of previous one:

eval_function2 <- function(mean_sd){
    mean <- mean_sd[1]
    sd <- mean_sd[2]

    alpha <- (a - mean)/sd
    betta <- (b - mean)/sd

    trunc_mean <- mean + sd * (dnorm(alpha, 0, 1) - dnorm(betta, 0, 1)) / 
                  (pnorm(betta, 0, 1) - pnorm(alpha, 0, 1))

    trunc_var <- (sd ^ 2) * 
                 (1 + 
                  (alpha * dnorm(alpha, 0, 1) - betta * dnorm(betta, 0, 1))/
                  (pnorm(betta, 0, 1) - pnorm(alpha, 0, 1)) -
                 (dnorm(alpha, 0, 1) - dnorm(betta, 0, 1))/
                 (pnorm(betta, 0, 1) - pnorm(alpha, 0, 1)))

    trunc_sd <- trunc_var ^ 0.5

    mean_diff <-abs((desired_mean - trunc_mean)/desired_mean)
    sd_diff <- abs((desired_sd - trunc_sd)/desired_sd)
}
Mikko Marttila

You could use a generalized form of the beta distribution, known as the Pearson type I distribution. The standard beta distribution is defined on the interval (0,1), but you can take a linear transformation of a standard beta distributed variable to obtain values between any (min, max). The answer to this question on CrossValidated explains how to parameterize a beta distribution with its mean and variance, with certain constraints.

While it's possible to formulate both a truncated normal and a generalized beta distribution with the desired min, max, mean and sd, the shape of the two distributions will be very different. This is because the truncated normal distribution has a positive probability density at the endpoints of its support interval, while in the generalized beta distribution the density will always fall smoothly to zero at the endpoints. Which shape is more preferable will depend on your intended application.

Here's an implementation in R for generating generalized beta distributed observations with a mean, variance, min and max parameterization.

rgbeta <- function(n, mean, var, min = 0, max = 1)
{
  dmin <- mean - min
  dmax <- max - mean

  if (dmin <= 0 || dmax <= 0)
  {
    stop(paste("mean must be between min =", min, "and max =", max)) 
  }

  if (var >= dmin * dmax)
  {
    stop(paste("var must be less than (mean - min) * (max - mean) =", dmin * dmax))
  }

  # mean and variance of the standard beta distributed variable
  mx <- (mean - min) / (max - min)
  vx <- var / (max - min)^2

  # find the corresponding alpha-beta parameterization
  a <- ((1 - mx) / vx - 1 / mx) * mx^2
  b <- a * (1 / mx - 1)

  # generate standard beta observations and transform
  x <- rbeta(n, a, b)
  y <- (max - min) * x + min

  return(y)
}

set.seed(1)

n <- 10000
y <- rgbeta(n, mean = 1, var = 4, min = -4, max = 5)

sapply(list(mean, sd, min, max), function(f) f(y))
#    [1]  0.9921269  2.0154131 -3.8653859  4.9838290
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!