Generate 3 random numbers that sum to 1 in R

花落未央 2020-12-16 18:15

I am hoping to create 3 (non-negative) quasi-random numbers that sum to one, and repeat over and over.

Basically I am trying to partition something into three rand

6 Answers
  • 2020-12-16 18:50

    I guess it depends on what distribution you want on the numbers, but here is one way:

    diff(c(0, sort(runif(2)), 1))
    

    Use replicate to get as many sets as you want:

    > x <- replicate(5, diff(c(0, sort(runif(2)), 1)))
    > x
               [,1]       [,2]      [,3]      [,4]       [,5]
    [1,] 0.66855903 0.01338052 0.3722026 0.4299087 0.67537181
    [2,] 0.32130979 0.69666871 0.2670380 0.3359640 0.25860581
    [3,] 0.01013117 0.28995078 0.3607594 0.2341273 0.06602238
    > colSums(x)
    [1] 1 1 1 1 1
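
    The same idea extends to any number of parts: k - 1 sorted cut points split [0, 1] into k pieces. A minimal sketch, assuming a helper name random_partition that is made up for illustration and does not come from any package:

    ```r
    # random_partition is a hypothetical helper name, not from any package.
    # It cuts [0, 1] at k - 1 uniform points and returns the k piece lengths.
    random_partition <- function(k) {
      stopifnot(k >= 1)
      cuts <- sort(runif(k - 1))   # k - 1 cut points in increasing order
      diff(c(0, cuts, 1))          # gaps between consecutive cut points
    }

    set.seed(123)                  # arbitrary seed, only for reproducibility
    p <- random_partition(4)
    p                              # four non-negative numbers
    sum(p)                         # equals 1 up to floating-point rounding
    ```

    With k = 3 this reduces to the one-liner above.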
    
  • 2020-12-16 18:53

    When you want to randomly generate numbers that add to 1 (or some other value) then you should look at the Dirichlet Distribution.

    There is an rdirichlet function in the gtools package and running RSiteSearch('Dirichlet') brings up quite a few hits that could easily lead you to tools for doing this (and it is not hard to code by hand either for simple Dirichlet distributions).
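
    As a rough illustration of coding it by hand: independent Gamma(alpha_i, 1) draws divided by their row sum follow a Dirichlet(alpha) distribution. The function name rdirichlet_sketch below is an assumption made for this example; it is not the gtools implementation.

    ```r
    # rdirichlet_sketch is a hypothetical name used only for this example.
    # Each row of the result is one draw from Dirichlet(alpha).
    rdirichlet_sketch <- function(n, alpha) {
      k <- length(alpha)
      # Column j holds n Gamma draws with shape alpha[j] (matrix fills by column)
      g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n)
      g / rowSums(g)               # normalize each row to sum to 1
    }

    set.seed(1)                    # arbitrary seed, only for reproducibility
    d <- rdirichlet_sketch(5, c(1, 1, 1))
    d                              # 5 rows, 3 columns
    rowSums(d)                     # each row sums to 1 up to rounding
    ```

    With alpha = c(1, 1, 1) the draws are uniform over all non-negative triples that sum to 1; larger alpha values pull the triples toward equal shares.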

  • 2020-12-16 19:03

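    I would simply draw 3 numbers from a uniform distribution and then divide them by their sum. Code below.

    n <- 3
    x <- runif(n, 0, 1)           # n independent uniform draws
    y <- x / sum(x)               # rescale so the values sum to 1
    isTRUE(all.equal(sum(y), 1))  # tolerant check; == can fail due to rounding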
    

    n could be any number you like.

  • 2020-12-16 19:07

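    Just draw two random numbers from (0, 1); call them a and b. Then you get:

    a <- runif(1)
    b <- runif(1)
    rand1 <- min(a, b)       # piece to the left of the smaller point
    rand2 <- abs(a - b)      # piece between the two points
    rand3 <- 1 - max(a, b)   # piece to the right of the larger point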
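
    This appears to be the same construction as the diff(c(0, sort(runif(2)), 1)) one-liner from the earlier answer, just written out piece by piece. A quick check, with variable names chosen only for this illustration:

    ```r
    set.seed(1)                    # arbitrary seed, only for reproducibility
    a <- runif(1)
    b <- runif(1)

    # Piece lengths written out with min/max
    via_min_max <- c(min(a, b), abs(a - b), 1 - max(a, b))

    # Piece lengths as gaps between the sorted cut points
    via_diff <- diff(c(0, sort(c(a, b)), 1))

    all.equal(via_min_max, via_diff)   # TRUE
    ```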
    
  • 2020-12-16 19:09

    This question involves subtler issues than might be at first apparent. After looking at the following, you may want to think carefully about the process that you are using these numbers to represent:

    ## My initial idea (and commenter Anders Gustafsson's):
    ## Sample 3 random numbers from [0,1], sum them, and normalize
    jobFun <- function(n) {
        m <- matrix(runif(3*n,0,1), ncol=3)
        m<- sweep(m, 1, rowSums(m), FUN="/")
        m
    }
    
    ## Andrie's solution. Sample 1 number from [0,1], then break upper 
    ## interval in two. (aka "Broken stick" distribution).
    andFun <- function(n){
      x1 <- runif(n)
      x2 <- runif(n)*(1-x1)
      matrix(c(x1, x2, 1-(x1+x2)), ncol=3)
    }
    
    ## ddzialak's solution (vectorized by me)
    ddzFun <- function(n) {
        a <- runif(n, 0, 1)
        b <- runif(n, 0, 1)
        rand1 = pmin(a, b)
        rand2 = abs(a - b)
        rand3 = 1 - pmax(a, b)
        cbind(rand1, rand2, rand3)
    }
    
    ## Simulate 10k triplets using each of the functions above
    JOB <- jobFun(10000)
    AND <- andFun(10000)
    DDZ <- ddzFun(10000)
    
    ## Plot the distributions of values
    par(mfcol=c(2,2))
    hist(JOB, main="JOB")
    hist(AND, main="AND")
    hist(DDZ, main="DDZ")
    

    [Figure: histograms of the values generated by JOB, AND, and DDZ]

  • 2020-12-16 19:11

    This problem and the different solutions proposed intrigued me. I did a little test of the three basic algorithms suggested and what average values they would yield for the numbers generated.

    choose_one_and_divide_rest
    means:                [ 0.49999212  0.24982403  0.25018384]
    standard deviations:  [ 0.28849948  0.22032758  0.22049302]
    time needed to fill array of size 1000000 was 26.874945879 seconds
    
    choose_two_points_and_use_intervals
    means:                [ 0.33301421  0.33392816  0.33305763]
    standard deviations:  [ 0.23565652  0.23579615  0.23554689]
    time needed to fill array of size 1000000 was 28.8600130081 seconds
    
    choose_three_and_normalize
    means:                [ 0.33334531  0.33336692  0.33328777]
    standard deviations:  [ 0.17964206  0.17974085  0.17968462]
    time needed to fill array of size 1000000 was 27.4301018715 seconds
    

    The time measurements should be taken with a grain of salt, as they may be influenced more by Python's memory management than by the algorithm itself. I'm too lazy to do it properly with timeit. I did this on a 1 GHz Atom, which explains why it took so long.

    Anyway, choose_one_and_divide_rest is the algorithm suggested by Andrie and by the poster of the question him/herself (AND): you choose one value a in [0,1], then one in [a,1], and then you look at what is left. It adds up to one, but that's about it: the first piece is on average twice as large as the other two. One might have guessed as much ...

    choose_two_points_and_use_intervals is the accepted answer by ddzialak (DDZ). It takes two points in the interval [0,1] and uses the size of the three sub-intervals created by these points as the three numbers. Works like a charm and the means are all 1/3.

    choose_three_and_normalize is the solution by Anders Gustafsson and Josh O'Brien (JOB). It just generates three numbers in [0,1] and normalizes them to a sum of 1. It works just as well and, surprisingly, is a little bit faster in my Python implementation. The variance is a bit lower than for the second solution.

    There you have it. No idea to what beta distribution these solutions correspond or which set of parameters in the corresponding paper I referred to in a comment but maybe someone else can figure that out.
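
    For anyone who wants to repeat the comparison in R rather than Python, here is a minimal sketch. The variable names and the col_summary helper are made up for this example, and the sample size is arbitrary. As far as I can tell, the two-point method produces Dirichlet(1, 1, 1) triples (uniform over all valid triples), so each component should be Beta(1, 2) with standard deviation sqrt(1/18), about 0.2357, which agrees with the numbers above.

    ```r
    set.seed(42)                   # arbitrary seed, only for reproducibility
    n <- 1e5                       # arbitrary sample size

    # choose_one_and_divide_rest (AND)
    x1 <- runif(n)
    x2 <- runif(n) * (1 - x1)
    and_m <- cbind(x1, x2, 1 - x1 - x2)

    # choose_two_points_and_use_intervals (DDZ)
    a <- runif(n)
    b <- runif(n)
    ddz_m <- cbind(pmin(a, b), abs(a - b), 1 - pmax(a, b))

    # choose_three_and_normalize (JOB)
    u <- matrix(runif(3 * n), ncol = 3)
    job_m <- u / rowSums(u)

    # col_summary is a hypothetical helper: per-column mean and sd
    col_summary <- function(m) {
      rbind(mean = unname(colMeans(m)), sd = unname(apply(m, 2, sd)))
    }

    col_summary(and_m)             # means near 0.5, 0.25, 0.25
    col_summary(ddz_m)             # means near 1/3 each
    col_summary(job_m)             # means near 1/3 each, smaller sd than DDZ

    sqrt(1 / 18)                   # sd of one Dirichlet(1, 1, 1) component
    ```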
