How to generate random numbers to satisfy a specific mean and median in Python?

独厮守ぢ 2021-02-14 10:59

I would like to generate n random numbers (e.g., n=200) where the possible values range from 2 to 40, with a mean of 12 and a median of 6.5.

I searche

5 answers
  •  北荒
    北荒 (original poster)
    2021-02-14 11:09

    One way to get a result very close to what you want is to generate two separate random arrays of length 100 that together satisfy your median constraint and cover the desired range of values. After concatenating the arrays, the mean will be close to 12 but not exactly 12. Since only the mean is left to fix, you can reach the exact result by adjusting one of the two arrays.

    In [162]: arr1 = np.random.randint(2, 7, 100)    
    In [163]: arr2 = np.random.randint(7, 40, 100)
    
    In [164]: np.mean(np.concatenate((arr1, arr2)))
    Out[164]: 12.22
    
    In [166]: np.median(np.concatenate((arr1, arr2)))
    Out[166]: 6.5
    

    The following is a vectorized solution. By constraining how the random sequence is created, it avoids the for loops and Python-level code that other approaches rely on:

    import numpy as np
    import math
    
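    def gen_random():
        # 99 values in 2..6 and 99 values in 7..39 (randint's upper bound is exclusive)
        arr1 = np.random.randint(2, 7, 99)
        arr2 = np.random.randint(7, 40, 99)
        # The two middle values pin the median at (6 + 7) / 2 = 6.5
        mid = [6, 7]
        # Excess of the current sum (13 accounts for mid) over the target 12 * 200,
        # expressed per item when spread over 40 items
        i = ((np.sum(arr1 + arr2) + 13) - (12 * 200)) / 40
        decm, intg = math.modf(i)
        args = np.argsort(arr2)
        # Subtract the integer part from the 40 values ranked just below the largest ...
        arr2[args[-41:-1]] -= int(intg)
        # ... and the remainder from the largest value, so the sum is exactly 12 * 200
        arr2[args[-1]] -= int(np.round(decm * 40))
        return np.concatenate((arr1, mid, arr2))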
    

    Demo:

    arr = gen_random()
    print(np.median(arr))
    print(arr.mean())
    
    6.5
    12.0
    

    The logic behind the function:

    To get a random array that meets these criteria, we can concatenate three arrays: arr1, mid and arr2. arr1 and arr2 each hold 99 items, and mid holds the two items 6 and 7, which makes the median of the final result 6.5. Now we can create the two random arrays, each of length 99. All that is left to reach a mean of 12 is to find the difference between the current sum and 12 * 200 and subtract it from the N largest numbers, which in this case come from arr2, with N=40 (the integer version above also uses the single largest value to absorb the remainder).
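As a rough illustration of that arithmetic, the sketch below recomputes the correction step by step for one seeded draw. The names `low`, `high`, `excess` and `removed`, as well as the seed, are assumptions made for this example, and it uses `np.random.default_rng` instead of the `np.random.randint` calls in the answer.

```python
import math

import numpy as np

# Seeded generator so the draw is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)
low = rng.integers(2, 7, 99)    # 99 values in 2..6
high = rng.integers(7, 40, 99)  # 99 values in 7..39

# Current sum, including the two middle values 6 and 7, minus the target sum.
target_sum = 12 * 200
excess = int(low.sum() + high.sum() + 13 - target_sum)

# Split the per-item correction into its integer and fractional parts.
per_item = excess / 40
frac, whole = math.modf(per_item)

# 40 items each lose the integer part; one extra item absorbs the remainder.
removed = 40 * int(whole) + int(round(frac * 40))

print("excess:", excess, "removed:", removed)
```

The total removed should match the excess exactly, which is why the mean lands on 12. `divmod(excess, 40)` would likely give the same split without floating point.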

    Edit:

    If float numbers in the result are not a problem, you can shorten the function as follows:

    import numpy as np
    
    def gen_random():
        arr1 = np.random.randint(2, 7, 99).astype(float)
        arr2 = np.random.randint(7, 40, 99).astype(float)
        mid = [6, 7]
        i = ((np.sum(arr1 + arr2) + 13) - (12 * 200)) / 40
        args = np.argsort(arr2)
        arr2[args[-40:]] -= i
        return np.concatenate((arr1, mid, arr2))
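Since the adjustment shifts some values, it may be worth checking each generated array against all three requirements from the question. The following is a minimal sketch under stated assumptions: `gen_random_float` is a seeded restatement of the float version above, and `check` is a hypothetical helper, neither of which comes from the answer itself.

```python
import numpy as np


def gen_random_float(rng):
    """Float variant restated with an explicit generator (an assumption)."""
    low = rng.integers(2, 7, 99).astype(float)
    high = rng.integers(7, 40, 99).astype(float)
    # Excess of the current sum (13 accounts for the middle values) over 12 * 200.
    excess = low.sum() + high.sum() + 13 - 12 * 200
    # Spread the excess evenly over the 40 largest values of the upper half.
    top = np.argsort(high)[-40:]
    high[top] -= excess / 40
    return np.concatenate((low, [6, 7], high))


def check(arr, lo=2, hi=40, mean=12, median=6.5):
    """Report which of the three constraints the array satisfies."""
    return {
        "in_range": bool(arr.min() >= lo and arr.max() <= hi),
        "mean_ok": bool(np.isclose(arr.mean(), mean)),
        "median_ok": bool(np.isclose(np.median(arr), median)),
    }


rng = np.random.default_rng(42)  # arbitrary seed for repeatability
arr = gen_random_float(rng)
report = check(arr)
print(report)
```

If any entry of the report comes back `False` for an unlucky draw, one simple option is to draw again.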
    
