How to generate random numbers to satisfy a specific mean and median in Python?

独厮守ぢ 2021-02-14 10:59

I would like to generate n random numbers (e.g., n = 200), where each value lies between 2 and 40, with a mean of 12 and a median of 6.5.

I searche

5 answers
  •  予麋鹿 (original poster)
    2021-02-14 11:05

    If you have a bunch of smaller arrays with the right median and mean, you can combine them to produce a larger array.
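    As a minimal sketch of why combining works (the two blocks below are hand-picked, hypothetical values, not output from the code in this answer): the mean survives because every block sums to 12 times its length. The median survives as long as every block has half its values at or below 6 and half at or above 7, because the middle pair of each block then sums to 13, and so does the middle pair of the concatenation.

```python
import numpy as np

# Two hypothetical blocks, each with mean 12 and median 6.5.
a = np.array([2, 6, 7, 33])          # sum 48 over 4 values, middle pair (6, 7)
b = np.array([3, 4, 5, 8, 20, 32])   # sum 72 over 6 values, middle pair (5, 8)

combined = np.concatenate([a, b])

print(combined.mean())      # mean of the combined array
print(np.median(combined))  # median of the combined array
```

    Both properties hold for the combined array here, even though the two blocks have different sizes and different middle pairs.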

    So... you can pre-generate smaller arrays as you are currently doing, and then combine them randomly for larger n. Of course, this will result in a biased random sample, but it sounds like you just want something that's approximately random.

    Here's working Python 3 code that generates a sample of size 5000 with your desired properties, built from smaller samples of sizes 4, 6, 8, 10, ..., 18.

    Note that I changed how the smaller random samples are built: half of the numbers must be <= 6 and half >= 7 if the median is to be 6.5, so we generate those halves independently. This speeds things up massively.
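    The same idea can also be tried in a vectorized form. The sketch below is a variant, not the code from this answer: it draws many candidate blocks at once, each with a low half and a high half, and keeps the ones that hit both targets. The helper name `sample_blocks` and its `tries` and `seed` parameters are made up for illustration.

```python
import numpy as np

def sample_blocks(n, tries=200_000, seed=0):
    """Return candidate arrays of even size n with mean 12 and median 6.5."""
    rng = np.random.default_rng(seed)
    low = rng.integers(2, 7, size=(tries, n // 2))    # lower half: values 2..6
    high = rng.integers(7, 41, size=(tries, n // 2))  # upper half: values 7..40
    cand = np.concatenate([low, high], axis=1)
    # Compare integer sums rather than float means to test for mean == 12.
    ok = (cand.sum(axis=1) == 12 * n) & (np.median(cand, axis=1) == 6.5)
    return cand[ok]

blocks = sample_blocks(4)
print(len(blocks))
```

    Only a small fraction of candidates pass the filter, and that fraction shrinks as n grows, so larger block sizes may need a larger `tries` value.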

    import collections
    import numpy as np
    import random
    
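    # Pre-generate small samples with mean 12 and median 6.5, grouped by size.
    rs = collections.defaultdict(list)
    for i in range(50):
        # Cycle through the even sizes 4, 6, ..., 18 so that every size gets
        # at least one sample (a purely random size could leave one empty).
        n = 4 + 2 * (i % 8)
        while True:
            # Lower half is drawn from 2..6, upper half from 7..40.
            x = np.append(np.random.randint(2, 7, size=n//2), np.random.randint(7, 41, size=n//2))
            if x.mean() == 12 and np.median(x) == 6.5:
                break
        rs[len(x)].append(x)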
    
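    def random_range(n):
        # Blocks have even sizes of at least 4, so n must be even and not 2.
        if n % 2 or n == 2:
            raise ValueError("%d must be even and not equal to 2" % n)
        r = []
        while n:
            # Pick an even block size from 4 to 18 that fits in the remaining n.
            i = random.randrange(4, min(20, n+1), 2)
            # Never leave exactly 2 slots, since no block is that small.
            if n - i == 2:
                continue
            xs = random.choice(rs[i])
            r.extend(xs)
            n -= i
        random.shuffle(r)
        return r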
    
    xs = np.array(random_range(5000))
    print([(i, list(xs).count(i)) for i in range(2, 41)])
    print(len(xs))
    print(xs.mean())
    print(np.median(xs))
    

    Output:

    [(2, 620), (3, 525), (4, 440), (5, 512), (6, 403), (7, 345), (8, 126), (9, 111), (10, 78), (11, 25), (12, 48), (13, 61), (14, 117), (15, 61), (16, 62), (17, 116), (18, 49), (19, 73), (20, 88), (21, 48), (22, 68), (23, 46), (24, 75), (25, 77), (26, 49), (27, 83), (28, 61), (29, 28), (30, 59), (31, 73), (32, 51), (33, 113), (34, 72), (35, 33), (36, 51), (37, 44), (38, 25), (39, 38), (40, 46)]
    5000
    12.0
    6.5
    

    The first line of the output shows that there are 620 2's, 525 3's, 440 4's, etc. in the final array.
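    If the tally itself is of interest, `np.bincount` can produce the same (value, count) pairs in one pass instead of calling `list.count` once per value. The sketch below uses a small stand-in array rather than the 5000-element sample above.

```python
import numpy as np

# Stand-in data with values in the range 2..40.
xs = np.array([2, 6, 7, 33, 2, 5, 8, 33])

# counts[v] is how often the value v occurs; minlength covers values up to 40.
counts = np.bincount(xs, minlength=41)
pairs = [(i, int(counts[i])) for i in range(2, 41)]

print(pairs[:5])
```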
