How to generate random numbers to satisfy a specific mean and median in Python?

独厮守ぢ 2021-02-14 10:59

I would like to generate n random numbers (e.g., n=200), where the possible values range between 2 and 40, with a mean of 12 and a median of 6.5.

I searche

5 answers
  •  花落未央
    2021-02-14 11:26

    Here, you want a median lower than the mean. That means a uniform distribution is not appropriate: you want many small values and fewer large ones.

    Specifically, you want as many values less than or equal to 6 as values greater than or equal to 7.

    A simple way to ensure that the median will be 6.5 is to have the same number of values in the range [ 2 - 6 ] as in [ 7 - 40 ]. If you chose uniform distributions in both ranges, you would have a theoretical mean of 13.75 (the average of 4 for the lower range and 23.5 for the upper one), which is not that far from the required 12.
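
    As a quick check of the 13.75 figure, here is a minimal sketch that computes the expected mean when both halves are drawn uniformly. The variable names are only illustrative and are not part of the answer's code.

```python
# Expected mean when half the values are uniform on 2..6
# and the other half are uniform on 7..40.
lower = list(range(2, 7))     # 2, 3, 4, 5, 6
upper = list(range(7, 41))    # 7, 8, ..., 40

mean_lower = sum(lower) / len(lower)
mean_upper = sum(upper) / len(upper)

# Each half contributes the same number of values, so the overall
# expected mean is the plain average of the two half-means.
overall = (mean_lower + mean_upper) / 2
print(mean_lower, mean_upper, overall)
```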

    A slight variation on the weights can bring the theoretical mean even closer: if we use [ 5, 4, 3, 2, 1, 1, ..., 1 ] as the relative weights passed to random.choices for the [ 7, 8, ..., 40 ] range, we find a theoretical mean of 19.98 for that range, which is close enough to the 20 needed there (the lower range averages 4, and (4 + 20) / 2 = 12).
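
    The weighted mean quoted above can be reproduced with a few lines. This is only a sketch of the arithmetic, using the same population and weights as the answer; the names upper_mean and overall are illustrative.

```python
# Weighted expected value of the upper range 7..40 with
# relative weights 5, 4, 3, 2 for 7..10 and 1 for 11..40.
pop2 = list(range(7, 41))
w2 = [5, 4, 3, 2] + [1] * 30

upper_mean = sum(v * w for v, w in zip(pop2, w2)) / sum(w2)

# The lower range 2..6 is uniform, so its mean is 4; both halves
# hold the same number of values.
overall = (4 + upper_mean) / 2
print(round(upper_mean, 2), round(overall, 2))
```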

    Example code:

    >>> import random, statistics
    >>> pop1 = list(range(2, 7))
    >>> pop2 = list(range(7, 41))
    >>> w2 = [ 5, 4, 3, 2 ] + ( [1] * 30)
    >>> r1 = random.choices(pop1, k=2500)
    >>> r2 = random.choices(pop2, w2, k=2500)
    >>> r = r1 + r2
    >>> random.shuffle(r)
    >>> statistics.mean(r)
    12.0358
    >>> statistics.median(r)
    6.5
    >>>
    

    So we now have a distribution of 5000 values with a median of exactly 6.5 and a mean of 12.0358 (this one is random, and another run will give a slightly different value). If we want an exact mean of 12, we just have to tweak some values. Here sum(r) is 60179 when it should be 60000, so we have to decrease by 1 a total of 179 values that are neither 2 (they would go out of range) nor 7 (they would change the median).
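
    To make that bookkeeping concrete, here is a small sketch that reports how far the sum is from the target and how many values are eligible for a one-unit change. The helper adjustment_plan is hypothetical and not part of the answer's code; it assumes the same 2..6 / 7..40 split.

```python
def adjustment_plan(values, target_mean=12):
    """Return (excess, candidates) for a list of integers in 2..40.

    excess     -- sum(values) minus the target sum (positive: too high)
    candidates -- how many values may be moved by one unit in the
                  needed direction without leaving 2..40 or crossing
                  the 6/7 boundary that fixes the median
    """
    excess = sum(values) - target_mean * len(values)
    if excess > 0:
        # Decreasing: 2 would leave the range, 7 would drop into the lower half.
        candidates = sum(1 for v in values if v not in (2, 7))
    elif excess < 0:
        # Increasing: 6 would jump into the upper half, 40 would leave the range.
        candidates = sum(1 for v in values if v not in (6, 40))
    else:
        candidates = 0
    return excess, candidates

print(adjustment_plan([2, 6, 7, 40]))
```

    A single pass that changes each eligible value by one unit can only close the gap when abs(excess) does not exceed candidates, which is easier to satisfy for large samples than for small ones.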

    In the end, a possible generator function could be:

    import random                              # random.choices requires Python 3.6+

    def gendistrib(n):
        if n % 2 != 0 :
            raise ValueError("gendistrib needs an even parameter")
        n2 = n//2                              # size of each half
        pop1 = list(range(2, 7))               # lower range
        pop2 = list(range(7, 41))              # upper range
        w2 = [ 5, 4, 3, 2 ] + ( [1] * 30)      # weights for upper range
        r1 = random.choices(pop1, k=n2)        # lower part of the distrib.
        r2 = random.choices(pop2, w2, k=n2)    # upper part
        r = r1 + r2
        random.shuffle(r)                      # randomize order
        # time to force an exact mean
        tot = sum(r)
        expected = 12 * n
        if tot > expected:                     # too high: decrease some values
            for i, val in enumerate(r):
                if val != 2 and val != 7:
                    r[i] = val - 1
                    tot -= 1
                    if tot == expected:
                        random.shuffle(r)      # shuffle again the decreased values
                        break
        elif tot < expected:                   # too low: increase some values
            for i, val in enumerate(r):
                if val != 6 and val != 40:
                    r[i] = val + 1
                    tot += 1
                    if tot == expected:
                        random.shuffle(r)      # shuffle again the increased values
                        break
        return r
    

    It is really fast: timing gendistrib(10000) with timeit gave less than 0.02 seconds. But it should not be used for small samples (fewer than 1000 values).
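
    Because the exact mean and median are not guaranteed for every sample, it may be worth verifying a generated list before using it. The checker below is a hypothetical helper, not part of the answer above; it only assumes the constraints stated in the question.

```python
from statistics import median

def check_sample(values, lo=2, hi=40, target_mean=12, target_median=6.5):
    """Report which of the question's constraints a sample satisfies."""
    return {
        "in_range": all(lo <= v <= hi for v in values),
        # Compare sums rather than means to avoid floating-point rounding.
        "mean_ok": sum(values) == target_mean * len(values),
        "median_ok": median(values) == target_median,
    }

# A tiny hand-built sample that meets all three constraints.
print(check_sample([2, 6, 7, 33]))
```

    If any entry is False for a generated list, one option is simply to generate a new one and check again.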
