How to generate random numbers to satisfy a specific mean and median in python?

独厮守ぢ 2021-02-14 10:59

I would like to generate n random numbers, e.g. n=200, where the possible values range between 2 and 40, with a mean of 12 and a median of 6.5.

I searche

5 answers
  •  孤独总比滥情好
    2021-02-14 11:15

    While this post already has an accepted answer, I'd like to contribute a general non-integer approach. It does not need loops or testing. The idea is to take a PDF with compact support. Taking the idea of the accepted answer of Kasrâmvd, make two distributions, one in the left and one in the right interval. Choose the shape parameters such that the mean falls on the given value. The interesting opportunity here is that one can create a continuous PDF, i.e. one without a jump where the intervals join.

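    As an example I have chosen the beta distribution. To have finite, non-zero values at the border between the two intervals, I've chosen beta = 1 for the left part and alpha = 1 for the right part. Each part carries half of the probability mass, which fixes the median at 6.5, so the two part means must add up to 2 * 12 = 24. Using the definition of the PDF, continuity at 6.5 and this requirement on the mean give two equations (alpha is the remaining shape parameter of the left part, beta that of the right part):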

    • 4.5 / alpha = 33.5 / beta
    • 2 + 4.5 * alpha / ( alpha + 1 ) + 6.5 + 33.5 * 1 / ( 1 + beta ) = 24
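
    As a rough sketch of how the two conditions could be solved for other ranges, means and medians: substituting the continuity condition into the mean condition leaves a single quadratic in the right-hand shape parameter. The helper name `solve_shapes` and its arguments are hypothetical and not part of the answer; the sketch assumes the quadratic does not degenerate and that it has a positive root.

```python
import math


def solve_shapes(lo, hi, mean, median):
    """Hypothetical helper: shape parameters (s, t) for Beta(s, 1) on
    [lo, median] and Beta(1, t) on [median, hi], each with weight 1/2."""
    w1 = median - lo            # width of the left interval
    w2 = hi - median            # width of the right interval
    r = w1 / w2                 # continuity at the median: s = r * t
    c = 2.0 * mean - lo - median
    # Mean condition w1*s/(1+s) + w2/(1+t) = c with s = r*t, multiplied
    # out, gives a*t**2 + b*t + d = 0 (assumes a != 0).
    a = r * (w1 - c)
    b = r * (w1 + w2) - c * (1.0 + r)
    d = w2 - c
    disc = b * b - 4.0 * a * d
    if disc < 0:
        raise ValueError("no real solution for these constraints")
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    positive = [root for root in roots if root > 0]
    if not positive:
        raise ValueError("no positive shape parameter for these constraints")
    t = positive[0]
    return r * t, t


s, t = solve_shapes(2, 40, 12, 6.5)
print(s, t)
```

    For the numbers of the question this should reproduce the closed-form values of s and t used in the code further down.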

    This is a quadratic equation that is rather easy to solve. Then just use scipy.stats.beta like:

    from scipy.stats import beta
    
    import matplotlib.pyplot as plt
    import numpy as np
    
    x1 = np.linspace(2, 6.5, 200 )
    x2 = np.linspace(6.5, 40, 200 )
    
    # s and t are used instead of alpha and beta (beta is already the imported distribution)
    s = 1./737 *(np.sqrt(294118) - 418 )
    t = 1./99 *(np.sqrt(294118) - 418 )
    
    data1 = beta.rvs(s, 1, loc=2, scale=4.5, size=20000)
    data2 = beta.rvs(1, t, loc=6.5, scale=33.5, size=20000)
    data = np.concatenate( ( data1, data2 ) )
    # sample mean next to the analytic mean of each part
    print( np.mean( data1 ), 2 + 4.5 * s/(1.+s) )
    print( np.mean( data2 ), 6.5 + 33.5/(1.+t) )
    print( np.mean( data ) )
    print( np.median( data ) )
    
    fig = plt.figure()
    ax = fig.add_subplot( 1, 1, 1 )
    ax.hist(data1, bins=13, density=True )
    ax.hist(data2, bins=67, density=True )
    ax.plot( x1, beta.pdf( x1, s, 1, loc=2, scale=4.5 ) )
    ax.plot( x2, beta.pdf( x2, 1, t, loc=6.5, scale=33.5 ) )
    ax.set_yscale( 'log' )
    plt.show()
    

    This prints (the sampled values vary slightly from run to run):

    >> 2.661366939244768 2.6495436216856976
    >> 21.297348804473618 21.3504563783143
    >> 11.979357871859191
    >> 6.5006779033245135
    

    So the results are as required. The plot shows the two histograms with their PDFs overlaid on a logarithmic y-axis.
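
    The question asks for n = 200 numbers rather than 40000. A minimal sketch of how that could look, using NumPy's own beta sampler instead of scipy and drawing exactly n/2 values from each part; the function name `sample_mixture` and the `seed` argument are assumptions made for illustration, not part of the answer above.

```python
import math

import numpy as np


def sample_mixture(n, seed=None):
    """Hypothetical helper: n values in [2, 40], half from each beta part."""
    if n % 2:
        raise ValueError("n must be even so both parts get n/2 samples")
    # closed-form shape parameters from the answer
    root = math.sqrt(294118) - 418
    s = root / 737
    t = root / 99
    rng = np.random.default_rng(seed)
    # shift and scale the standard beta samples onto [2, 6.5] and [6.5, 40]
    left = 2.0 + 4.5 * rng.beta(s, 1.0, size=n // 2)
    right = 6.5 + 33.5 * rng.beta(1.0, t, size=n // 2)
    data = np.concatenate((left, right))
    rng.shuffle(data)  # remove the left/right ordering
    return data


data = sample_mixture(200, seed=0)
print(len(data), data.min(), data.max())
print(np.mean(data), np.median(data))
```

    Because exactly half of the values come from each side, the sample median always lies between the largest left value and the smallest right value, i.e. close to 6.5. The sample mean only matches 12 on average; with n = 200 it can be off by a noticeable amount, so a single draw satisfies the requested mean only approximately.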
