Is python's random number generation easily reproducible?

前端 未结 4 1657
轮回少年
轮回少年 2021-01-13 12:01

I was reading about python\'s random module in standard library. It amazes me that when I set the seed and produce a few random numbers:

random.seed(1)
for i         


        
4条回答
  •  余生分开走
    2021-01-13 12:31

    The existing answers are great, but I'll just add a few points.

    Update:

    Actually, if you don't supply a seed, the random number generator is seeded with random bits from the system random source, it only falls back to using the system time as a seed if the OS doesn't have a random source. Also note that recent versions of Python can use an improved seeding scheme. From the docs:

    random.seed(a=None, version=2)

    Initialize the random number generator.

    If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).

    If a is an int, it is used directly.

    With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used.

    With version 1 (provided for reproducing random sequences from older versions of Python), the algorithm for str and bytes generates a narrower range of seeds.

    Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.


    Generating a CAPTCHA code is not a high-security application compared to say, generating secret cryptographic keys, especially keys that are intended to be used multiple times. As a corollary, the amount of entropy required to generate a CAPTCHA code is smaller than what's required for a cryptographic key.

    Bear in mind that the system time used to seed random is (probably) not the system time in seconds - it's more likely to be the time in microseconds, or even nanoseconds, so it's not easy for an attacker to figure the seed out from a brute-search, apart from the considerations mentioned by Ned.

    Here's a quick demo, running on Python 2.6.6 on a 2GHz Linux system.

    #!/usr/bin/env python
    ''' random seeding demo'''
    
    from __future__ import print_function
    import time
    from random import seed, randint, random
    
    def rf():
        return randint(10, 99)
    
    def put_time():
        print('%.15f' % time.time())
    
    r = range(10)
    a = []
    
    put_time()
    for i in r:
        seed()
        a.append([rf() for j in r])
    put_time()
    
    for row in a:
        print(row)
    

    Typical output

    1436617059.071794986724854
    1436617059.074091911315918
    [95, 25, 50, 75, 80, 38, 21, 26, 85, 82]
    [75, 96, 14, 13, 76, 53, 94, 68, 80, 66]
    [79, 33, 65, 86, 12, 32, 80, 83, 36, 42]
    [28, 47, 62, 21, 52, 30, 54, 62, 22, 28]
    [22, 40, 71, 36, 78, 64, 17, 33, 99, 43]
    [81, 15, 32, 15, 63, 57, 83, 67, 12, 62]
    [22, 56, 54, 55, 51, 56, 34, 56, 94, 16]
    [64, 82, 37, 80, 70, 91, 56, 41, 55, 12]
    [47, 37, 64, 14, 69, 65, 42, 17, 22, 17]
    [43, 43, 73, 82, 61, 55, 32, 52, 86, 74]
    

    As you can see, less than 3 milliseconds elapse between the start of the outer loop & its end, but all of the lists in a are quite different.

    Note that the seed passed to random.seed() can be any hashable object, and when you pass it a non-integer (eg a float like the system time), it first gets hashed to create an integer.

    Still, there's no need to merely use the system time as the seed: you can use SystemRandom / os.urandom() to get the seed. That way, the seed is more unpredictable, but you get the speed of Mersenne Twister; SystemRandom is a little slower than Mersenne Twister because it has to make system calls. However, even urandom isn't totally safe.

    From the GNU urandom man page:

    The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. The generator also keeps an estimate of the number of bits of noise in the entropy pool. From this entropy pool random numbers are created.

    When read, the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation. When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.

    A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current unclassified literature, but it is theoretically possible that such an attack may exist. If this is a concern in your application, use /dev/random instead.

    Usage

    If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.

提交回复
热议问题