Generating random string of seedable data

Posted by 末鹿安然 on 2021-02-04 18:14:48

Question


I'm looking for a way to generate a random string of n bytes in Python, similar to os.urandom(), but with a way to seed the data generation.

So far I have:

import random

def genRandData(size):
    buf = chr(random.randint(0,255))
    for i in range(size-1):
        buf = buf + chr(random.randint(0,255))
    return str(buf)

However, this function is very slow: generating a megabyte of data takes about 1.8 seconds on my machine. Is there any way to improve this (or, alternatively, a way to seed os.urandom)?


Answer 1:


NEW ANSWER

After re-reading the OP's question, I now understand that it is about raw bytes, not a string of ASCII characters.

So, how about this?

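import random

gl = None  # holds the most recent result so it can be inspected after timing

def randBytes(size):
    global gl
    # One getrandbits(8) call per byte; on Python 2, xrange avoids building a list
    gl = bytearray(random.getrandbits(8) for _ in range(size))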

%timeit randBytes(1000000)
1 loops, best of 3: 262 ms per loop

In [27]: gl.__sizeof__()
Out[27]: 1087223

OLD ANSWER HERE

import random
import string
def generateRandomString(size):
    return(''.join(random.choice(string.ascii_letters) for i in range(size)))

Notes:

One ASCII character is 1 byte, so "size" denotes both the length of the string and its size in bytes.

You can use string.ascii_uppercase or string.ascii_lowercase instead to restrict the output to uppercase or lowercase letters only.

random.seed can be used to specify the seed.

random.seed([x])

Initialize the basic random number generator. Optional argument x can be any hashable object. If x is omitted or None, current system time is used; current system time is also used to initialize the generator when the module is first imported. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).

So you could have:

    import random
    import string

    def generateRandomString(size, seed=None):
        # Reseed the global generator only when a seed is supplied
        if seed is not None:
            random.seed(seed)
        return ''.join(random.choice(string.ascii_letters) for i in range(size))
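To check that seeding really makes the output reproducible, here is a minimal sketch. It assumes a private random.Random instance instead of the module-level generator, so the seed does not disturb other code that uses random; the name seeded_string is made up for illustration.

```python
import random
import string


def seeded_string(size, seed=None):
    """Return `size` random ASCII letters; the same seed yields the same string."""
    # A private generator keeps the seed from affecting the global random state.
    rng = random.Random(seed)
    return ''.join(rng.choice(string.ascii_letters) for _ in range(size))


first = seeded_string(16, seed=42)
second = seeded_string(16, seed=42)
print(first == second)  # True: identical seeds reproduce the same sequence
```

With seed=None the generator is seeded from the system, so repeated calls give different strings.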

Timings:

In [30]: %time generateRandomString(1000000)
Wall time: 554 ms
<and then output>



Answer 2:


If you have numpy available, its numpy.random module (numpy's counterpart to the standard random module) provides a function you might consider:

numpy.random.bytes(length)

It is very fast:

$ python -mtimeit "import numpy" "numpy.random.bytes(1<<30)"
10 loops, best of 3: 2.19 sec per loop

That's for 1GiB.

And you can seed it with numpy.random.seed.




Answer 3:


As Dan D. says, letting numpy generate your bytes in one hit at C speed is going to be way faster than producing them one at a time at Python speed.

However, if you don't want to use numpy you can make your code a little more efficient.

Building a string by concatenation, e.g. buf = buf + chr(random.randint(0,255)), is very slow, since a new buf has to be allocated on every loop iteration (remember, Python strings are immutable). The usual technique in Python for building a string from substrings is to accumulate the substrings in a list and then use the str.join() method to combine them in one go.

We can also save a little bit of time by pre-generating a list of our 1 byte strings rather than calling chr() for every byte we want.

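from random import seed, choice

# Pre-built table of all 256 one-byte strings (Python 2 str)
allbytes = [chr(i) for i in range(256)]

def random_bytes(n):
    chunks = []  # avoid shadowing the built-in name `bytes`
    for _ in range(n):
        chunks.append(choice(allbytes))
    return ''.join(chunks)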

We can streamline this and make it slightly more efficient by using a list comprehension:

def random_bytes(n):
    return ''.join([choice(allbytes) for _ in range(n)])

Depending on how you intend to use these random bytes, you may find it useful to put them into a bytearray or bytes object.

Here's a variant based on cristianmtr's new answer:

from random import getrandbits

def random_bytes(n):
    return bytes(bytearray(getrandbits(8) for _ in range(n)))

You could use str() in place of bytes(), but bytes() is better for Python 3, since Python 3 strings are Unicode.
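For Python 3 specifically, the same idea can be combined with a seed. This is only a sketch under a few assumptions: the helper name seeded_bytes is hypothetical, and it uses its own random.Random instance so that the seed stays local to the call.

```python
import random


def seeded_bytes(n, seed=None):
    """Return n pseudorandom bytes as an immutable bytes object (Python 3)."""
    rng = random.Random(seed)  # private generator, seeded per call
    # bytes() accepts an iterable of ints in range(256)
    return bytes(rng.getrandbits(8) for _ in range(n))


data = seeded_bytes(1024, seed=1234)
print(len(data), type(data).__name__)  # 1024 bytes
```

It still makes one Python-level call per byte, so it is no faster than the bytearray variant above; the gain is only reproducibility.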




Answer 4:


Python 3.9 random.randbytes + random.seed

Docs: https://docs.python.org/3.9/library/random.html#random.randbytes

main.py

#!/usr/bin/env python
import random
import sys
random.seed(0)
sys.stdout.buffer.write(random.randbytes(8))

writes 8 pseudorandom bytes to stdout with fixed seed of 0:

./main.py | hd

outputs:

00000000  cd 07 2c d8 be 6f 9f 62                           |..,..o.b|
00000008

Its definition in CPython is simply:

    def randbytes(self, n):
        """Generate n random bytes."""
        return self.getrandbits(n * 8).to_bytes(n, 'little')

The same approach is converted to a Bash one-liner in the related question "Something similar to /dev/urandom with configurable seed?".

On my Lenovo ThinkPad P51, I can dump 100 million bytes to ramfs in 0.5 s. However, if I try to dump 1 billion, it blows up with:

Python int too large to convert to C int

so it is something to keep in mind.
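One way around that limit is to produce the data in smaller pieces. The sketch below is an assumption-laden illustration, not part of the standard library: the generator name seeded_byte_chunks and the 1 MiB default chunk size are made up, and it reuses the getrandbits(n * 8).to_bytes(n, 'little') expression from the definition above so that each call stays small.

```python
import random


def seeded_byte_chunks(total, seed=None, chunk_size=1 << 20):
    """Yield `total` pseudorandom bytes in pieces of at most `chunk_size` bytes."""
    rng = random.Random(seed)
    remaining = total
    while remaining > 0:
        n = min(chunk_size, remaining)
        # Same expression as CPython's randbytes(), applied to a small n
        yield rng.getrandbits(n * 8).to_bytes(n, 'little')
        remaining -= n


data = b''.join(seeded_byte_chunks(3000000, seed=0))
print(len(data))  # 3000000
```

For very large outputs, each chunk can be written straight to sys.stdout.buffer instead of being joined in memory. The output is reproducible for a given seed and chunk size, but it is not guaranteed to match what a single randbytes call would return.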



Source: https://stackoverflow.com/questions/32329381/generating-random-string-of-seedable-data
