Question
I'm looking for a way to generate a random string of n bytes in Python, similar to os.urandom(), but with a way to seed the data generation.
So far I have:
import random
def genRandData(size):
    buf = chr(random.randint(0,255))
    for i in range(size-1):
        buf = buf + chr(random.randint(0,255))
    return str(buf)
However, this function is very slow: generating a megabyte of data takes about 1.8 seconds on my machine. Is there any way of improving this (or, alternatively, a way to seed os.urandom)?
Answer 1:
NEW ANSWER
After re-reading the OP's question, I now understand that it's about raw bytes, not a string of ASCII characters.
So, how about this?
import random
gl = 0
def randBytes(size):
    global gl
    nr = bytearray(random.getrandbits(8) for _ in xrange(size))
    gl = nr
    return
%timeit randBytes(1000000)
1 loops, best of 3: 262 ms per loop
In [27]: gl.__sizeof__()
Out[27]: 1087223
OLD ANSWER HERE
import random
import string
def generateRandomString(size):
    return(''.join(random.choice(string.ascii_letters) for i in range(size)))
Notes:
One ASCII character is 1 byte, so "size" denotes both the length of the string and its size in bytes.
You can use string.ascii_uppercase or string.ascii_lowercase to restrict the output to only uppercase or only lowercase letters.
random.seed can be used to specify the seed.
random.seed([x])
Initialize the basic random number generator. Optional argument x can be any hashable object. If x is omitted or None, current system time is used; current system time is also used to initialize the generator when the module is first imported. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).
So you could have:
import random
import string
def generateRandomString(size, seed=None):
    if seed is not None:
        random.seed(seed)
    return(''.join(random.choice(string.ascii_letters) for i in range(size)))
Timings:
In [30]: %time generateRandomString(1000000)
Wall time: 554 ms
<and then output>
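Calling random.seed() reseeds the module-level generator, which also affects every other user of the random module. A minimal sketch for Python 3.6+, assuming a private random.Random instance is acceptable (the name seeded_ascii_string is hypothetical, not part of the answer above):

```python
import random
import string

def seeded_ascii_string(size, seed=None):
    # Hypothetical helper: a private generator leaves the
    # module-level random state untouched.
    rng = random.Random(seed)
    # random.choices draws all `size` letters in a single call.
    return ''.join(rng.choices(string.ascii_letters, k=size))

a = seeded_ascii_string(16, seed=42)
b = seeded_ascii_string(16, seed=42)
print(a == b)  # True: the same seed reproduces the same string
```

With seed=None the generator is seeded from the operating system or the clock, so repeated calls are not expected to match.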
Answer 2:
If you have numpy available, its numpy.random module provides a function that you might consider:
numpy.random.bytes(length)
It is very fast:
$ python -mtimeit "import numpy" "numpy.random.bytes(1<<30)"
10 loops, best of 3: 2.19 sec per loop
That's for 1GiB.
And you can seed it with numpy.random.seed.
Answer 3:
As Dan D. says, letting numpy generate your bytes in one hit at C speed is going to be way faster than producing them one at a time at Python speed.
However, if you don't want to use numpy, you can make your code a little more efficient.
Building a string by concatenation, e.g. buf = buf + chr(random.randint(0,255)), is very slow, since a new buf has to be allocated on every loop (remember, Python strings are immutable). The usual technique in Python for building a string from substrings is to accumulate the substrings in a list and then use the str.join() method to combine them in one go.
We can also save a little bit of time by pre-generating a list of our 1-byte strings rather than calling chr() for every byte we want.
from random import seed, choice
allbytes = [chr(i) for i in range(256)]
def random_bytes(n):
    bytes = []
    for _ in range(n):
        bytes.append(choice(allbytes))
    return ''.join(bytes)
We can streamline this and make it slightly more efficient by using a list comprehension:
def random_bytes(n):
    return ''.join([choice(allbytes) for _ in range(n)])
Depending on how you intend to use these random bytes, you may find it useful to put them into a bytearray or bytes object.
Here's a variant based on cristianmtr's new answer:
from random import getrandbits
def random_bytes(n):
    return bytes(bytearray(getrandbits(8) for _ in xrange(n)))
You could use str() in place of bytes(), but bytes() is better for Python 3, since Python 3 strings are Unicode.
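The variant above uses xrange, which exists only in Python 2. A minimal sketch of the same idea for Python 3, assuming a private random.Random instance is wanted so the seed stays local (the name random_bytes_py3 and the seed parameter are illustrative additions):

```python
import random

def random_bytes_py3(n, seed=None):
    # Hypothetical Python 3 variant: one private generator per call.
    rng = random.Random(seed)
    # bytes() accepts an iterable of ints in range(256).
    return bytes(rng.getrandbits(8) for _ in range(n))

data = random_bytes_py3(16, seed=1234)
print(len(data))                               # 16
print(data == random_bytes_py3(16, seed=1234))  # True
```

This still produces one byte per Python-level call, so it should not be expected to match the speed of numpy.random.bytes.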
Answer 4:
Python 3.9 random.randbytes + random.seed
Docs: https://docs.python.org/3.9/library/random.html#random.randbytes
main.py
#!/usr/bin/env python
import random
import sys
random.seed(0)
sys.stdout.buffer.write(random.randbytes(8))
writes 8 pseudorandom bytes to stdout with fixed seed of 0:
./main.py | hd
outputs:
00000000 cd 07 2c d8 be 6f 9f 62 |..,..o.b|
00000008
Its definition in CPython is simply:
def randbytes(self, n):
    """Generate n random bytes."""
    return self.getrandbits(n * 8).to_bytes(n, 'little')
Here it is converted to a Bash one-liner: "Something similar to /dev/urandom with configurable seed?"
On my Lenovo ThinkPad P51, I can dump 100 million bytes to ramfs in 0.5 s. However, if I try to dump 1 billion, it fails with:
Python int too large to convert to C int
so it is something to keep in mind.
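One way to stay under that limit is to generate the data in smaller pieces. The following is a minimal sketch, assuming the failure comes from the requested bit count (n * 8) being too large for a single call; the generator name seeded_byte_chunks and the default chunk size are hypothetical choices, and the body mirrors the CPython definition quoted above so it does not require Python 3.9:

```python
import random

def seeded_byte_chunks(total, seed, chunk_size=1 << 20):
    # Hypothetical helper: yields `total` pseudorandom bytes in pieces
    # of at most `chunk_size` bytes, so each getrandbits() call asks
    # for a modest number of bits (assumed to avoid the overflow).
    rng = random.Random(seed)
    remaining = total
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield rng.getrandbits(n * 8).to_bytes(n, 'little')
        remaining -= n

# Usage: collect a small amount in memory; for large sizes, write each
# chunk to a file as it is produced instead of joining them.
data = b''.join(seeded_byte_chunks(10, seed=0, chunk_size=4))
print(len(data))  # 10
```

The byte stream is reproducible for a given seed and chunk size; it is not guaranteed here to match what a single random.randbytes(total) call would return.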
Source: https://stackoverflow.com/questions/32329381/generating-random-string-of-seedable-data