I'm working on a project with a friend where we need to generate a random hash. Before we had time to discuss, we both came up with different approaches and because they are us
Testing randomness is notoriously difficult. However, I would choose the second method, but ONLY (or, only as far as comes to mind) for this case, where the hash is seeded by a random number.
The whole point of hashes is to create a number that is vastly different based on slight differences in input. For your use case, the randomness of the input should do. If, however, you wanted to hash a file and detect one eensy byte's difference, that's when a hash algorithm shines.
I'm just curious, though: why use a hash algorithm at all? It seems that you're looking for a purely random number, and there are lots of libraries that generate UUIDs, which have far stronger guarantees of uniqueness than random number generators.
The second solution clearly has more entropy than the first. Assuming the quality of the source of the random bits would be the same for os.urandom and random.random:
More importantly, the quality of the randomness coming from os.urandom is expected and documented to be much better than the randomness coming from random.random. os.urandom's docstring says "suitable for cryptographic use".
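To put rough numbers on the entropy comparison, here is a small sketch. It assumes CPython, whose documentation describes random.random() as producing floats with 53 bits of precision; the variable names are only for illustration.

```python
import os
import random

# os.urandom(16) returns 16 bytes, i.e. 128 bits from the OS source.
token = os.urandom(16)
urandom_bits = len(token) * 8

# CPython documents random.random() as yielding 53-bit precision floats,
# so each value should be an integer multiple of 2**-53.
x = random.random()
scaled = x * 2**53

print(urandom_bits)         # 128
print(scaled.is_integer())  # expected True on CPython
```

So even before looking at quality, a single random.random() call carries well under half the bits of a 16-byte os.urandom() read.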
This solution:
os.urandom(16).encode('hex')
is the best since it uses the OS to generate randomness, which should be usable for cryptographic purposes (depending on the OS implementation).
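Note that str.encode('hex') only works on Python 2. On Python 3 the same result can be had in a few ways; the sketch below shows three that I know exist in the standard library (secrets.token_hex needs Python 3.6+). Variable names are illustrative.

```python
import binascii
import os
import secrets

raw = os.urandom(16)

# Python 3: bytes objects have a .hex() method.
hex_a = raw.hex()

# binascii.hexlify also works; on Python 3 it returns bytes, so decode it.
hex_b = binascii.hexlify(raw).decode("ascii")

# secrets.token_hex draws 16 random bytes and hex-encodes them in one call.
hex_c = secrets.token_hex(16)

print(hex_a == hex_b)  # True
print(len(hex_c))      # 32
```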
random.random() generates pseudo-random values.
Hashing a random value does not add any new randomness.
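A quick way to convince yourself of this, as a sketch: take a deliberately tiny, hypothetical input space of 8 values and hash each one with SHA-256. The digests look random and are 256 bits long, but there can never be more distinct digests than distinct inputs.

```python
import hashlib

# Hypothetical tiny input space: only 8 possible "random" values.
inputs = [str(i).encode("ascii") for i in range(8)]

# Hashing is deterministic, so the set of digests is no larger than the set of inputs.
digests = {hashlib.sha256(value).hexdigest() for value in inputs}
print(len(digests))  # 8

# The same input always maps to the same digest.
same = hashlib.sha256(b"3").hexdigest() == hashlib.sha256(b"3").hexdigest()
print(same)  # True
```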
random.random() is a pseudo-random generator, which means the numbers are generated from a deterministic sequence. If you call random.seed(some_number), the sequence generated after that will always be the same.
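A minimal sketch of that behaviour (the seed value 1234 is arbitrary):

```python
import random

# Seed, draw five numbers, then re-seed with the same value and draw again.
random.seed(1234)
first = [random.random() for _ in range(5)]

random.seed(1234)
second = [random.random() for _ in range(5)]

print(first == second)  # True
```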
os.urandom() gets its random numbers from the OS's RNG, which uses an entropy pool to collect real random numbers, usually from random events in hardware devices. There are even dedicated entropy generators for systems where a lot of random numbers are generated.
On Unix systems there are traditionally two random number devices: /dev/random and /dev/urandom. Reads from the first block if there is not enough entropy available, whereas reads from /dev/urandom never block: if there is not enough entropy data available, it uses a pseudo-RNG instead.
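For illustration only, this is roughly what reading the device directly looks like. read_urandom is a made-up helper name, and the fallback to os.urandom() for platforms without the device file is my own assumption; in normal code you would just call os.urandom().

```python
import os

def read_urandom(n):
    """Return n random bytes, reading /dev/urandom directly where it exists."""
    path = "/dev/urandom"
    if os.path.exists(path):
        with open(path, "rb") as device:
            return device.read(n)
    # Fallback for platforms without the device file (e.g. Windows).
    return os.urandom(n)

data = read_urandom(16)
print(len(data))  # 16
```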
So the choice usually depends on what you need: if you need a few uniformly distributed random numbers, then the built-in PRNG should be sufficient. For cryptographic use it's always better to use real random numbers.
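As a sketch of that choice in Python: random.Random is the built-in PRNG, while random.SystemRandom offers the same methods backed by os.urandom(). The variable names and the seed are illustrative.

```python
import random

# Built-in PRNG: fast and reproducible, fine for simulations, not for secrets.
sim_rng = random.Random(0)
samples = [sim_rng.uniform(0.0, 1.0) for _ in range(3)]

# SystemRandom has the same interface but draws from the OS source;
# seeding has no effect on it.
sys_rng = random.SystemRandom()
secret_value = sys_rng.getrandbits(128)

print(len(samples))
print(secret_value.bit_length() <= 128)  # True
```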
If you want a unique identifier (UUID), then you should use
import uuid
uuid.uuid4().hex
https://docs.python.org/3/library/uuid.html