Understanding “randomness”

轻奢々 2020-11-22 15:28

I can't get my head around this: which is more random?

rand()

OR:

rand() * rand()

I´m f

28 answers
  • 2020-11-22 16:04

    Neither is 'more random'.

    rand() generates a predictable sequence of numbers from a pseudo-random seed (usually based on the current time, which is always changing). Multiplying two consecutive numbers in the sequence generates a different, but equally predictable, sequence of numbers.

    Addressing whether this will reduce collisions, the answer is no. It will actually increase collisions due to the effect of multiplying two numbers where 0 < n < 1. The result will be a smaller fraction, causing a bias in the result towards the lower end of the spectrum.

    Some further explanation. In the following, 'unpredictable' and 'random' refer to the ability of someone to guess what the next number will be based on previous numbers, i.e. an oracle.

    Given seed x which generates the following list of values:

    0.3, 0.6, 0.2, 0.4, 0.8, 0.1, 0.7, 0.3, ...
    

    rand() will generate the above list, and rand() * rand() will generate:

    0.18, 0.08, 0.08, 0.21, ...
    

    Both methods will always produce the same list of numbers for the same seed, and hence are equally predictable by an oracle. But if you look at the results of multiplying the two calls, you'll see they are all under 0.3 despite a decent distribution in the original sequence. The numbers are biased because of the effect of multiplying two fractions. The product is always smaller than either factor, so the results crowd toward the low end and are much more likely to collide, despite still being just as unpredictable.
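The bias and the reproducibility described above can be checked with a short simulation. This is only a sketch: it assumes Python's `random.Random` as a stand-in for `rand()`, and the seed, sample sizes, and 1000-bucket collision check are arbitrary choices made for illustration.

```python
import random

def single(rng):
    """One draw, like rand()."""
    return rng.random()

def product(rng):
    """Two consecutive draws multiplied, like rand() * rand()."""
    return rng.random() * rng.random()

def sample(fn, seed, n):
    """Draw n values from fn using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [fn(rng) for _ in range(n)]

SEED = 12345          # arbitrary seed, chosen for illustration
N = 100_000

singles = sample(single, SEED, N)
products = sample(product, SEED, N)

# Same seed, same list: both methods are equally predictable.
reproducible = products == sample(product, SEED, N)

mean_single = sum(singles) / N
mean_product = sum(products) / N

# Collision check: put 1000 draws into 1000 equal-width buckets and count
# how many distinct buckets get hit. Fewer distinct buckets = more collisions.
distinct_single = len({int(x * 1000) for x in singles[:1000]})
distinct_product = len({int(x * 1000) for x in products[:1000]})

print("reproducible:", reproducible)
print("mean of rand():", round(mean_single, 3))
print("mean of rand() * rand():", round(mean_product, 3))
print("distinct buckets:", distinct_single, "vs", distinct_product)
```

The mean of the product should land near 0.25 rather than 0.5, and the product should hit fewer distinct buckets, which is the increase in collisions the answer describes.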

  • 2020-11-22 16:05
    1. There is no such thing as more random. It is either random or not. Random means "hard to predict". It does not mean non-deterministic. Both random() and random() * random() are equally random if random() is random. Distribution is irrelevant as far as randomness goes. If a non-uniform distribution occurs, it just means that some values are more likely than others; they are still unpredictable.

    2. Since pseudo-randomness is involved, the numbers are very much deterministic. However, pseudo-randomness is often sufficient in probability models and simulations. It is pretty well known that making a pseudo-random number generator complicated only makes it difficult to analyze. It is unlikely to improve randomness; it often causes it to fail statistical tests.

    3. The desired properties of the random numbers are important: repeatability and reproducibility, statistical randomness, (usually) uniformly distributed, and a large period are a few.

    4. Concerning transformations on random numbers: As someone said, the sum of many independent, uniformly distributed values tends toward a normal distribution. This is the additive central limit theorem. It applies regardless of the source distribution as long as the terms are independent and identically distributed with finite variance. The multiplicative central limit theorem says the product of many independent and identically distributed positive random variables tends toward a lognormal distribution. The graph someone else created looks exponential; with only two factors the exact density is -ln(z) on (0, 1), which is only the first step toward that lognormal limit. So random() * random() is not uniformly distributed (and the two factors may not even be independent, since both numbers are pulled from the same stream). This may be desirable in some applications. However, if a lognormally distributed number is what you want, it is usually better to generate one random number and transform it. random() * random() may be difficult to analyze.
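To make the distribution of random() * random() concrete, here is a short derivation. It assumes X and Y are independent and exactly uniform on (0, 1), which is an idealization that a real PRNG only approximates; the symbols X, Y, Z are introduced here for illustration.

```latex
% Assumption: X, Y independent, each uniform on (0,1); Z = XY.
\begin{align*}
P(Z \le z) &= \int_0^1 P\!\left(X \le \tfrac{z}{y}\right) dy
            = \int_0^z 1 \, dy + \int_z^1 \frac{z}{y} \, dy
            = z - z \ln z, \qquad 0 < z < 1, \\
f_Z(z)     &= \frac{d}{dz}\bigl(z - z \ln z\bigr) = -\ln z, \\
E[Z]       &= E[X]\,E[Y] = \tfrac{1}{4}.
\end{align*}
% Taking logs turns the product into a sum:
%   -\ln Z = (-\ln X) + (-\ln Y),
% a sum of two independent Exp(1) variables, i.e. Gamma(2, 1).
% The central limit theorem acts on this sum of logs as the number of
% factors grows, which is where the lognormal limit comes from.
```

The density -ln z is unbounded near 0 and falls to 0 at z = 1, which matches the low-end bias other answers describe.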

    For more information, consult my book at www.performorama.org. The book is under construction, but the relevant material is there. Note that chapter and section numbers may change over time. Chapter 8 (probability theory) -- sections 8.3.1 and 8.3.3, chapter 10 (random numbers).

  • 2020-11-22 16:07

    It's not exactly obvious, but rand() is typically more random than rand()*rand(). What's important is that this isn't actually very important for most uses.

    But firstly, they produce different distributions. This is not a problem if that is what you want, but it does matter. If you need a particular distribution, then ignore the whole “which is more random” question. So why is rand() more random?

    The core of why rand() is more random (under the assumption that it is producing floating-point random numbers in the range [0..1], which is very common) is that when you multiply two FP numbers together with lots of information in the mantissa, you lose some information off the end; there just aren't enough bits in an IEEE double-precision float to hold all the information that was in two IEEE double-precision floats uniformly randomly selected from [0..1], and those extra bits of information are lost. Of course, it doesn't matter that much since you (probably) weren't going to use that information, but the loss is real. It also doesn't really matter which distribution you produce (i.e., which operation you use to do the combination). Each of those random numbers has (at best) 52 bits of random information – that's how much an IEEE double can hold – and if you combine two or more into one, you're still limited to having at most 52 bits of random information.
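The rounding loss can be observed directly by comparing the floating-point product with the exact rational product. This is a sketch that assumes Python floats (IEEE doubles) and `random.Random` as the source; `product_is_exact` is a helper name made up for this example, and the seed is arbitrary.

```python
import random
from fractions import Fraction

def product_is_exact(a: float, b: float) -> bool:
    """True if the float product a * b equals the exact rational product.

    Fraction(float) converts a double to its exact rational value, so the
    comparison shows whether the multiplication had to round bits away.
    """
    return Fraction(a) * Fraction(b) == Fraction(a * b)

rng = random.Random(2024)   # arbitrary seed, chosen for illustration
pairs = [(rng.random(), rng.random()) for _ in range(10_000)]
inexact = sum(not product_is_exact(a, b) for a, b in pairs)

print(f"{inexact} of {len(pairs)} products lost bits to rounding")
```

Nearly every product needs more significand bits than a double can store, so the result is rounded and some of the information in the two inputs is discarded.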

    Most uses of random numbers don't use even close to as much randomness as is actually available in the random source. Get a good PRNG and don't worry too much about it. (The level of “goodness” depends on what you're doing with it; you have to be careful when doing Monte Carlo simulation or cryptography, but otherwise you can probably use the standard PRNG as that's usually much quicker.)

  • 2020-11-22 16:07

    It's easy to show that the sum of two random numbers is not necessarily random. Imagine you roll a six-sided die. Each number has a 1/6 chance of appearing. Now say you had 2 dice and summed the result. The 11 possible sums (2 through 12) do not each have a 1/11 chance. Why? Because certain sums can be formed in more ways than others. For example, 2 can only be formed as 1+1, but 7 can be formed by 3+4 or 4+3 or 5+2, etc., so it has a larger chance of coming up.

    Therefore, applying a transform (in this case, addition) to a random function does not make it more random, or necessarily preserve randomness. In the case of the dice above, the distribution is peaked at 7 and is therefore less random.
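The dice argument can be verified by enumerating all 36 equally likely rolls. A minimal sketch, assuming two fair six-sided dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) rolls give each sum.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert counts to exact probabilities.
sum_probs = {total: Fraction(count, 36)
             for total, count in sorted(sum_counts.items())}

for total, prob in sum_probs.items():
    print(f"sum {total:2d}: {sum_counts[total]} ways, probability {prob}")
```

The sum 7 can be formed in 6 ways while 2 and 12 can each be formed in only 1 way, so the sums are far from equally likely.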

  • 2020-11-22 16:09

    Suppose you have a simple coin flip problem where an even number counts as heads and an odd number counts as tails. The logical implementation is:

    rand() mod 2
    

    Over a large enough sample, the number of even results should roughly equal the number of odd results.

    Now consider a slight tweak:

    (rand() * rand()) mod 2
    

    If either factor is even, then the product is even. Consider the 4 equally likely outcomes (even * even = even, even * odd = even, odd * even = even, odd * odd = odd). Over a large enough sample, the answer should be even 75% of the time.

    I'd bet heads if I were you.
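The 75% figure can be checked both by enumeration and by simulation. This sketch assumes rand() returns a uniform non-negative integer (as C's `rand()` does); `rand_int` is a hypothetical stand-in for it, and the seed and sample size are arbitrary.

```python
import random
from itertools import product

# Exact argument: enumerate the four parity combinations (0 = even, 1 = odd).
parities = list(product((0, 1), repeat=2))
exact_even_share = sum((a * b) % 2 == 0 for a, b in parities) / len(parities)

# Simulation. ASSUMPTION: rand_int() plays the role of an integer rand().
rng = random.Random(7)   # arbitrary seed, chosen for illustration

def rand_int():
    return rng.randrange(2**31)

N = 100_000
single_even = sum(rand_int() % 2 == 0 for _ in range(N)) / N
product_even = sum((rand_int() * rand_int()) % 2 == 0 for _ in range(N)) / N

print("exact share of even products:", exact_even_share)
print("simulated rand() mod 2 even share:", round(single_even, 3))
print("simulated (rand() * rand()) mod 2 even share:", round(product_even, 3))
```

The single call stays close to a fair coin, while the product comes up even about three times out of four.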

    This comment is really more of an explanation of why you shouldn't implement a custom random function based on your method than a discussion on the mathematical properties of randomness.

  • 2020-11-22 16:10

    As others have already pointed out, this question is hard to answer since every one of us has a different picture of randomness in mind.

    That is why I would highly recommend taking some time to read through this site to get a better idea of randomness:

    • http://www.random.org/

    To get back to the real question: there is no more or less random here:

    both only appear random!

    In both cases - just rand() or rand() * rand() - the situation is the same: after a few billion numbers the sequence will repeat(!). It appears random to the observer, who does not know the whole sequence, but the computer has no true random source, so it cannot produce true randomness either.
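The repetition is easy to see with a deliberately tiny generator. This is a toy sketch, not how any real rand() is implemented: the linear congruential parameters a=5, c=3, m=16 are hypothetical values chosen so the cycle is short enough to print. Real generators have far longer periods (Python's Mersenne Twister, for example, has a period of 2**19937 - 1).

```python
def outputs(seed, n, a=5, c=3, m=16):
    """First n outputs of a toy linear congruential generator.

    ASSUMPTION: a, c, m are illustrative toy parameters, not taken from
    any real library.
    """
    state = seed
    values = []
    for _ in range(n):
        state = (a * state + c) % m
        values.append(state)
    return values

def cycle_length(seed, a=5, c=3, m=16):
    """Number of steps before the generator's state sequence repeats."""
    seen = {}
    state = seed
    step = 0
    while state not in seen:
        seen[state] = step
        state = (a * state + c) % m
        step += 1
    return step - seen[state]

values = outputs(seed=0, n=32)
# rand() * rand(): multiply consecutive pairs of outputs.
products = [values[i] * values[i + 1] for i in range(0, len(values), 2)]

print("period of the toy generator:", cycle_length(0))
print("outputs: ", values)
print("products:", products)
```

Both the raw outputs and the products built from them cycle; multiplying does not escape the underlying period.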

    e.g.: Is the weather random? We do not have enough sensors or knowledge to determine if weather is random or not.
