Understanding “randomness”

Front-end · open · 28 answers · 1979 views
轻奢々 asked 2020-11-22 15:28

I can't get my head around this: which is more random?

rand()

OR:

rand() * rand()


28 answers
  • 2020-11-22 16:21

    Here's a simple answer. Consider Monopoly: you roll two six-sided dice (or 2d6, for those who prefer gaming notation) and take their sum. The most common result is 7, because there are six ways to roll it: (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1). A 2, by contrast, can only be rolled as (1,1). It's easy to see that rolling 2d6 is different from rolling 1d12, even though the range is nearly the same (you can't roll a 1 on 2d6, but the point stands). Multiplying your results instead of adding them skews the distribution in a similar fashion, with most of your results coming up in the middle of the range. If you're trying to reduce outliers, this is a good method, but it won't give you an even distribution.

    (And oddly enough, it will increase low rolls as well. Assuming your randomness starts at 0, you'll see a spike at 0, because a 0 on either roll turns the whole product into 0. Consider two random numbers between 0 and 1 (inclusive) multiplied together: if either result is 0, the product is 0 no matter what the other is, while the only way to get a 1 is for both rolls to be 1. In practice this probably wouldn't matter, but it makes for a weird graph.)
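    The skew is easy to see empirically. A small sketch in Python, with the standard `random` module standing in for `rand()`:

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

# Sum of two dice: 7 dominates, since six ordered pairs produce it.
sums = Counter(random.randint(1, 6) + random.randint(1, 6)
               for _ in range(100_000))

# One flat roll over the same range: every outcome is equally likely.
flat = Counter(random.randint(2, 12) for _ in range(100_000))

# Multiplying two uniform [0, 1) samples pushes the mass toward 0:
# the mean drops from 0.5 (one rand()) to about 0.25 (rand() * rand()).
products = [random.random() * random.random() for _ in range(100_000)]
mean = sum(products) / len(products)
```

    With enough samples, `sums` peaks at 7 while `flat` stays level, and `mean` lands near 0.25 rather than 0.5.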

  • 2020-11-22 16:22

    The concept you're looking for is "entropy," the "degree" of disorder of a string of bits. The idea is easiest to understand in terms of the concept of "maximum entropy".

    An approximate definition of a bit string with maximum entropy is that it cannot be expressed exactly in terms of a shorter string of bits (i.e., there is no algorithm that expands some smaller string back into the original).

    The relevance of maximum entropy to randomness stems from the fact that if you pick a number "at random", you will almost certainly pick a number whose bit string is close to having maximum entropy, that is, it can't be compressed. This is our best understanding of what characterizes a "random" number.

    So, if you want to make a random number out of two random samples which is "twice" as random, you'd concatenate the two bit strings together. Practically, you'd just stuff the samples into the high and low halves of a double length word.

    On a more practical note, if you find yourself saddled with a crappy rand(), it can sometimes help to xor a couple of samples together --- although, if it's truly broken, even that procedure won't help.
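    Both tricks are a couple of lines. A sketch in Python, where `getrandbits` stands in for a hypothetical 16-bit `rand()`:

```python
import random

random.seed(0)

def rand16() -> int:
    """Stand-in for a hypothetical 16-bit rand(); calls are independent."""
    return random.getrandbits(16)

# Concatenation: stuff two 16-bit samples into the high and low halves
# of a 32-bit word, giving (ideally) twice the entropy of one sample.
hi, lo = rand16(), rand16()
wide = (hi << 16) | lo
assert wide >> 16 == hi and wide & 0xFFFF == lo

# XOR-folding two samples can mask some bias in a weak generator,
# though it cannot rescue a truly broken one.
mixed = rand16() ^ rand16()
assert 0 <= mixed < 2 ** 16
```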

  • 2020-11-22 16:22

    Use a linear feedback shift register (LFSR) that implements a primitive polynomial.

    The result will be a sequence of 2^n - 1 pseudo-random numbers (the all-zero state is excluded), none repeating within the sequence, where n is the number of bits in the LFSR --- with every nonzero value represented exactly once per period, i.e. a uniform distribution.

    http://en.wikipedia.org/wiki/Linear_feedback_shift_register http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf

    Use a "random" seed based on the microseconds of your computer clock, or maybe a subset of the md5 result on some continuously changing data in your file system.

    For example, a 32-bit LFSR will generate 2^32 - 1 unique numbers in sequence (no two alike) starting from a given seed. The sequence will always be in the same order; only the starting point differs for different seeds. So, if a possibly repeating sequence between seedings is not a problem, this might be a good choice.

    I've used 128-bit LFSRs to generate random tests in hardware simulators, using a seed which is the md5 result on continuously changing system data.
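    A minimal sketch of the idea in Python, using the 16-bit Galois LFSR from the Wikipedia page above (primitive polynomial x^16 + x^14 + x^13 + x^11 + 1, toggle mask 0xB400):

```python
def lfsr16(seed: int):
    """Galois LFSR for x^16 + x^14 + x^13 + x^11 + 1 (mask 0xB400).
    Steps through every nonzero 16-bit state exactly once per period."""
    state = seed & 0xFFFF
    assert state != 0, "an all-zero seed locks the LFSR at zero"
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state

gen = lfsr16(0xACE1)
seq = [next(gen) for _ in range(65535)]

assert len(set(seq)) == 65535   # all 2^16 - 1 nonzero states, none repeated
assert seq[-1] == 0xACE1        # back at the seed: the period is 2^16 - 1
```

    A 128-bit variant works the same way, just with a wider word and the taps for a degree-128 primitive polynomial (the Xilinx app note linked above tabulates them).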

  • 2020-11-22 16:23

    It might help to think of this in more discrete terms. Suppose you want to generate random numbers between 1 and 36, so you decide the easiest way is to throw two fair, six-sided dice and multiply the results. You get this:

         1    2    3    4    5    6
      -----------------------------
    1|   1    2    3    4    5    6
    2|   2    4    6    8   10   12
    3|   3    6    9   12   15   18
    4|   4    8   12   16   20   24   
    5|   5   10   15   20   25   30
    6|   6   12   18   24   30   36
    

    So we have 36 outcomes, but not all values are fairly represented, and some don't occur at all: 6 and 12 each appear four times, while numbers like 7, 11 and 13 can never be produced. The most frequent products cluster near the center diagonal (bottom-left corner to top-right corner) of the table.

    The same principle that makes this dice distribution unfair applies equally to floating-point numbers between 0.0 and 1.0.
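    The full frequency count is quick to tabulate; a sketch in Python:

```python
from collections import Counter

# How often each product of two fair six-sided dice occurs.
products = Counter(a * b for a in range(1, 7) for b in range(1, 7))

assert sum(products.values()) == 36        # 36 equally likely (a, b) pairs
assert products[6] == products[12] == 4    # the most common products
assert products[1] == products[36] == 1    # the corners occur only once
assert 7 not in products                   # many values in 1..36 never occur
```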

  • 2020-11-22 16:24

    When in doubt about what will happen to the combinations of your random numbers, you can use the lessons you learned in statistical theory.

    In the OP's situation, we want to know the outcome of X*X = X^2, where X is a random variable distributed Uniform[0,1]. We'll use the CDF technique, since the transformation is one-to-one on this domain.

    Since X ~ Uniform[0,1], its pdf is f_X(x) = 1. We want the transformation Y = X^2, so y = x^2. Find the inverse x(y): x = sqrt(y), which gives x as a function of y. Next, find the derivative dx/dy: d/dy sqrt(y) = 1/(2 sqrt(y)).

    The distribution of Y is then: f_Y(y) = f_X(x(y)) |dx/dy| = 1/(2 sqrt(y))

    We're not done yet; we still need the domain of Y. Since 0 <= x < 1, we have 0 <= x^2 < 1, so Y ranges over [0, 1). To check that f_Y is indeed a pdf, integrate it over the domain: the integral of 1/(2 sqrt(y)) from 0 to 1 is indeed 1. Also, notice that the shape of this function looks like what belisarious posted.

    As for things like X1 + X2 + ... + Xn, (where Xi ~ Uniform[0,1]) we can just appeal to the Central Limit Theorem which works for any distribution whose moments exist. This is why the Z-test exists actually.

    Other techniques for determining the resulting pdf include the Jacobian transformation (which is the generalized version of the cdf technique) and MGF technique.

    EDIT: As a clarification, do note that I'm talking about the distribution of the resulting transformation, not its randomness --- that's a separate discussion. Also, what I actually derived was for (rand())^2. For rand() * rand(), i.e. two independent draws, it's more complicated (the density works out to -ln(y) on (0,1)), but in any case it won't be a uniform distribution of any sort.
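    The derivation is easy to sanity-check by simulation (a sketch, with Python's random.random() standing in for rand()): the pdf f_Y(y) = 1/(2 sqrt(y)) integrates to the CDF F_Y(y) = sqrt(y), which should match the empirical CDF of squared samples.

```python
import random

random.seed(42)
n = 200_000
samples = [random.random() ** 2 for _ in range(n)]

# Compare the empirical CDF of X^2 against F_Y(y) = sqrt(y).
for y in (0.04, 0.25, 0.64):
    empirical = sum(1 for v in samples if v <= y) / n
    assert abs(empirical - y ** 0.5) < 0.01
```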

  • 2020-11-22 16:24

    We can compare two arrays of numbers with respect to randomness by using Kolmogorov complexity: if a sequence of numbers cannot be compressed, then it is as random as we can get at that length. Admittedly, this type of measurement is more of a theoretical option...
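    Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a crude upper bound on it. A sketch in Python:

```python
import os
import zlib

# A compressor as a crude stand-in for Kolmogorov complexity:
# the harder a sequence is to compress, the "more random" it is.
patterned = bytes(range(256)) * 64   # highly regular: low complexity
noisy = os.urandom(256 * 64)         # OS entropy: high complexity

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

assert ratio(patterned) < 0.1   # compresses to a tiny fraction
assert ratio(noisy) > 0.9       # barely compresses at all
```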
