How to understand autocorrelations caused by seeding a RNG too much?

后端 未结 2 1485
粉色の甜心
粉色の甜心 2020-12-20 20:55

In response to this question I ran the following VBA experiment:

Sub Test()
    Dim i As Long, A As Variant
    Dim count1 As Long, count2 As Long
    ReDim A         


        
相关标签:
2条回答
  • 2020-12-20 21:12

    Partial answer only, fell free to edit and complete.


    Well, there is clearly a correlation when you overuse the Randomize function.

    I tried the following code, with a conditional formatting (black fill for values >0.5), and there is clearly patterns appearing (try to comment the Randomize to see a more "random" pattern. (best seen with 20 pt columns and 10% zoom)

    Function Rndmap()
        Dim i As Long, j As Long
        Dim bmp(1 To 512, 1 To 512) As Long
        For i = 1 To 512
            For j = 1 To 512
                ' Rnd -1 ' uncomment this line to get a big white and black lines pattern.
                Randomize 'comment this line to have a random pattern
                bmp(i, j) = IIf(Rnd() < 0.5, 0, 1)
            Next j
        Next i
        Range(Cells(1, 1), Cells(512, 512)) = bmp
    End Function
    

    So while the MSDN states that "Using Randomize with the same value for number does not repeat the previous sequence.", implying that if the Timer returns twice the same value, the Rnd should keep on the same random sequence without reseting, there is still some behind the scene link..

    Some screenshots:

    Rnd() only: Rnd

    Using Randomize: randomize

    Using Rnd -1 and Randomize: Rnd -1

    0 讨论(0)
  • 2020-12-20 21:18

    The Randomize method initialises the Rnd function with the current system time as it's seed, you can also specify a number with Randomize to be used as the seed.

    I decided to test how long a sequence continues before repeating itself:

    Sub randomRepeatTest()
        For i = 1 To 100000
            Randomize
            randomThread = randomThread & Int(9 * Rnd + 1)
            If i Mod 2 = 0 Then
                If Left(randomThread, i / 2) = Right(randomThread, i / 2) Then
                    Debug.Print i / 2
                    Exit Sub
                End If
            End If
        Next i
    End Sub
    

    This sub generates a random sequence of the digits 0 - 9, and as the sequence becomes an even length it is tested to see if the first half of the sequence matches the second half, and if so it outputs the length the sequence reached before repeating. After running it a number of times, and discounting where a digit is repeated twice at the beginning, the result comes out at 256 (nice).

    Providing any value to Randomize will still return a result of 256.


    We're randomizing Rnd every loop, so what's going on here?

    Well as I said at the beginning, if no value is given to Randomize, it will use the system time as the seed. The resolution of this time is something I can't seem to find sourced, however I believe it to be low.

    I have tested using the value of timer which returns the time of day in seconds to 2 decimal places (e.g. 60287.81). I have also tried GetTickCount which returns the system active time (starts counting at boot) in milliseconds. Both of these still result in the 256 sequence limit.

    So, why when we're randomizing every loop does the sequence repeat? Well the reality is, the code is executed within a millisecond. Essentially, we're providing the same number to randomize every loop, and so we're not actually shuffling the seed.


    So, is Rnd more random without Randomize?

    I ran the above sub again without Randomize; nothing returned. I upped the loop count to 2,000,000; still nothing.

    I've managed to source the algorithm used by the workbook Rand formula, which I believe is the same as Rnd with no initialised seed:

    C IX, IY, IZ SHOULD BE SET TO INTEGER VALUES BETWEEN 1 AND 30000 BEFORE FIRST ENTRY

    IX = MOD(171 * IX, 30269)

    IY = MOD(172 * IY, 30307)

    IZ = MOD(170 * IZ, 30323)

    RANDOM = AMOD(FLOAT(IX) / 30269.0 + FLOAT(IY) / 30307.0 + FLOAT(IZ) / 30323.0, 1.0)

    It is an iterative function which uses the result of the previous call to generate a new number. Referenced as the Wichman-Hill procedure, it guarantees that more than 10^13 numbers will be generated before the sequence repeats itself.


    The problem with Rnd

    For the algorithm to work, it first needs to be initialised with values for IX, IY & IZ. The problem we have here is that we can't initialise the algorithm with random variables, as it is this algorithm we need in order to get random values, so the only option is to provide some static values to get it going.

    I have tested this and it seems to be the case. Opening a fresh instance of Excel, ? Rnd() returns 0.70554. Doing the same again returns the exact same number.

    So the problem we have is Rnd without using Randomize gives us a much longer sequence of random numbers, however that sequence will start at the same place each time we open Excel. Where functions are dependant on random generation, such as password generation, this doesn't suffice as we will get the same repeated results each time we open Excel.


    The solution

    Here's a function I have come up with and it seems to work well:

    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
    Public Declare Function GetTickCount Lib "kernel32" () As Long
    Public randomCount As Long
    Function getRandom()
        If randomCount Mod 255 = 0 Then
            Sleep 1
        End If
        Randomize GetTickCount
        getRandom = Rnd()
        randomCount = randomCount + 1
    End Function
    

    It makes use of the GetTickCount function as the Randomize seed. Each call adds 1 to a randomCount variable, and after every 255 runs the macro is forced to sleep for 1 millisecond (although this actually works out at around 15 on my system) so that the seed of GetTickCount will be changed, and so a new sequence of numbers will be returned by Rnd

    This of course will return the same sequence if by chance it is used at the same system time, however for most cases it will be a sufficient method for generating more random numbers. If not, it would need some fancy work using something like the Random.Org API.

    0 讨论(0)
提交回复
热议问题