This question is twofold. I am translating an R script into C++ that uses the L'Ecuyer combined multiple recursive generator (CMRG) as its engine (in particular, MRG32k3a), w…
The first question is thus: where can I find documentation on R's MRG32k3a implementation that specifies these parameters?
I would use the source: https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/main/RNG.c#L143
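For orientation, the L'Ecuyer-CMRG branch of unif_rand() in that file implements the standard MRG32k3a recursion (moduli m1 = 4294967087 and m2 = 4294944443, multipliers 1403580, 810728, 527612 and 1370589) and scales the combined result by 1/(m1 + 1). The sketch below is my own C++ transcription of that logic, not code taken from R, so check each constant against the linked source before relying on it. It assumes the six state values come from .Random.seed[2:7] after RNGkind("L'Ecuyer-CMRG"); the class name RMrg32k3a is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical transcription of MRG32k3a as used by R's "L'Ecuyer-CMRG" kind.
// Constants are the usual MRG32k3a ones; verify them against R's RNG.c.
class RMrg32k3a {
public:
    static constexpr std::int64_t m1 = 4294967087;
    static constexpr std::int64_t m2 = 4294944443;

    // seed: the six values of .Random.seed[2:7]. R stores them as signed
    // 32-bit ints, so reinterpret them as unsigned, as RNG.c does with casts.
    explicit RMrg32k3a(const std::array<std::int32_t, 6>& seed) {
        for (int i = 0; i < 6; ++i)
            s_[i] = static_cast<std::uint32_t>(seed[i]);
    }

    // One draw in (0, 1), mirroring the L'Ecuyer-CMRG case of unif_rand().
    double unif_rand() {
        constexpr double normc = 2.328306549295727688e-10;  // 1 / (m1 + 1)
        constexpr std::int64_t a12 = 1403580, a13n = 810728;
        constexpr std::int64_t a21 = 527612, a23n = 1370589;

        // First component: x_n = (a12 * x_{n-2} - a13n * x_{n-3}) mod m1
        std::int64_t p1 = (a12 * s_[1] - a13n * s_[0]) % m1;
        if (p1 < 0) p1 += m1;
        s_[0] = s_[1]; s_[1] = s_[2]; s_[2] = static_cast<std::uint32_t>(p1);

        // Second component: y_n = (a21 * y_{n-1} - a23n * y_{n-3}) mod m2
        std::int64_t p2 = (a21 * s_[5] - a23n * s_[3]) % m2;
        if (p2 < 0) p2 += m2;
        s_[3] = s_[4]; s_[4] = s_[5]; s_[5] = static_cast<std::uint32_t>(p2);

        // Combine the components; the scaled result lies strictly inside (0, 1).
        return static_cast<double>(p1 > p2 ? p1 - p2 : p1 - p2 + m1) * normc;
    }

    const std::array<std::uint32_t, 6>& state() const { return s_; }

private:
    std::array<std::uint32_t, 6> s_{};
};
```

With all six state values set to 12345, the first draw works out by hand to 545508589 / 4294967088 ≈ 0.1270111. Before trusting the port, compare it against R itself, e.g. by setting .Random.seed[2:7] to 12345L after RNGkind("L'Ecuyer-CMRG") and calling runif(1).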
The problem is that I have no idea how this is done, and I can't find any information on it anywhere (aside from knowing that engines are classes).
The requirements for a RandomNumberEngine can be found here:
https://en.cppreference.com/w/cpp/named_req/RandomNumberEngine
Although, if you only want to use uniform_real_distribution, it is sufficient to fulfill UniformRandomBitGenerator:
| Expression | Return type | Requirements |
|---|---|---|
| G::result_type | T | T is an unsigned integer type. |
| G::min() | T | Returns the smallest value that G's operator() may return. The value is strictly less than G::max(). |
| G::max() | T | Returns the largest value that G's operator() may return. The value is strictly greater than G::min(). |
| g() | T | Returns a value in the closed interval [G::min(), G::max()]. Has amortized constant complexity. |
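A class meeting those requirements can be sketched as follows. This is a hypothetical Mrg32k3aEngine, not an existing library type: operator() returns MRG32k3a's combined integer z, which lies in [1, m1], so min() and max() are 1 and m1 = 4294967087. R's unif_rand() corresponds to z / (m1 + 1); std::uniform_real_distribution performs its own, different scaling, so feeding this engine to it will not reproduce R's numbers. The seed is assumed to be six valid MRG32k3a state values (first three below m1, last three below m2, neither triple all zero).

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical UniformRandomBitGenerator built on the MRG32k3a recursion.
class Mrg32k3aEngine {
public:
    using result_type = std::uint32_t;  // unsigned integer type, as required

    // Both must be usable as constant expressions.
    static constexpr result_type min() { return 1u; }
    static constexpr result_type max() { return 4294967087u; }  // m1

    // State: three values for the first component, then three for the second.
    explicit Mrg32k3aEngine(const std::array<std::uint32_t, 6>& seed) : s_(seed) {}

    // Returns the combined integer z in [min(), max()] in constant time.
    result_type operator()() {
        constexpr std::int64_t m1 = 4294967087, m2 = 4294944443;
        constexpr std::int64_t a12 = 1403580, a13n = 810728;
        constexpr std::int64_t a21 = 527612, a23n = 1370589;

        std::int64_t p1 = (a12 * s_[1] - a13n * s_[0]) % m1;
        if (p1 < 0) p1 += m1;
        s_[0] = s_[1]; s_[1] = s_[2]; s_[2] = static_cast<std::uint32_t>(p1);

        std::int64_t p2 = (a21 * s_[5] - a23n * s_[3]) % m2;
        if (p2 < 0) p2 += m2;
        s_[3] = s_[4]; s_[4] = s_[5]; s_[5] = static_cast<std::uint32_t>(p2);

        return static_cast<result_type>(p1 > p2 ? p1 - p2 : p1 - p2 + m1);
    }

private:
    std::array<std::uint32_t, 6> s_;
};

#if __cplusplus >= 202002L
// With C++20, the requirements table can be checked at compile time.
static_assert(std::uniform_random_bit_generator<Mrg32k3aEngine>);
#endif
```

Usage is the same as for any standard engine, e.g. std::uniform_real_distribution<double> dist(0.0, 1.0); dist(engine);. Exposing the raw integer keeps the class honest about what MRG32k3a produces, and leaves the choice of scaling to the caller.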
The main problem is that R's MRG32k3a returns a floating-point number in (0,1), while a C++ UniformRandomBitGenerator must return an unsigned integer type. Why do you want to integrate with the <random> header?
Additional difficulties you would have to take into account:
Alternatives would include using the R source code directly, without integrating with the <random> header, or linking to libR.
I have found that PRNGs with the same seeds in different languages do not necessarily produce the same results (since they may have parameters that the compiler is free to specify), as seen in the SO posts here and here. That is to say, using the same seed, the same engine, and the same distribution may yield different random numbers depending on the particular implementation of the PRNG.
The first answer merely explains that there is no random number sequence that corresponds universally to a given PRNG seed; the sequence may be documented and implemented differently in different APIs (not just in the compiler and not just at the language level). The second answer is specific to rand and srand in the C language, and that behavior arises because rand and srand use an unspecified algorithm.
Although neither answer touches on random number distributions, they too are important if reproducible "randomness" is desired. In that sense, although C++ guarantees the behavior of the engines it provides, it makes the behavior of its distributions (including uniform_real_distribution) implementation-specific.
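The difference between engines and distributions can be seen in a small sketch. The standard's [rand.predef] section requires the 10000th call of a default-constructed std::mt19937 to return 4123659995 on every conforming implementation, whereas no such value is pinned down for std::uniform_real_distribution. One way to keep results portable is to convert engine output to a double with a formula you write and document yourself; the (x + 0.5) / 2^32 mapping below is purely an illustrative choice and is not what R or any standard library does.

```cpp
#include <cstdint>
#include <random>

// Engine output is fully specified by the C++ standard, so this value is
// the same with every conforming standard library.
std::uint32_t mt19937_10000th() {
    std::mt19937 eng;  // default seed (5489)
    std::uint32_t v = 0;
    for (int i = 0; i < 10000; ++i) v = static_cast<std::uint32_t>(eng());
    return v;  // required to be 4123659995
}

// Illustrative, hand-written mapping of a 32-bit value into the open interval
// (0, 1). Because the formula is ours, it does not depend on how a given
// standard library implements uniform_real_distribution.
double to_open_unit(std::uint32_t x) {
    return (static_cast<double>(x) + 0.5) / 4294967296.0;  // 2^32
}
```

Calling to_open_unit(static_cast<std::uint32_t>(eng())) then yields the same doubles wherever the engine is the same, which is exactly the guarantee uniform_real_distribution does not give.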
In general, problems involving seeding PRNGs for repeatable "randomness" could have been avoided if RNG APIs used a stable (unchanging) and documented algorithm not only for the seeded PRNG, but also for any random number methods that use that PRNG (which, in the case of R, include runif and rnorm). The latter matters because the reproducibility of "random" sequences depends on how those methods (not just the PRNG itself) are documented.
Depending on whether you wrote the R code in question, an option may be to write the C++ and R code to use a custom PRNG (as you seem to have done yourself in part) and to use custom implementations of each random number method the original R code uses (such as runif and rnorm). This option may be viable especially since statistical tests are generally insensitive to the details of the specific PRNG in use.
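As an example of such a custom method, runif is simple enough to port by hand. The sketch below follows my reading of R's nmath/runif.c (return NaN for invalid bounds, return a when a == b, otherwise redraw until the uniform lies strictly inside (0, 1) and compute a + (b - a) * u); treat that reading as an assumption to verify. The function name runif_like is hypothetical, and the generator is passed in as any callable returning doubles in (0, 1), such as a port of R's unif_rand().

```cpp
#include <cmath>
#include <functional>
#include <limits>

// Hypothetical port of R's runif(a, b) for a single draw.
// unif_rand: any callable returning doubles, ideally in (0, 1).
double runif_like(const std::function<double()>& unif_rand, double a, double b) {
    // R warns and returns NaN for non-finite bounds or b < a.
    if (!std::isfinite(a) || !std::isfinite(b) || b < a)
        return std::numeric_limits<double>::quiet_NaN();
    if (a == b) return a;

    double u;
    do {
        u = unif_rand();
    } while (u <= 0.0 || u >= 1.0);  // reject endpoints, guarding against odd generators
    return a + (b - a) * u;
}
```

Because the scaling is written out explicitly, the same formula can be kept in both the R and C++ versions of the code; rnorm would need the same treatment, and is considerably more involved.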
Depending on how the R script is written, another option may be to pregenerate the random numbers needed by the code.
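A minimal sketch of that approach, assuming the draws are written from R with something like writeBin(runif(n), "draws.bin"), which stores each double as 8 raw bytes in the machine's native byte order. The reader below assumes the R and C++ programs run on platforms with the same endianness and IEEE 754 doubles; the class name PregeneratedDraws and the file name are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical replayer for doubles pregenerated in R and saved with writeBin().
class PregeneratedDraws {
public:
    explicit PregeneratedDraws(const std::string& path) {
        std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end to get size
        if (!in) throw std::runtime_error("cannot open " + path);
        const std::streamoff bytes = in.tellg();
        if (bytes < 0 || bytes % static_cast<std::streamoff>(sizeof(double)) != 0)
            throw std::runtime_error("file size is not a multiple of sizeof(double)");
        values_.resize(static_cast<std::size_t>(bytes) / sizeof(double));
        in.seekg(0);
        in.read(reinterpret_cast<char*>(values_.data()), static_cast<std::streamsize>(bytes));
        if (!in) throw std::runtime_error("short read from " + path);
    }

    // Returns the next pregenerated value; throws once the supply runs out.
    double next() {
        if (pos_ == values_.size()) throw std::out_of_range("ran out of pregenerated draws");
        return values_[pos_++];
    }

    std::size_t size() const { return values_.size(); }

private:
    std::vector<double> values_;
    std::size_t pos_ = 0;
};
```

Throwing when the draws run out makes it obvious if the C++ translation consumes more random numbers than the R script did, which is itself a useful consistency check.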