For example if java produces the pseudorandom sequence: 9 3 2 5 6 by using 23 as a seed, how can I do the inverse? i.e. getting 23<
Yes, it's absolutely easy to reverse engineer the number stream of a poorly designed pseudo random number generator, such as the Linear Congruential PRNG implementation in the Java programming language (java.util.Random
).
In fact, with as few as TWO values from that particular generator, and the information on the order in which the values emerged, the entire stream can be predicted.
Random random = new Random();
long v1 = random.nextInt();
long v2 = random.nextInt();
for (int i = 0; i < 65536; i++) {
long seed = v1 * 65536 + i;
if (((seed * multiplier + addend) & mask) >>> 16) == v2) {
System.out.println("Seed found: " + seed);
break;
}
}
This is precisely why it's critical to use cryptographically secure random number generators that have been vetted by the community at large for implementations that require security.
There is much more information on reverse engineering PRNGs, including java.util.Random
here. ...
If you're OK with using a String
as your seed, you can use this:
String seed = "9 3 2 5 6";
Then your generator would look like:
String[] numbers = seed.split(" ");
If you truly want to reverse engineer the "random" number generator in java, that's going to be quite difficult (I think).
It would be better to do it the other way around if you can: Start with a seed, produce the sequence, then work out from there.
The point of random number generators is that this is impossible. SecureRandom is designed to be especially cryptographically strong, but generally speaking, if you're writing a random number generator and this is possible or easy, you're doing it wrong.
That said, it's likely that it's not impossible with Java's built in Random class. (SecureRandom is another story, though.) But it will require staggering amounts of math.
To be more specific: if a polynomial-time algorithm existed to do what you want, for some particular pseudorandom number generator, then it would by definition fail the "next-bit test" described in the linked Wikipedia article, since you could predict the next elements that would be generated.
It is certainly possible to recover the seed used by java.util.Random. This post describes the math behind Random's linear congruential formula, and here is a function to discover the current seed from the last two integers returned from nextInt().
public static long getCurrentSeed(int i1, int i2) {
final long multiplier = 0x5DEECE66DL;
final long inv_mult = 0xDFE05BCB1365L;
final long increment = 0xBL;
final long mask = ((1L << 48) - 1);
long suffix = 0L;
long lastSeed;
long currSeed;
int lastInt;
for (long i=0; i < (1<<16); i++) {
suffix = i;
currSeed = ((long)i2 << 16) | suffix;
lastSeed = ((currSeed - increment) * inv_mult) & mask;
lastInt = (int)(lastSeed >>> 16);
if (lastInt == i1) {
/* We've found the current seed, need to roll back 2 seeds */
currSeed = lastSeed;
lastSeed = ((currSeed - increment) * inv_mult) & mask;
return lastSeed ^ multiplier;
}
}
/* Error, current seed not found */
System.err.println("current seed not found");
return 0;
}
This function returns a value that can be used with rand.setSeed() to generate a pseudorandom sequence of numbers starting with i1 and i2.
You want to take arbitrary sequences of numbers, then determine a short (fixed length?) key which will allow you to regenerate that sequence of numbers, without storing the original? Unfortunately, what you want is technically impossible. Here's why:
This is a particular case of compression. You have a long sequence of data, which you want to be able to recreate losslessly from a smaller piece of information. If what you are requesting were possible, then I would be able to compress the whole of stack overflow into a single integer (since the entire website could be serialized into a sequence of numbers, albeit a very long one!)
Unfortunately, mathematics doesn't work that way. Any given sequence has a particular measure of entropy - the average amount of complexity in that sequence. In order to reproduce that sequence losslessly, you must be able to encode at least enough information to represent its entropy.
For certain sequences, there may in fact be a seed that is capable of generating a long, specific sequence, but that is only because there is a hard-coded mathematical function which takes that seed and produces a particular sequence of numbers. However, to take an arbitrary sequence of values and produce such a seed, you would need both a seed, and a function capable of producing that sequence from that seed. In order to encode both of these things, you'd find that you've got a lot more data than you'd expect!