问题
Note: Running MariaDB 10.2.27 on Windows Server 2012 R2 Standard.
I wanted to generate random integers for use in MariaDB so I have been experimenting with the MariaDB RAND() function. Either my expectations & understanding are way off base (definitely possible!) or the MariaDB RAND() function is not very random.
Using a BIGINT(20) column I wanted to generate random integers up to 16 digits in length, so I used this SQL: FLOOR(RAND()*9999999999999999)+1)
. The exact SQL I use, in a loop is:
INSERT INTO rnd_test VALUES (FLOOR(RAND()*9999999999999999)+1);
Table rnd_test has a single column which is a BIGINT(20) and is the primary ID.
With a pool of 10^16 numbers and considering the Birthday Paradox I would expect about a 50% chance of a collision after 10^8 numbers generated. Obviously there is some variance to this but every time I run the insert loop I start seeing collisions almost immediately and then repeating every 2000 or 3000 generated numbers, sometimes more often. After ~50,000 randoms have been generated I'm seeing collisions every few hundred numbers.
Thinking that maybe my understanding was wildly incorrect I adjusted my looping vb.net code to generate the random locally and then insert that into the MariaDB table. I define a new System.Random at the top of the routine and then use this to generate random numbers:
Dim r As Long = CLng(Math.Floor(rNum.NextDouble() * 9999999999999999)) + 1
Generally this works much better but still not quite as well as I would expect. It will usually run for about 100,000 iterations before a collision occurs, and then after that there seems to be one or two collisions per 10,000 randoms generated. Sometimes a batch of 10,000 will go by without any collisions at all.
So, why is the MariaDB RAND() function performing so poorly compared to the vb.net function?
回答1:
Further experiments with 5.6's RAND indicate that it is good to only 30 bits. That is it has only about one billion distinct values.
File a bug with bugs.mysql.com and/or MariaDB.
30 bits is good enough for most applications. Of those applications for which it is not good enough, most people won't notice its wimpiness.
FLOAT
has 24 bits of precision; DOUBLE
has 53. So 30 needs DOUBLE
, but fails to fill it up.
来源:https://stackoverflow.com/questions/58420671/why-so-many-collisions-with-the-mariadb-10-2-rand-function