Why so many collisions with the MariaDB 10.2 RAND() function?

人盡茶涼 提交于 2021-02-05 09:41:38

问题


Note: Running MariaDB 10.2.27 on Windows Server 2012 R2 Standard.

I wanted to generate random integers for use in MariaDB so I have been experimenting with the MariaDB RAND() function. Either my expectations & understanding are way off base (definitely possible!) or the MariaDB RAND() function is not very random.

Using a BIGINT(20) column I wanted to generate random integers up to 16 digits in length, so I used this SQL: FLOOR(RAND()*9999999999999999)+1). The exact SQL I use, in a loop is:

INSERT INTO rnd_test VALUES (FLOOR(RAND()*9999999999999999)+1);

Table rnd_test has a single column which is a BIGINT(20) and is the primary ID.

With a pool of 10^16 numbers and considering the Birthday Paradox I would expect about a 50% chance of a collision after 10^8 numbers generated. Obviously there is some variance to this but every time I run the insert loop I start seeing collisions almost immediately and then repeating every 2000 or 3000 generated numbers, sometimes more often. After ~50,000 randoms have been generated I'm seeing collisions every few hundred numbers.

Thinking that maybe my understanding was wildly incorrect I adjusted my looping vb.net code to generate the random locally and then insert that into the MariaDB table. I define a new System.Random at the top of the routine and then use this to generate random numbers:

Dim r As Long = CLng(Math.Floor(rNum.NextDouble() * 9999999999999999)) + 1

Generally this works much better but still not quite as well as I would expect. It will usually run for about 100,000 iterations before a collision occurs, and then after that there seems to be one or two collisions per 10,000 randoms generated. Sometimes a batch of 10,000 will go by without any collisions at all.

So, why is the MariaDB RAND() function performing so poorly compared to the vb.net function?


回答1:


Further experiments with 5.6's RAND indicate that it is good to only 30 bits. That is it has only about one billion distinct values.

File a bug with bugs.mysql.com and/or MariaDB.

30 bits is good enough for most applications. Of those applications for which it is not good enough, most people won't notice its wimpiness.

FLOAT has 24 bits of precision; DOUBLE has 53. So 30 needs DOUBLE, but fails to fill it up.



来源:https://stackoverflow.com/questions/58420671/why-so-many-collisions-with-the-mariadb-10-2-rand-function

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!