Subquery's rand() column re-evaluated for every repeated selection in MySQL 5.7/8.0 vs MySQL 5.6

[亡魂溺海] 提交于 2019-12-09 13:46:18

问题


I am doing a subquery in which I have a calculated column involving random number generation. In the base query I select this column twice. MySQL 5.6 works as I expect, the calculated value being called once and fixed. The 5.7+/8.0+ execution seems to re-evaluate the subquery's column value individually for each selection. Is this correct behavior? What can I do to force it work as expected in newer versions of MySQL?

CREATE TABLE t (
  `id` BIGINT(20) NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=InnoDB;

insert into t values();
insert into t values();
insert into t values();
insert into t values();
insert into t values();

SELECT  
        q.i,
        q.r,
        q.r
FROM    (
        SELECT  
                id AS i,
                (FLOOR(RAND(100) * 4)) AS r
        FROM t
        ) q;

MySQL 5.6 yields (values are the same):

+---+-----+-----+
| i |  r  |  r  |
+---+-----+-----+
| 1 |   0 |   0 |
| 2 |   2 |   2 |
| 3 |   3 |   3 |
| 4 |   2 |   2 |
| 5 |   1 |   1 |
+---+-----+-----+

while 5.7 yields (values are different):

+---+-----+-----+
| i |  r  |  r  |
+---+-----+-----+
| 1 |   0 |   2 |
| 2 |   3 |   2 |
| 3 |   1 |   1 |
| 4 |   2 |   1 |
| 5 |   2 |   0 |
+---+-----+-----+

回答1:


As mentioned in The MySQL 8.0.0 Milestone Release is available,

In MySQL 5.6 and earlier, derived tables were always materialized. In 5.7, derived tables are merged into the outer query in most cases, and materialized in some cases.

...

Enabling merging a derived table or view through a optimizer hint (WL#9307) — This work by Guilhem Bichot allows users to control whether a derived table or view will be merged or materialized using the “merge” and “no_merge” hints.

I suppose that this is the cause of the behavior I am observing in newer versions of MySQL. The mentioned hint can be used with MySQL 8.0 to force RAND() be called only once:

SELECT  /* NO_MERGE(q) */
        q.i,
        q.r,
        q.r
FROM    (
        SELECT 
                id AS i,
                (FLOOR(RAND(100) * 4)) AS r
        FROM t
        ) AS q;

+---+-----+-----+
| i |  r  |  r  |
+---+-----+-----+
| 1 |   0 |   0 |
| 2 |   2 |   2 |
| 3 |   3 |   3 |
| 4 |   2 |   2 |
| 5 |   1 |   1 |
+---+-----+-----+

This however is not available in 5.7. To achieve the desired behavior with 5.7, add LIMIT <a very high number> to the derived table definition (I'm using signed LONG_MAX below). Thanks to Roy Lyseng for this workaround.

SELECT
        q.i,
        q.r,
        q.r
FROM    (
        SELECT 
                id AS i,
                (FLOOR(RAND(100) * 4)) AS r
        FROM t LIMIT 9223372036854775807
        ) AS q;

+---+-----+-----+
| i |  r  |  r  |
+---+-----+-----+
| 1 |   0 |   0 |
| 2 |   2 |   2 |
| 3 |   3 |   3 |
| 4 |   2 |   2 |
| 5 |   1 |   1 |
+---+-----+-----+

As philipxy mentioned in the comment, the result of a query expression must be strictly defined regardless of any optimizations being applied. Which means it is an optimizer bug in MySQL 5.7/8.0.



来源:https://stackoverflow.com/questions/44336391/subquerys-rand-column-re-evaluated-for-every-repeated-selection-in-mysql-5-7

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!