Two SQL statements should return the same results, but they don't (on AWS Aurora DB)

自作多情 提交于 2019-12-13 03:14:52

问题


This is the table definition for GpsPosition:

CREATE TABLE GpsPosition 
(
    altitudeInMeters SMALLINT NOT NULL,
    dateCreated      BIGINT NOT NULL,
    dateRegistered   BIGINT NOT NULL,
    deviceId         BINARY(16) NOT NULL,
    emergencyId      BINARY(16) NULL,
    gpsFix           SMALLINT NOT NULL,
    heading          SMALLINT NOT NULL,
    horizontalUncertaintyInMeters SMALLINT NOT NULL,
    id               BINARY(16) NOT NULL,
    latestForDevice  BOOLEAN NOT NULL,
    latestForUser    BOOLEAN NOT NULL,
    latitude         DOUBLE PRECISION NOT NULL,
    longitude        DOUBLE PRECISION NOT NULL,
    numSatellites    SMALLINT NOT NULL,
    speedInKmph      SMALLINT NOT NULL,
    stale            BOOLEAN NOT NULL,
    userId           BINARY(16) NULL,
    verticalUncertaintyInMeters SMALLINT NOT NULL,

    PRIMARY KEY (id)
);

ALTER TABLE GpsPosition 
    ADD CONSTRAINT GpsPosition_deviceId_fkey 
        FOREIGN KEY (deviceId) REFERENCES Device(id) 
            ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE GpsPosition 
    ADD CONSTRAINT GpsPosition_emergencyId_fkey 
        FOREIGN KEY (emergencyId) REFERENCES Emergency(id) 
            ON UPDATE CASCADE ON DELETE SET NULL;

ALTER TABLE GpsPosition 
    ADD CONSTRAINT GpsPosition_userId_fkey 
        FOREIGN KEY (userId) REFERENCES User(id) 
            ON UPDATE CASCADE ON DELETE SET NULL;

ALTER TABLE GpsPosition 
    ADD CONSTRAINT deviceId_dateCreated_must_be_unique 
        UNIQUE (deviceId, dateCreated);

CREATE INDEX i2915035553 ON GpsPosition (deviceId);
CREATE INDEX deviceId_latestForDevice_is_non_unique ON GpsPosition (deviceId, latestForDevice);
CREATE INDEX i3210815937 ON GpsPosition (emergencyId);
CREATE INDEX i1689669068 ON GpsPosition (userId);
CREATE INDEX userId_latestForUser_is_non_unique ON GpsPosition (userId, latestForUser);

Note that userId in GpsPosition is a UUID that is stored as a binary(16).

This SQL code is executing on AWS AuroraDB engine version 5.7.12.

I would expect the queries below to return the same results, but the first one returns many results and the second returns no results. Any idea as to why?

select *
from GpsPosition
where exists (select *
              from User
              where id = GpsPosition.userId and
                    id = UNHEX( '3f4163aab2ac46d6ad15164222aca89e' )
             );

select *
from GpsPosition
where userId = UNHEX( '3f4163aab2ac46d6ad15164222aca89e' );

Note that the following SQL statement returns a single row, as you would expect:

select *
from User 
where id = UNHEX( '3f4163aab2ac46d6ad15164222aca89e' );

回答1:


I see no semantic equivalence at all.

The one with exists is checking to see if a row exists in another table. If no such matching row exists, then the outer query does not return anything.

That is very different from just returning a matching row in a single table.

The observation that two queries return the same results on a particular set of data does not make them semantically equivalent. They would have to be guaranteed to return the same results on any appropriate data for the query. For instance, 2 + 2 = 2 * 2, but that doesn't make addition and multiplication "semantically equivalent."

I should also add that it is not hard to fool database optimizers, even when two expressions are guaranteed to be equivalent.




回答2:


So my team has spent literally a couple of months trying to understand this issue and many other inconsistencies (like this one in this posting) we were able to reproduce on AWS Aurora DB 5.7 but unable to reproduce on MySQL 5.7 or anything else for that matter.

As a part of this effort, we engaged AWS support, which was remarkably unhelpful. They confirmed they could reproduce the inconsistencies by executing the same queries we did on the same database we did, but then said they couldn't copy that data to another database and still reproduce the issue, and this seemed to satisfy them to mark the support case as resolved. Now granted, this is a very insidious defect since it is so difficult to reproduce and so intermittent and rare, but when it is hit, it becomes reliably reproducible within the affected data set. And once you do hit this defect, well, your applications depending on the database can no longer operate correctly in those affected areas ;)

While we do not believe the defect is limited to cascade deletes, it appears that a way to "more reliably" produce this defect is to delete rows in tables that have cascade deletes. Again, this appears to produce the defect "more reliably", but even then, it is incredibly rare and difficult to produce. We could produce it by running a huge automated test suite in a tight loop however. Again, once you actually do hit this defect, the affected data will reliably reproduce inconsistencies - it's just VERY hard to hit this defect.

So what conclusions did we draw at the end of all of our analysis?

1) First and foremost, Thorsten Kettner (see his posted comment above) is correct - this is a defect in the RDBMS server itself. We don't have access to the AWS AuroraDB source code or underlying infrastructure, and so we cannot root cause this defect to something much more specific, but it is a defect possibly in the RDBMS server, possibly in the data persistence layer, and possibly somewhere else.

2) Based upon (1) above, we decided that AWS Amazon 5.7.x is not mature enough for us to use for a production application. Even though it works correctly 99.9999% of the time, that 0.0001% was causing development and production database servers to do the wrong things and return incorrect results, which is absolutely unacceptable to us. We also detected cases where integrity constraints on the tables were not reliably honored, resulting in very strange orphaned rows that should have been deleted as a part of cascade deletes in the schema definition, which again, is absolutely unacceptable to us.

3) We were unable to reproduce any of these inconsistencies on AWS MySQL 5.6, AWS MySQL 5.7, AWS AuroraDB with MySQL 5.6 compatibility, non-AWS Windows MySQL 5.6, or non-AWS MySQL 5.7. In short, we believe that whatever is going wrong is specific to AWS AuroraDB with MySQL 5.7 compatibility. We did extensive testing on AWS AuroraDB with MySQL 5.6 compatibility in particular and could not reproduce any of these inconsistency defects, so we believe at this time that AuroraDB with MySQL 5.6 compatibility is mature and suitable for production use.



来源:https://stackoverflow.com/questions/55985502/two-sql-statements-should-return-the-same-results-but-they-dont-on-aws-aurora

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!