I am using a MySQL DB, and have the following table:
CREATE TABLE SomeTable (
PrimaryKeyCol BIGINT(20) NOT NULL,
A BIGINT(20) NOT NULL,
FirstX INT(11) N
I found a solution that relies on properties of the data in the table. I would rather have a more general solution that doesn't depend on the current data, but for the time being that's the best I have.
The problem with the original query:
SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;
is that the execution may require scanning a large percentage of the entries in the (FirstX, LastX, P) index when the first condition, FirstX <= ?, is satisfied by a large percentage of the rows.
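Here is a rough illustration of the problem using hypothetical data: 100000 intervals, each 25 wide, with evenly spaced FirstX values. Almost every row satisfies FirstX <= x, but only a handful also satisfy LastX >= x, and those sit at the very end of the FirstX range. Assuming the range scan reads the index in ascending FirstX order, it has to walk nearly the whole range before it reaches them, and LIMIT 10 cannot cut that short.

```python
# Hypothetical data: (FirstX, LastX, P) tuples with FirstX = 0, 10, 20, ...
# and every interval 25 wide, sorted the same way as the (FirstX, LastX, P) index.
index = [(10 * i, 10 * i + 25, i) for i in range(100_000)]
x = 900_000

# Entries covered by the index range FirstX <= x (what the range scan may visit).
in_range = sum(1 for first, _, _ in index if first <= x)

# Entries that satisfy the whole WHERE clause; they are the last few in the range.
qualifying = [p for first, last, p in index if first <= x and last >= x]

print(in_range, len(qualifying))   # 90001 3
print(qualifying)                  # [89998, 89999, 90000]
```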
To reduce the execution time, I used the observation that LastX-FirstX is relatively small for all rows.
I ran the query:
SELECT MAX(LastX-FirstX) FROM SomeTable;
and got 4200000.
This means that FirstX >= LastX - 4200000 for all the rows in the table.
So any row that satisfies LastX >= ? must also satisfy FirstX >= ? - 4200000 (with the same value of X substituted for both ? placeholders).
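Here is the same argument written out. The symbols x (the query parameter) and W (the measured maximum of LastX - FirstX) are names introduced here for clarity:

```latex
\begin{aligned}
W &= \max_{\text{rows}} \left(\mathit{LastX} - \mathit{FirstX}\right) = 4200000 \\
\mathit{LastX} - \mathit{FirstX} \le W
  &\implies \mathit{FirstX} \ge \mathit{LastX} - W \\
\mathit{LastX} \ge x
  &\implies \mathit{FirstX} \ge \mathit{LastX} - W \ge x - W
\end{aligned}
```

For every row currently in the table, LastX >= x already implies FirstX >= x - W. The extra condition therefore cannot exclude any row that the original query would return.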
So we can add a condition to the query as follows:
SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND FirstX >= ? - 4200000 AND LastX >= ? LIMIT 10;
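The following is a minimal sketch for checking that the added condition does not change the result. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table keeps only the columns the queries touch, and the column types and random data are assumptions, since the question's CREATE TABLE is truncated. The sketch selects PrimaryKeyCol instead of P, Y, Z so that rows can be compared by identity. It compares full result sets rather than LIMIT 10 output, because without ORDER BY the database may return any 10 qualifying rows. The bound is passed as a parameter W, read from MAX(LastX - FirstX), instead of being hard-coded as 4200000.

```python
import random
import sqlite3

# Hypothetical schema: only the columns the queries use (types are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SomeTable (PrimaryKeyCol INTEGER PRIMARY KEY, "
    "FirstX INTEGER NOT NULL, LastX INTEGER NOT NULL, P INTEGER, Y INTEGER, Z INTEGER)"
)
conn.execute("CREATE INDEX idx_first_last_p ON SomeTable (FirstX, LastX, P)")

# Hypothetical data: 20000 intervals of width 0..4999 starting anywhere in [0, 1000000).
rng = random.Random(42)
rows = []
for pk in range(20_000):
    first = rng.randrange(0, 1_000_000)
    last = first + rng.randrange(0, 5_000)
    rows.append((pk, first, last, pk % 97, pk % 89, pk % 83))
conn.executemany("INSERT INTO SomeTable VALUES (?, ?, ?, ?, ?, ?)", rows)

# The bound is only valid while it is >= the current MAX(LastX - FirstX).
(W,) = conn.execute("SELECT MAX(LastX - FirstX) FROM SomeTable").fetchone()

ORIGINAL = "SELECT PrimaryKeyCol FROM SomeTable WHERE FirstX <= ? AND LastX >= ?"
BOUNDED = (
    "SELECT PrimaryKeyCol FROM SomeTable "
    "WHERE FirstX <= ? AND FirstX >= ? - ? AND LastX >= ?"
)

def same_rows(x):
    """True if both queries return exactly the same set of rows for value x."""
    a = {r[0] for r in conn.execute(ORIGINAL, (x, x))}
    b = {r[0] for r in conn.execute(BOUNDED, (x, x, W, x))}
    return a == b

print(all(same_rows(x) for x in range(0, 1_000_001, 25_000)))
```

If rows with wider intervals are inserted later, W has to be refreshed. Otherwise the bounded query can silently miss rows, which is exactly the dependence on the current data mentioned at the start.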
In the example I tested in the question, the number of index entries processed was reduced from 2104820 to 18, and the running time was reduced from 0.563 seconds to 0.0003 seconds.
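The drop in index entries processed comes from the lower bound. With it, the B-tree can seek straight to FirstX >= ? - 4200000 instead of starting at the smallest FirstX. The sketch below models this with a sorted Python list standing in for the (FirstX, LastX, P) index and bisect standing in for the index seek. The data (every interval 25 wide) and the helper name scan are hypothetical, and the sketch does not reproduce MySQL's actual counters.

```python
import bisect

def scan(index, x, limit=10, lower=None):
    """Simulate FirstX <= x AND LastX >= x (optionally AND FirstX >= lower) as an
    ascending range scan over `index`, a sorted list of (FirstX, LastX, P) tuples.
    Returns (matching P values, number of index entries examined)."""
    # Without a lower bound the scan starts at the first entry; with one,
    # it seeks directly to the first entry with FirstX >= lower.
    start = 0 if lower is None else bisect.bisect_left(index, (lower,))
    # The range ends after the last entry with FirstX <= x.
    end = bisect.bisect_right(index, (x, float("inf"), float("inf")))
    matches, examined = [], 0
    for first, last, p in index[start:end]:
        examined += 1
        if last >= x:  # LastX is checked entry by entry, not used to narrow the range
            matches.append(p)
            if len(matches) == limit:
                break
    return matches, examined

# Hypothetical data: every interval is 25 wide, so W = 25.
index = [(10 * i, 10 * i + 25, i) for i in range(100_000)]
W = max(last - first for first, last, _ in index)

full = scan(index, 900_000)
bounded = scan(index, 900_000, lower=900_000 - W)
print(full[1], bounded[1])     # 90001 3
print(full[0] == bounded[0])   # True
```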
I tested the new query with the same 120000 values of X. The output was identical to that of the old query. The time went down from over 10 hours to 5.5 minutes, which is over 100 times faster.