Question
I have a query that matches pairs of rows whose time difference is less than 2 hours (~0.08333 days):
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
This query is rather slow, i.e. ~ 1 second (the table has ~ 10k rows).
One idea was to use an INDEX. Obviously, CREATE INDEX id1 ON mytable(date) didn't improve anything, which is expected.
Then I noticed that the magical query CREATE INDEX id2 ON mytable(JULIANDAY(date)):
1. didn't help when using:
... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
2. didn't help when using:
... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
3. ... but massively improved the performance (query time happily divided by 50!) when using:
... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333 AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
Of course 1., 2. and 3. are equivalent since mathematically,
|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
<=> x < y + 0.08333 AND x > y - 0.08333
Question: Why are solutions 1. and 2. not making use of INDEX whereas solution 3. is using it?
Note:
I'm using Python + SQLite, via the sqlite3 module.
The fact that solutions 1. and 2. are not using the index is confirmed by running EXPLAIN QUERY PLAN SELECT ...:
(0, 0, 0, u'SCAN TABLE mytable AS mt1')
(0, 1, 1, u'SCAN TABLE mytable AS mt2')
The fact that solution 3. is using the index is shown by running EXPLAIN QUERY PLAN SELECT ...:
(0, 0, 1, u'SCAN TABLE mytable AS mt2')
(0, 1, 0, u'SEARCH TABLE mytable AS mt1 USING INDEX id2 (<expr>>? AND <expr><?)')
Answer 1:
I believe the inclusion of AND is the reason, as per the following passage from the SQLite documentation:
The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separate by the OR operator then the entire clause is considered to be a single "term" to which the OR-clause optimization is applied.
The SQLite Query Optimizer Overview
It may be worthwhile running ANALYZE to see if that improves matters.
As per the comment:
I think the previously added paragraph can clarify why ABS(x-y) < k is not using index, and why x < y + k is using it, don't you think so? Would you want to include this paragraph? [All terms of the WHERE clause are analyzed to see if they can be satisfied using indices. To be usable by an index a term must be of one of the following forms: column = expression, column IS expression, column > expression ...
The following has been added.
To be usable by an index a term must be of one of the following forms:
column = expression
column IS expression
column > expression
column >= expression
column < expression
column <= expression
expression = column
expression > column
expression >= column
expression < column
expression <= column
column IN (expression-list)
column IN (subquery)
column IS NULL
I'm not sure if it would work with a BETWEEN (e.g. WHERE column BETWEEN expr1 AND expr2).
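The BETWEEN question can be checked directly. The SQLite optimizer overview describes BETWEEN as being rewritten into two "virtual" inequality terms, so it would be expected to behave like the AND form. A minimal sketch, assuming a reasonably recent SQLite build and a hypothetical two-column version of mytable; note that BETWEEN is inclusive, so it is not strictly identical to the `<` / `>` version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the date column matters for this check.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")

query = """
EXPLAIN QUERY PLAN
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE JULIANDAY(mt1.date)
      BETWEEN JULIANDAY(mt2.date) - 0.08333 AND JULIANDAY(mt2.date) + 0.08333
"""
# The human-readable plan step is the last column of each row.
between_plan = [row[-1] for row in conn.execute(query)]
print(between_plan)
```

If one of the printed steps mentions USING INDEX id2, the BETWEEN form is being served by the expression index.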
Answer 2:
You are using an expression index. The documentation says:
The SQLite query planner will consider using an index on an expression when the expression that is indexed appears in the WHERE clause or in the ORDER BY clause of a query, exactly as it is written in the CREATE INDEX statement. The query planner does not do algebra.
So it is not possible to use an index to speed up lookups of a call to abs() if the indexed expression is only a parameter. (And it is not possible to index the entire abs() call because it involves two tables.)
So converting the expression as you did is the only way to make it more efficient.
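The difference between the three forms can be reproduced with a short script. This is a sketch under assumptions: the table schema is hypothetical (the question does not show it), the table is left empty because only the plan is inspected, and the exact wording of the plan steps varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the date column matters here.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("CREATE INDEX id2 ON mytable(JULIANDAY(date))")

JOIN = "SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 WHERE "
CONDITIONS = {
    "abs": "ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333",
    "chained": "JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) "
               "< JULIANDAY(mt2.date) + 0.08333",
    "and": "JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333 "
           "AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333",
}

def plan(condition):
    """Return the plan steps (last column of EXPLAIN QUERY PLAN) for one WHERE clause."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + JOIN + condition).fetchall()
    return [row[-1] for row in rows]

plans = {name: plan(cond) for name, cond in CONDITIONS.items()}
for name, steps in plans.items():
    print(name, steps)
```

Only the "and" form contains terms in which the indexed expression JULIANDAY(mt1.date) stands alone on one side of a comparison, which is what the planner needs in order to match it against id2.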
(Please note that a<b<c compares a and b first, and then compares the resulting boolean value against c. This is not what you want.)
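A quick way to see this, using arbitrarily chosen constants: 5 does not lie between 1 and 3, yet the chained form evaluates to true, because 1 < 5 yields 1 and 1 < 3 is true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First column: chained comparison, evaluated as (1 < 5) < 3.
# Second column: the range check actually intended.
chained, explicit = conn.execute(
    "SELECT 1 < 5 < 3, 1 < 5 AND 5 < 3"
).fetchone()
print(chained, explicit)  # prints: 1 0
```

So solution 2. in the question is not only unindexed, it also returns different rows than solutions 1. and 3.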
Source: https://stackoverflow.com/questions/49887709/sql-index-not-used-on-where-absx-y-k-condition-but-used-on-y-k-x-y