I have a table with a few hundred thousand lines. (It’s a precomputed table expressing the relation between lemmas of words and other big tables.) I need to do multiple selections to find a combination of different entries, i.e. I have to use “AS” to do select … from ltc as l0, ltc as l1, ltc as l2 … order by ... The speed of the query depends on the sorting: Without sorting, it’s a few milliseconds, with sorting, it can take a few minutes. This is due, as far as I can tell, to the temporary B-Tree that Sqlite builds for sorting, even though I have an index on the sorted column “nr”. I don’t understand why Sqlite does not use this index.
CREATE TABLE ltc
(nr INTEGER, lemId INTEGER, cId INTEGER, bId INTEGER,
-- UNIQUE (lemId, cId, bId),
-- if I add this uniqueness constraint, strangely enough it doesn’t use my index at all, even at a simple ORDER BY.
PRIMARY KEY(nr,lemId,cId),
FOREIGN KEY(lemId) REFERENCES lems(rowid),
FOREIGN KEY(cId) REFERENCES cs(rowid),
FOREIGN KEY(bId) REFERENCES bs(rowid) )
CREATE INDEX nri ON ltc(nr)
Here is the stripped down version of my select command:
SELECT l0.nr,l1.nr,l2.nr
FROM ltc as l0, ltc as l1, ltc as l2
WHERE
l0.lemId IN (1001) -- in reality 1001 is some simple sub select.
AND l1.lemId IN (1002,1003)
AND l2.lemId IN (1004 )
ORDER BY
l0.nr,
l1.nr,
l2.nr
LIMIT 10;
the EXPLAIN QUERY PLAN gives:
(0, 0, 0, u'SCAN TABLE ltc AS l0')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 1, 1, u'SCAN TABLE ltc AS l1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 2, 2, u'SCAN TABLE ltc AS l2')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 3')
(3, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')
and this with the ORDER BY removed (or reduced to only one column order by
l0.nr
):
(0, 0, 0, u'SCAN TABLE ltc AS l0 USING COVERING INDEX sqlite_autoindex_ltc_1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 1, 1, u'SCAN TABLE ltc AS l1 USING COVERING INDEX sqlite_autoindex_ltc_1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 2, 2, u'SCAN TABLE ltc AS l2 USING COVERING INDEX sqlite_autoindex_ltc_1')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 3')
(3, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
I have tried all sort of single and combined indeces, but it doesn’t seem to make any difference.
The problem seems to be the double ordering itself not the double selection: Even a useless double ORDER BY creates a temp b-tree (even though in this case the result is immediate):
EXPLAIN QUERY PLAN SELECT ltc.nr
FROM ltc
WHERE
ltc.lemId = 716
ORDER BY
ltc.nr,
ltc.nr
LIMIT 10;
(0, 0, 0, u'SCAN TABLE ltc')
(0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')
At SQLite ORDER BY performance issue it is said that queries cannot be ordered by indeces from different tables. Is this the problem here? Is there a way around? Is this a Sqlite specific restriction or do all SQL systems do this?
EDIT:
After adding the index, as suggested by CL, the performance problem remains. As an example take a more complete query with four search terms:
select l0.nr,l1.nr,l2.nr,l3.nr
from ltc as l0, ltc as l1, ltc as l2, ltc as l3
where
l0.lemId in (select rowid from lems where lems.lem = "catch" )
and l1.lemId in (select rowid from lems where lems.lem = "cause" )
and l2.lemId in (select rowid from lems where lems.lem = "score" )
and l3.lemId in (select rowid from lems where lems.lem = "guest" )
order by
l0.nr asc
LIMIT 10;
gives this explanation:
(0, 0, 0, u'SEARCH TABLE ltc AS l0 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 1, 1, u'SEARCH TABLE ltc AS l1 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 2, 2, u'SEARCH TABLE ltc AS l2 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 3')
(3, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 3, 3, u'SEARCH TABLE ltc AS l3 USING INDEX lid (lemId=?)')
(0, 0, 0, u'EXECUTE LIST SUBQUERY 4')
(4, 0, 0, u'SEARCH TABLE lems USING COVERING INDEX lem (lem=?)')
(0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')
(no more complete scans.)
but: time: 388 seconds!!!
when removing the order by
, I get exactly the same explanation minus the last temp B-tree!
time: 0.00025 seconds!!!
This query corresponds to some kind of join. I can also represent the query as an (inner) join (with conditions). This may be the reason that the time seems to grow exponentially with the number of search terms: {1 search term: 0.08 seconds, 2: 0.5, 3: 3, 4: 9, 5: 116, ...} But somehow, I don’t quite understand why the database can’t simply use the index on the nr column to sort. After all, it’s just a lot of results, each containing nr, that have to be ordered.
As suggested by CL, I've put the underlying problem in a new question: Selecting tuples of lines from an Sqlite table and sorting the tuples efficiently
An index can be used to speed up sorting only if the query allows to return the rows in the order in which they are stored in the index.
This is not possible when another column with a different index is used to look rows up, or when you are returning multiple rows because of a cross join.
Try adding an index on lemId
, but this is unlikely to help with the sorting.
The sorting is slow because there are too many result before the LIMIT.
来源:https://stackoverflow.com/questions/25509695/how-to-make-sqlite-use-an-index-for-ordering-on-multiple-columns-in-the-case-of