Why is the same SQLite query being 30 times slower when fetching only twice as many results?

前端未结

关注

 3  2044

I have been working on speeding up a query I\'m using for about a week now and asked several questions about it here ( How can I speed up fetching the results after running an s

相关标签:

3条回答

不知归路

2021-02-01 15:57

The execution time geometrically proportional to the number of rows in each table rather than arithmetically e.g.

3 tables with 10 rows each => 1,000 comparision

3 tables with 10, 10 and 40 rows => 4,000 comparisons

3 tables with 20 rows each => 8,000 comparisons

You could probably re-factor the query to avoid some of the joins/cursors - when do you need an answer?

Could you do something like this:

SELECT precursor_id, feature_table_id 
FROM MSMS_precursor
INNER JOIN 

    (
        SELECT mzMin, mzMax, rtMin, rtMax, spectrum_id, feature_table_id, msrun_msrun_id

        FROM spectrum
        INNER JOIN 

           (select feature_table_id, mzMin, mzMax, rtMin, rtMax, msrun_msrun_id
            from feature
            where feature.msrun_msrun_id = 'value'
           ) subquery 

        ON subquery.msrun_msrun_id = spectrum.msrun_msrun_id
        WHERE 
            spectrum.scan_start_time BETWEEN subquery.rtMin AND subquery.rtMax 
    ) subquery

    ON subquery.spectrum_id = MSMS_precursor.spectrum_spectrum_id 

WHERE 
    MSMS_precursor.ion_mz BETWEEN subquery.mzMin AND subquery.mzMax

Using a subquery enables you to reduce the number of comparisons between the tables - you can quickly filter out the unwanted features, then the un-related spectra before searching for suitable precursors.

I don't use SQLLite - but the principle should still apply.

UPDATED : fixed bug in SQL

Notes:

You don't have to worry about the ANDs, you'll only get:

features where feature.msrun_msrun_id = 'value'
spectra for those features and where spectrum.scan_start_time BETWEEN subquery.rtMin AND subquery.rtMax
precursors for those spectrs and where MSMS_precursor.ion_mz BETWEEN subquery.mzMin AND subquery.mzMax

UPDATE 18/May:

It's the indexing!!! you have indexes on the search fields, but not on the fields participating in the joins - foreign key indices really boost performance:

CREATE INDEX `fk_msrun_msrun_id_feature` ON `feature` (`msrun_msrun_id` ASC); 
CREATE INDEX `fk_spectrum_spectrum_id_feature` ON `feature` (`msrun_msrun_id` ASC); 
CREATE INDEX `fk_spectrum_spectrum_id_MSMS_precursor` ON `MSMS_precursor` (`spectrum_spectrum_id` ASC);

0 讨论(0)

醉梦人生

2021-02-01 16:08
Consider using covering indices on the tables involved in your query.

You are indeed fetching a limited amount of columns in your select statement and the corresponding inner join and where clauses. By using a covering index with the columns well ordered in it, you should get a very fast query, that is you will remove the scan table in favor of an search table using a covering index.

Try to use those indices on your tables:
```
CREATE INDEX `fk_covering_feature` ON `feature` (`msrun_msrun_id`,`mzMin`,`mzMax`,`rtMin`,`rtMax`,`feature_table_id`);
CREATE INDEX `fk_covering_spectrum` ON  `spectrum` (`msrun_msrun_id`,`scan_start_time`,`spectrum_id`);
CREATE INDEX `fk_covering_MSMS_precursor` ON  `MSMS_precursor` (`spectrum_spectrum_id`,`ion_mz`,`precursor_id`);
```
When going for speed, you should also hint the query planner to understand msrun_msrun_id is a constant to check for both feature and spectrum tables. Add the constant test in your query by putting this additional test at the end of the query (and pass spectrumFeature_InputValues twice):
```
"AND spectrum.msrun_msrun_id = ?"
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
心在旅途

2021-02-01 16:09
I suggest you try using an R*Tree index, They're designed for efficient range queries.

I haven't actually used R*Tree much, just read the documentation, but I think you might be using it incorrectly. You may want to try changing your query to use
```
WHERE convexhull_edges.rtMin <= spectrum.scan_start_time AND convexhull_edges.rtMax >= spectrum.scan_start_time AND
convexhull_edges.mzMin <= MSMS_precursor.ion_mz AND convexhull_edges.mzMax >= MSMS_precursor.ion_mz
```
which should be equivalent to your current query, but I think should be faster (You should be picking a range out of the R*Tree, rather than comparing the point to the range)
0 讨论(0)
发布评论:

提交评论
- 加载中...