I\'m having a performance problem in SQLite with a SELECT COUNT(*) on a large tables.
As I didn\'t yet receive a usable answer and I did some further testing, I edited m
If you haven't DELETE
d any records, doing:
SELECT MAX(_ROWID_) FROM "table" LIMIT 1;
Will avoid the full-table scan. Note that _ROWID_ is a SQLite identifier.
The output for the fast queries all start with the text "QP: SEARCH". Whilst those for the slow queries start with text "QP: SCAN", which suggests that sqlite is performing a scan of the entire table in order to generate the count.
Googling for "sqlite table scan count" finds the following, which suggests that using a full table scan to retrieve a count is just the way sqlite works, and is therefore probably unavoidable.
As a workaround, and given that status has only eight values, I wondered if you could get a count quickly using a query like the following?
select 1 where status=1 union select 1 where status=2 ...
then count the rows in the result. This is clearly ugly, but it might work if it persuades sqlite to run the query as a search rather than a scan. The idea of returning "1" each time is to avoid the overhead of returning real data.
Do not count the stars, count the records! Or in other language, never issue
SELECT COUNT(*) FROM tablename;
use
SELECT COUNT(ROWID) FROM tablename;
Call EXPLAIN QUERY PLAN for both to see the difference. Make sure you have an index in place containing all columns mentioned in the WHERE clause.
I had the same problem, in my situation VACUUM command helped. After its execution on database COUNT(*) speed increased near 100 times. However, command itself needs some minutes in my database (20 millions records). I solved this problem by running VACUUM when my software exits after main window destruction, so the delay doesn't make problems to user.
Here's a potential workaround to improve the query performance. From the context, it sounds like your query takes about a minute and a half to run.
Assuming you have a date_created column (or can add one), run a query in the background each day at midnight (say at 00:05am) and persist the value somewhere along with the last_updated date it was calculated (I'll come back to that in a bit).
Then, running against your date_created column (with an index), you can avoid a full table scan by doing a query like SELECT COUNT(*) FROM TABLE WHERE date_updated > "[TODAY] 00:00:05".
Add the count value from that query to your persisted value, and you have a reasonably fast count that's generally accurate.
The only catch is that from 12:05am to 12:07am (the duration during which your total count query is running) you have a race condition which you can check the last_updated value of your full table scan count(). If it's > 24 hours old, then your incremental count query needs to pull a full day's count plus time elapsed today. If it's < 24 hours old, then your incremental count query needs to pull a partial day's count (just time elapsed today).
On the matter of the column constraint, SQLite maps columns that are declared to be INTEGER PRIMARY KEY
to the internal row id (which in turn admits a number of internal optimizations). Theoretically, it could do the same for a separately-declared primary key constraint, but it appears not to do so in practice, at least with the version of SQLite in use. (System.Data.SQLite 1.0.74.0 corresponds to core SQLite 3.7.7.1. You might want to try re-checking your figures with 1.0.79.0; you shouldn't need to change your database to do that, just the library.)