Or should I use a different hammer to fix this problem?
I've got a very simple use case for storing data, effectively a sparse matrix, which I've attempted to store …
Consider using a separate table, with no index, for the new inserts of the current day. Then, at the end of each day, run a script that copies that day's rows into the indexed master table and empties the daily table.
If you can accept O(log n) lookups on historical data (through the index) and O(n) lookups on today's data (a scan of the small, unindexed daily table), this should provide a nice compromise.
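A minimal sketch of the daily-table idea, using Python's built-in `sqlite3` module. The table and column names (`readings`, `readings_today`, `sensor_id`, `ts`, `value`) and the function names are hypothetical stand-ins for whatever your schema looks like. Lookups query both tables with `UNION ALL`, so today's rows are visible before the end-of-day merge.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create an indexed master table and an unindexed table for today's rows.

    All table/column names here are hypothetical placeholders.
    """
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS readings (
            sensor_id INTEGER NOT NULL,
            ts        INTEGER NOT NULL,
            value     REAL
        );
        CREATE INDEX IF NOT EXISTS idx_readings_sensor_ts
            ON readings (sensor_id, ts);
        -- Same columns, deliberately without any index, so inserts stay cheap.
        CREATE TABLE IF NOT EXISTS readings_today (
            sensor_id INTEGER NOT NULL,
            ts        INTEGER NOT NULL,
            value     REAL
        );
    """)
    return conn

def capture(conn, rows):
    """Append new rows to the unindexed daily table in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO readings_today (sensor_id, ts, value) VALUES (?, ?, ?)",
            rows,
        )

def end_of_day(conn):
    """Move the day's rows into the indexed table, then empty the daily table.

    Both statements run in one transaction, so a failure cannot lose rows.
    """
    with conn:
        conn.execute(
            "INSERT INTO readings SELECT sensor_id, ts, value FROM readings_today"
        )
        conn.execute("DELETE FROM readings_today")

def lookup(conn, sensor_id):
    """Index search on history (O(log n)) plus a scan of today's small table."""
    return conn.execute("""
        SELECT ts, value FROM readings WHERE sensor_id = ?
        UNION ALL
        SELECT ts, value FROM readings_today WHERE sensor_id = ?
        ORDER BY ts
    """, (sensor_id, sensor_id)).fetchall()
```

For example, after `capture(conn, [(1, 100, 0.5)])`, `end_of_day(conn)` and `capture(conn, [(1, 200, 0.7)])`, a `lookup(conn, 1)` returns both the merged historical row and the not-yet-merged row from today.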
Since we know that capturing your data is fast when there is no index on the table, what might actually work is this:
1. Capture the 800 values in a temporary table with no index.
2. Copy the records to the master table (which has the indexes) using the form of INSERT INTO that takes a SELECT statement.
3. Delete the records from the temporary table.
This technique is based on the theory that a single INSERT INTO that takes a SELECT statement is faster than executing many individual INSERTs.
Step 2 can be executed in the background by using the Asynchronous Module, if it still proves to be a bit slow. This takes advantage of the bits of downtime between captures.
I can't tell from your specs, but suppose the ID field is always increasing, the time field includes YYYYMMDD for uniqueness and is also always increasing, and you only ever search by ID or by time. Then the simplest non-database solution would be to append all records to a fixed-field text or binary file (they are generated in "sorted" order anyway) and use code to binary-search for the desired records: e.g., find the first record with the ID or time of interest, then step sequentially through the desired range.
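A rough sketch of that flat-file approach in Python, assuming a hypothetical binary record layout of little-endian int64 ID, int64 timestamp (for example YYYYMMDDhhmmss stored as an integer) and float64 value. Because every record has the same size, record `i` starts at byte `i * RECORD.size`, which is what makes the binary search over the file possible without any index.

```python
import os
import struct

# Hypothetical fixed-width layout: int64 id, int64 timestamp, float64 value.
RECORD = struct.Struct("<qqd")

def append_records(path, records):
    """Append records; assumes they already arrive sorted by id and by time."""
    with open(path, "ab") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))

def _read(f, index):
    """Read the record at position `index` by seeking to its byte offset."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

def lower_bound(f, count, key, field):
    """Binary search: index of the first record whose `field` is >= key."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if _read(f, mid)[field] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo

def scan_range(path, start, stop, field=1):
    """Yield records with start <= record[field] < stop.

    field=0 searches by id, field=1 (the default) by timestamp.
    """
    count = os.path.getsize(path) // RECORD.size
    with open(path, "rb") as f:
        i = lower_bound(f, count, start, field)
        # From the first match, step sequentially until the key passes `stop`.
        while i < count:
            rec = _read(f, i)
            if rec[field] >= stop:
                break
            yield rec
            i += 1
```

The search costs O(log n) seeks to find the start of the range, then reads only the records actually returned.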
I've looked at your code, and I think you might be overdoing it with the prepare and finalize statements. I am by no means an SQLite expert, but there's got to be significant overhead in preparing a statement each and every time through the loop.
Quoting from the SQLite website:
"After a prepared statement has been evaluated by one or more calls to sqlite3_step(), it can be reset in order to be evaluated again by a call to sqlite3_reset(). Using sqlite3_reset() on an existing prepared statement rather than creating a new prepared statement avoids unnecessary calls to sqlite3_prepare(). In many SQL statements, the time needed to run sqlite3_prepare() equals or exceeds the time needed by sqlite3_step(). So avoiding calls to sqlite3_prepare() can result in a significant performance improvement."
http://www.sqlite.org/cintro.html
In your case, rather than preparing and finalizing a new statement each time through the loop, you could prepare it once and then, on each iteration, bind new values to the existing statement, step it, and reset it.
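For comparison, a small Python sketch of the same idea. In the C API the pattern is `sqlite3_prepare_v2()` once, then `sqlite3_bind_*()` / `sqlite3_step()` / `sqlite3_reset()` per row, and `sqlite3_finalize()` at the end. Python's `sqlite3` module does roughly that for you in `executemany()`, which compiles the SQL once and rebinds it for each parameter tuple; it also keeps a per-connection cache of compiled statements (see the `cached_statements` argument of `sqlite3.connect()`). The table and function names are hypothetical.

```python
import sqlite3

def insert_batch(conn, rows):
    """Insert many rows through a single reused prepared statement.

    executemany() compiles the INSERT once and, for each tuple, binds the
    new values, steps and resets the statement, instead of preparing and
    finalizing a fresh statement per row. The surrounding `with conn:` block
    also puts the whole batch in one transaction rather than one per row.
    """
    with conn:
        conn.executemany(
            "INSERT INTO readings (sensor_id, ts, value) VALUES (?, ?, ?)",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, ts INTEGER, value REAL)")
# A generator works too; rows are bound one at a time as they are produced.
insert_batch(conn, ((i, 1000, i * 0.5) for i in range(800)))
```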
All this said, I think the indexes might be the actual culprit, since the time keeps increasing as you add more data. I am curious enough about this that I plan to do some testing over the weekend.
Answering my own question just as a place to put some details:
It turns out (as correctly suggested above) that the indexes are the slow part: every time I run another transaction of inserts, the index has to be updated, which takes some time. My solution is to (A) create the data table, (B) insert all my historical data (several years' worth), and (C) only then create the indexes.
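A minimal Python sketch of that load order, with hypothetical table, column and index names. The point is only the sequence: no index exists while the bulk load runs, so SQLite builds the index once over the complete data set instead of updating it row by row.

```python
import sqlite3

def build_database(conn, historical_rows):
    """(A) create the table, (B) bulk-load history, (C) build the index last."""
    # (A) Table only; with no index yet, the load below does no index upkeep.
    conn.execute(
        "CREATE TABLE readings ("
        " sensor_id INTEGER NOT NULL, ts INTEGER NOT NULL, value REAL)"
    )
    with conn:
        # (B) Load all historical rows in a single transaction.
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", historical_rows)
    with conn:
        # (C) Build the index once, over the complete data set.
        conn.execute(
            "CREATE INDEX idx_readings_sensor_ts ON readings (sensor_id, ts)"
        )
```

Afterwards, `EXPLAIN QUERY PLAN SELECT ts, value FROM readings WHERE sensor_id = ?` should report a search using `idx_readings_sensor_ts`, which is a quick way to confirm that lookups actually hit the index. Later incremental inserts still maintain the index row by row.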
Now all lookups etc. are really fast and SQLite does a great job. Subsequent daily updates now take a few seconds to insert only 800 records, but that is no problem since they only run every 10 minutes or so.
Thanks to Robert Harvey and maxwellb for the help/suggestions/answers above.
The theoretical maximum number of rows in a table is 2^64 (18446744073709551616, or about 1.8e+19). This limit is unreachable, since the maximum database size of 140 terabytes will be reached first. A 140-terabyte database can hold no more than approximately 1e+13 rows, and then only if there are no indices and each row contains very little data.
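A back-of-the-envelope check of what "very little data" means, assuming 1 TB is taken as 10^12 bytes (with binary terabytes the result shifts only slightly) and ignoring page and file overhead:

```latex
\frac{140\ \text{TB}}{10^{13}\ \text{rows}}
  \approx \frac{1.4 \times 10^{14}\ \text{bytes}}{10^{13}\ \text{rows}}
  = 14\ \text{bytes per row}
```

So reaching roughly 1e+13 rows would leave only about 14 bytes of storage per row, which is why any index or non-trivial payload pushes the practical row limit far lower.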