Can SQLite handle 90 million records?

故里飘歌 2020-12-04 10:50

Or should I use a different hammer to fix this problem?

I've got a very simple use-case for storing data, effectively a sparse matrix, which I've attempted to store in SQLite.

8 Answers
  • 2020-12-04 11:25

    Consider using a table for new inserts of the given day, without an index. Then, at the end of each day, run a script which will:

    1. Insert new values from new_table into master_table
    2. Clear new_table for the next day of processing

    If you can do lookups on historical data in O(log n), and lookups on today's data in O(n), this should provide a nice compromise.
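
    A minimal, hypothetical sketch of what that end-of-day script could look like with the SQLite C API, using the table names new_table and master_table from the list above (the database file name and matching column layout are assumptions):

        #include <stdio.h>
        #include <sqlite3.h>

        int main(void) {
            sqlite3 *db;
            char *err = NULL;
            if (sqlite3_open("data.db", &db) != SQLITE_OK) return 1;

            /* Move the day's unindexed rows into the indexed master table,
             * then empty new_table for the next day. */
            int rc = sqlite3_exec(db,
                "BEGIN;"
                "INSERT INTO master_table SELECT * FROM new_table;"
                "DELETE FROM new_table;"
                "COMMIT;",
                NULL, NULL, &err);
            if (rc != SQLITE_OK) {
                fprintf(stderr, "rollup failed: %s\n", err);
                sqlite3_free(err);
            }
            sqlite3_close(db);
            return rc == SQLITE_OK ? 0 : 1;
        }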

  • 2020-12-04 11:34

    Since we know that capturing your data is fast when there is no index on the table, what might actually work is this:

    1. Capture the 800 values in a temporary table with no index.

    2. Copy the records to the master table (containing indexes) using the form of INSERT INTO that takes a SELECT statement.

    3. Delete the records from the temporary table.

    This technique is based on the theory that the INSERT INTO that takes a SELECT statement is faster than executing individual INSERTs.

    Step 2 can be executed in the background by using the Asynchronous Module, if it still proves to be a bit slow. This takes advantage of the bits of downtime between captures.
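
    As a rough sketch of steps 1 to 3 (the names capture_temp, master_table, id, t and value are invented here, not the poster's schema), the copy and the clear can even be issued as a single transaction:

        #include <sqlite3.h>

        int flush_captures(sqlite3 *db) {
            /* Step 1's staging table: kept without an index, so the ~800
             * inserts per capture stay cheap. */
            sqlite3_exec(db,
                "CREATE TABLE IF NOT EXISTS capture_temp (id INTEGER, t INTEGER, value REAL);",
                NULL, NULL, NULL);

            /* Steps 2 and 3: one INSERT ... SELECT into the indexed master
             * table, then empty the staging table. */
            return sqlite3_exec(db,
                "BEGIN;"
                "INSERT INTO master_table (id, t, value) SELECT id, t, value FROM capture_temp;"
                "DELETE FROM capture_temp;"
                "COMMIT;",
                NULL, NULL, NULL);
        }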

  • 2020-12-04 11:34

    I can't tell from your specs, but if the ID field is always increasing, the time field includes YYYYMMDD for uniqueness and is also always increasing, and you're doing either ID searches or time searches, then the simplest non-database solution would be to append all records to a fixed-field text or binary file (since they're being generated in "sorted" order). Then use code to do a binary search for the desired records, e.g. find the first record with the ID or time of interest and step sequentially through the desired range.
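
    A toy sketch of that idea, assuming an invented 16-byte fixed-width record (not the poster's actual layout) and a file whose records were appended in increasing id order:

        #include <stdio.h>
        #include <stdint.h>

        typedef struct {
            uint64_t id;     /* always-increasing key (ID, or a YYYYMMDD-based time) */
            double   value;
        } Record;

        /* Return the byte offset of the first record with id >= key, or -1 on error. */
        long find_first(FILE *f, uint64_t key) {
            if (fseek(f, 0, SEEK_END) != 0) return -1;
            long n = ftell(f) / (long)sizeof(Record);
            long lo = 0, hi = n;                     /* binary search over [lo, hi) */
            while (lo < hi) {
                long mid = lo + (hi - lo) / 2;
                Record r;
                fseek(f, mid * (long)sizeof(Record), SEEK_SET);
                if (fread(&r, sizeof r, 1, f) != 1) return -1;
                if (r.id < key) lo = mid + 1; else hi = mid;
            }
            return lo * (long)sizeof(Record);
        }

    From that offset you can read records sequentially until you pass the end of the range you care about.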

  • 2020-12-04 11:35

    I've looked at your code, and I think you might be overdoing it with the prepare and finalize statements. I am by no means an SQLite expert, but there's got to be significant overhead in preparing a statement each and every time through the loop.

    Quoting from the SQLite website:

    After a prepared statement has been evaluated by one or more calls to sqlite3_step(), it can be reset in order to be evaluated again by a call to sqlite3_reset(). Using sqlite3_reset() on an existing prepared statement rather than creating a new prepared statement avoids unnecessary calls to sqlite3_prepare(). In many SQL statements, the time needed to run sqlite3_prepare() equals or exceeds the time needed by sqlite3_step(). So avoiding calls to sqlite3_prepare() can result in a significant performance improvement.

    http://www.sqlite.org/cintro.html

    In your case, rather than preparing a new statement each time, you could try binding new values to your existing statement.
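
    For illustration only (the table and column names below are made up, and the BEGIN/COMMIT around the batch is an extra step the answer doesn't mention, though it saves a disk sync per INSERT), the reuse pattern the quote describes is: prepare once, bind/step/reset inside the loop, finalize once at the end.

        #include <sqlite3.h>

        int insert_batch(sqlite3 *db, const int *ids, const double *vals, int n) {
            sqlite3_stmt *stmt;
            int rc = sqlite3_prepare_v2(db,
                "INSERT INTO master_table (id, value) VALUES (?1, ?2)",
                -1, &stmt, NULL);
            if (rc != SQLITE_OK) return rc;

            sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
            for (int i = 0; i < n; i++) {
                sqlite3_bind_int(stmt, 1, ids[i]);
                sqlite3_bind_double(stmt, 2, vals[i]);
                sqlite3_step(stmt);          /* returns SQLITE_DONE for an INSERT */
                sqlite3_reset(stmt);         /* reuse the same prepared statement */
                sqlite3_clear_bindings(stmt);
            }
            sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);

            sqlite3_finalize(stmt);          /* finalize once, after the loop */
            return SQLITE_OK;
        }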

    All this said, I think the indexes might be the actual culprit, since the time keeps increasing as you add more data. I am curious enough about this where I plan to do some testing over the weekend.

  • 2020-12-04 11:37

    Answering my own question just as a place to put some details:

    It turns out (as correctly suggested above) that the index creation is the slow step, and every time I do another transaction of inserts, the index is updated, which takes some time. My solution is to: (A) create the data table, (B) insert all my historical data (several years' worth), and (C) create the indexes.
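
    A rough sketch of that (A)/(B)/(C) ordering, with an invented schema and index names:

        #include <sqlite3.h>

        int load_history(sqlite3 *db) {
            /* (A) bare table, no indexes yet */
            sqlite3_exec(db,
                "CREATE TABLE IF NOT EXISTS master_table (id INTEGER, t INTEGER, value REAL);",
                NULL, NULL, NULL);

            /* (B) bulk-insert the historical rows here, ideally in one big
             * transaction (e.g. with a prepared-statement loop). */

            /* (C) build the indexes once, after the data is loaded */
            return sqlite3_exec(db,
                "CREATE INDEX IF NOT EXISTS idx_master_id ON master_table(id);"
                "CREATE INDEX IF NOT EXISTS idx_master_t  ON master_table(t);",
                NULL, NULL, NULL);
        }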

    Now all lookups etc. are really fast and SQLite does a great job. Subsequent updates take a few seconds to insert only 800 records, but that is no problem since they only run every 10 minutes or so.

    Thanks to Robert Harvey and maxwellb for the help/suggestions/answers above.

  • 2020-12-04 11:42

    The theoretical maximum number of rows in a table is 2^64 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 140 terabytes will be reached first. A 140-terabyte database can hold no more than approximately 1e+13 rows, and then only if there are no indices and if each row contains very little data.
