Improve INSERT-per-second performance of SQLite

后端 未结 10 2226
忘掉有多难
忘掉有多难 2020-11-21 04:41

Optimizing SQLite is tricky. Bulk-insert performance of a C application can vary from 85 inserts per second to over 96,000 inserts per second!

Background:

10条回答
  •  既然无缘
    2020-11-21 05:44

    After reading this tutorial, I tried to implement it to my program.

    I have 4-5 files that contain addresses. Each file has approx 30 million records. I am using the same configuration that you are suggesting but my number of INSERTs per second is way low (~10.000 records per sec).

    Here is where your suggestion fails. You use a single transaction for all the records and a single insert with no errors/fails. Let's say that you are splitting each record into multiple inserts on different tables. What happens if the record is broken?

    The ON CONFLICT command does not apply, cause if you have 10 elements in a record and you need each element inserted to a different table, if element 5 gets a CONSTRAINT error, then all previous 4 inserts need to go too.

    So here is where the rollback comes. The only issue with the rollback is that you lose all your inserts and start from the top. How can you solve this?

    My solution was to use multiple transactions. I begin and end a transaction every 10.000 records (Don't ask why that number, it was the fastest one I tested). I created an array sized 10.000 and insert the successful records there. When the error occurs, I do a rollback, begin a transaction, insert the records from my array, commit and then begin a new transaction after the broken record.

    This solution helped me bypass the issues I have when dealing with files containing bad/duplicate records (I had almost 4% bad records).

    The algorithm I created helped me reduce my process by 2 hours. Final loading process of file 1hr 30m which is still slow but not compared to the 4hrs that it initially took. I managed to speed the inserts from 10.000/s to ~14.000/s

    If anyone has any other ideas on how to speed it up, I am open to suggestions.

    UPDATE:

    In Addition to my answer above, you should keep in mind that inserts per second depending on the hard drive you are using too. I tested it on 3 different PCs with different hard drives and got massive differences in times. PC1 (1hr 30m), PC2 (6hrs) PC3 (14hrs), so I started wondering why would that be.

    After two weeks of research and checking multiple resources: Hard Drive, Ram, Cache, I found out that some settings on your hard drive can affect the I/O rate. By clicking properties on your desired output drive you can see two options in the general tab. Opt1: Compress this drive, Opt2: Allow files of this drive to have contents indexed.

    By disabling these two options all 3 PCs now take approximately the same time to finish (1hr and 20 to 40min). If you encounter slow inserts check whether your hard drive is configured with these options. It will save you lots of time and headaches trying to find the solution

提交回复
热议问题