Efficient way to ensure unique rows in SQLite3

Asked by 情话喂你, 2021-01-31 04:20

I am using SQLite3 in one of my projects, and I need to ensure that the rows inserted into a table are unique with regard to a combination of some of their columns.

5 Answers
  Answered by 一生所求, 2021-01-31 04:31

    The ON CONFLICT REPLACE clause makes SQLite delete the existing rows, then insert the new rows. That means SQLite is probably going to spend some of its time:

    • deleting existing rows
    • updating the indexes
    • inserting new rows
    • updating the indexes
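This delete-then-insert behavior is easy to observe directly. A minimal sketch using Python's built-in `sqlite3` module; the table and column names are illustrative, not from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Declare the uniqueness constraint with REPLACE as its conflict resolution.
con.execute("""
    CREATE TABLE t (
        a INTEGER,
        b TEXT,
        UNIQUE (a) ON CONFLICT REPLACE
    )
""")
con.execute("INSERT INTO t (a, b) VALUES (1, 'old')")
con.execute("INSERT INTO t (a, b) VALUES (1, 'new')")  # deletes the row with a = 1, then inserts
rows = con.execute("SELECT a, b FROM t").fetchall()
print(rows)  # [(1, 'new')] -- the original row is gone
```

The second insert is not an update: SQLite removes the conflicting row (touching its index entries) and then inserts a fresh one.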

    That's my take on it, based on SQLite documentation and reading about other database management systems. I didn't look at the source code.

    SQLite has two ways of expressing uniqueness constraints: PRIMARY KEY and UNIQUE. Both of them create an index, though (the lone exception is INTEGER PRIMARY KEY, which aliases the rowid and needs no separate index).
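You can confirm the automatic indexes by querying `sqlite_master`. A small sketch with an illustrative schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A TEXT primary key and a UNIQUE column each get an automatic index.
# (An INTEGER PRIMARY KEY would be the exception: it aliases the rowid.)
con.execute("CREATE TABLE t (a TEXT PRIMARY KEY, b TEXT UNIQUE)")
indexes = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 't'"
)]
print(indexes)  # two automatic indexes, e.g. ['sqlite_autoindex_t_1', 'sqlite_autoindex_t_2']
```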

    Now the really important stuff . . .

    It's great that you did tests. Most developers don't do that. But I think your test results are badly misleading.

    In your case, it doesn't matter how fast you can insert rows into a table that doesn't have a primary key. A table that doesn't have a primary key doesn't satisfy your basic requirements for data integrity. That means you can't rely on your database to give you the right answers.

    If it doesn't have to give the right answers, I can make it really, really fast.

    To get a meaningful timing for inserting into a table that has no key, you need to either

    • run code before inserting new data to make sure you don't violate the undeclared primary key constraint, and to make sure you update existing rows with the right values (instead of inserting), or
    • run code after inserting into that table to clean up duplicates on (Fld0, Fld2, Fld3), and to reconcile conflicts
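One way to fold that pre-insert check into the statement itself is an upsert, which updates the existing row on a key collision instead of inserting a duplicate. This requires SQLite 3.24+, and the `Fld*` schema below is assumed from the answer's mention of (Fld0, Fld2, Fld3), since the question text is truncated:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE t (
        Fld0 INTEGER, Fld1 INTEGER, Fld2 INTEGER, Fld3 INTEGER, Fld4 INTEGER,
        UNIQUE (Fld0, Fld2, Fld3)
    )
""")
con.execute("INSERT INTO t VALUES (1, 10, 2, 3, 40)")
# On a (Fld0, Fld2, Fld3) collision, update the non-key columns in place.
con.execute("""
    INSERT INTO t VALUES (1, 99, 2, 3, 88)
    ON CONFLICT (Fld0, Fld2, Fld3)
    DO UPDATE SET Fld1 = excluded.Fld1, Fld4 = excluded.Fld4
""")
rows = con.execute("SELECT * FROM t").fetchall()
print(rows)  # still one row, updated in place: [(1, 99, 2, 3, 88)]
```

Unlike ON CONFLICT REPLACE, the existing row is updated rather than deleted and re-inserted, so its rowid is preserved.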

    And, of course, the time those processes take has to be taken into account, too.

    FWIW, I did a test by running 100K SQL insert statements into your schema in transactions of 1000 statements, and it only took 30 seconds. A single transaction of 1000 insert statements, which seems to be what you expect in production, took 149 msec.
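A timing harness along those lines is straightforward; this sketch batches 1000 inserts into a single transaction (the schema is illustrative, and absolute numbers will vary by machine):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, UNIQUE (a))")

start = time.perf_counter()
with con:  # the context manager wraps the batch in one transaction
    con.executemany("INSERT OR REPLACE INTO t VALUES (?, ?)",
                    ((i, i * 2) for i in range(1000)))
elapsed = time.perf_counter() - start

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(f"{count} inserts in one transaction: {elapsed * 1000:.1f} ms")
```

Batching matters because each transaction forces a sync to disk; 100K single-statement transactions would be dominated by that overhead.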

    Maybe you can speed things up by inserting into an unkeyed temporary table, then updating the keyed table from that.
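That staging approach might look like the following sketch: bulk-load into an unkeyed temp table, then move the rows into the keyed table in one statement, letting OR IGNORE drop duplicates (the table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE keyed (a INTEGER PRIMARY KEY, b TEXT)")
con.execute("CREATE TEMP TABLE staging (a INTEGER, b TEXT)")  # no key: fast bulk load

con.executemany("INSERT INTO staging VALUES (?, ?)",
                [(1, 'x'), (2, 'y'), (1, 'x_dup')])
# Move the data over in one statement; IGNORE skips rows that would collide.
con.execute("INSERT OR IGNORE INTO keyed SELECT a, b FROM staging")

rows = con.execute("SELECT a, b FROM keyed ORDER BY a").fetchall()
print(rows)  # [(1, 'x'), (2, 'y')]; the duplicate key was skipped
```

Note that OR IGNORE keeps the first row for each key; if you need later rows to win, an upsert (ON CONFLICT ... DO UPDATE) is the alternative.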
