How do I efficiently do a bulk insert-or-update with SQLAlchemy?

Backend · open · 2 answers · 1619 views

独厮守ぢ asked on 2021-02-09 03:45

I'm using SQLAlchemy with a Postgres backend to do a bulk insert-or-update. To try to improve performance, I'm attempting to commit only once every thousand rows or so:

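The original snippet didn't survive here, but based on the description it presumably followed the shape below. This is a hedged reconstruction: the names `my_table` and `records` are placeholders, and sqlite stands in for Postgres only to keep the sketch self-contained.

```python
# Hypothetical reconstruction of the insert-or-update pattern described
# above. Table and variable names are placeholders, not the asker's code.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # stand-in for the Postgres engine
metadata = sa.MetaData()
my_table = sa.Table(
    "my_table", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("value", sa.String),
)
metadata.create_all(engine)

# ids repeat, so some inserts will violate the primary key
records = [{"id": i % 5, "value": f"v{i}"} for i in range(10)]

batch = 1000  # commit once per thousand rows, as described above
with engine.connect() as conn:
    for start in range(0, len(records), batch):
        with conn.begin():  # one commit per batch
            for rec in records[start:start + batch]:
                try:
                    conn.execute(my_table.insert().values(**rec))
                except sa.exc.IntegrityError:
                    # This fallback works on sqlite, but on Postgres the
                    # failed INSERT aborts the whole transaction, so this
                    # UPDATE raises the error the question is about.
                    conn.execute(
                        my_table.update()
                        .where(my_table.c.id == rec["id"])
                        .values(value=rec["value"])
                    )
```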
2 Answers
  •  臣服心动
    answered 2021-02-09 03:58

    This error comes from PostgreSQL. Once a statement in a transaction fails, PostgreSQL rejects every subsequent command in that transaction until you roll back. To work around this, you can use nested transactions (implemented with SQL SAVEPOINTs) via conn.begin_nested(). Here's something that might work. I made the code use explicit connections, factored out the chunking, and used context managers to manage the transactions correctly.

    import sqlalchemy as sa
    from itertools import chain, islice

    def chunked(seq, chunksize):
        """Yield items from an iterable in chunks of ``chunksize``."""
        it = iter(seq)
        while True:
            try:
                first = next(it)  # Python 3: next(it), not it.next()
            except StopIteration:
                return
            yield chain([first], islice(it, chunksize - 1))

    conn = engine.connect()  # not engine.commit(), which doesn't exist
    for chunk in chunked(records, 1000):
        with conn.begin():  # one outer transaction (one commit) per chunk
            for rec in chunk:
                try:
                    # SAVEPOINT: if the insert fails, only this nested
                    # transaction rolls back, not the whole chunk
                    with conn.begin_nested():
                        conn.execute(inserter, ...)
                except sa.exc.IntegrityError:  # sa.exceptions.SQLError in old versions
                    conn.execute(my_table.update(...))
    

    This still won't have stellar performance, though, because of the savepoint overhead. If you want better performance, detect which rows will conflict beforehand with a SELECT, then use executemany support (execute accepts a list of dicts when all inserts use the same columns). If you need to handle concurrent updates, you'll still need error handling, either by retrying or by falling back to one-by-one inserts.
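    The "detect conflicts beforehand, then executemany" suggestion could look roughly like this. It's a sketch under assumptions: a single integer primary key, and sqlite in place of Postgres just so the example is self-contained; the Core calls are the same on either backend.

```python
# Sketch of: one SELECT to split incoming rows into inserts vs. updates,
# then batched executemany for each group. Names here are illustrative.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
my_table = sa.Table(
    "my_table", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("value", sa.String),
)
metadata.create_all(engine)

# 500 rows already present, 2000 incoming records
with engine.begin() as conn:
    conn.execute(my_table.insert(),
                 [{"id": i, "value": "old"} for i in range(500)])

records = [{"id": i, "value": f"v{i}"} for i in range(2000)]

with engine.begin() as conn:
    # One SELECT finds the keys that would conflict...
    existing = {row.id for row in conn.execute(sa.select(my_table.c.id))}
    to_insert = [r for r in records if r["id"] not in existing]
    to_update = [r for r in records if r["id"] in existing]

    # ...then passing a list of dicts triggers DBAPI executemany:
    # one prepared statement per batch instead of a round trip per row.
    if to_insert:
        conn.execute(my_table.insert(), to_insert)
    if to_update:
        conn.execute(
            my_table.update()
            .where(my_table.c.id == sa.bindparam("b_id"))
            .values(value=sa.bindparam("b_value")),
            [{"b_id": r["id"], "b_value": r["value"]} for r in to_update],
        )
```

    In practice you'd restrict the SELECT to the keys of the current batch (a `WHERE id IN (...)` filter) rather than scanning the whole table, and as noted above, a concurrent writer can still insert a row between the SELECT and the INSERT, so retries or per-row fallback remain necessary.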
