Bulk/batch update/upsert in PostgreSQL

前端 未结 5 1368
慢半拍i
慢半拍i 2020-11-29 00:27

I\'m writing a Django-ORM enchancement that attempts to cache models and postpone model saving until the end of the transaction. It\'s all almost done, however I came across

相关标签:
5条回答
  • 2020-11-29 01:04

    Bulk insert

    You can modify the bulk insert of three columns by Ketema:

    INSERT INTO "table" (col1, col2, col3)
      VALUES (11, 12, 13) , (21, 22, 23) , (31, 32, 33);
    

    It becomes:

    INSERT INTO "table" (col1, col2, col3)
      VALUES (unnest(array[11,21,31]), 
              unnest(array[12,22,32]), 
              unnest(array[13,23,33]))
    

    Replacing the values with placeholders:

    INSERT INTO "table" (col1, col2, col3)
      VALUES (unnest(?), unnest(?), unnest(?))
    

    You have to pass arrays or lists as arguments to this query. This means you can do huge bulk inserts without doing string concatenation (and all its hazzles and dangers: sql injection and quoting hell).

    Bulk update

    PostgreSQL has added the FROM extension to UPDATE. You can use it in this way:

    update "table" 
      set value = data_table.new_value
      from 
        (select unnest(?) as key, unnest(?) as new_value) as data_table
      where "table".key = data_table.key;
    

    The manual is missing a good explanation, but there is an example on the postgresql-admin mailing list. I tried to elaborate on it:

    create table tmp
    (
      id serial not null primary key,
      name text,
      age integer
    );
    
    insert into tmp (name,age) 
    values ('keith', 43),('leslie', 40),('bexley', 19),('casey', 6);
    
    update tmp set age = data_table.age
    from
    (select unnest(array['keith', 'leslie', 'bexley', 'casey']) as name, 
            unnest(array[44, 50, 10, 12]) as age) as data_table
    where tmp.name = data_table.name;
    

    There are also other posts on StackExchange explaining UPDATE...FROM.. using a VALUES clause instead of a subquery. They might by easier to read, but are restricted to a fixed number of rows.

    0 讨论(0)
  • 2020-11-29 01:08

    Turn off autocommit and just do one commit at the end. In plain SQL, this means issuing BEGIN at the start and COMMIT at the end. You would need to create a function in order to do an actual upsert.

    0 讨论(0)
  • 2020-11-29 01:27

    Bulk inserts can be done as such:

    INSERT INTO "table" ( col1, col2, col3)
      VALUES ( 1, 2, 3 ) , ( 3, 4, 5 ) , ( 6, 7, 8 );
    

    Will insert 3 rows.

    Multiple updating is defined by the SQL standard, but not implemented in PostgreSQL.

    Quote:

    "According to the standard, the column-list syntax should allow a list of columns to be assigned from a single row-valued expression, such as a sub-select:

    UPDATE accounts SET (contact_last_name, contact_first_name) = (SELECT last_name, first_name FROM salesmen WHERE salesmen.id = accounts.sales_id);"

    Reference: http://www.postgresql.org/docs/9.0/static/sql-update.html

    0 讨论(0)
  • 2020-11-29 01:28

    I've used 3 strategies for batch transactional work:

    1. Generate SQL statements on the fly, concatenate them with semicolons, and then submit the statements in one shot. I've done up to 100 inserts in this way, and it was quite efficient (done against Postgres).
    2. JDBC has batching capabilities built in, if configured. If you generate transactions, you can flush your JDBC statements so that they transact in one shot. This tactic requires fewer database calls, as the statements are all executed in one batch.
    3. Hibernate also supports JDBC batching along the lines of the previous example, but in this case you execute a flush() method against the Hibernate Session, not the underlying JDBC connection. It accomplishes the same thing as JDBC batching.

    Incidentally, Hibernate also supports a batching strategy in collection fetching. If you annotate a collection with @BatchSize, when fetching associations, Hibernate will use IN instead of =, leading to fewer SELECT statements to load up the collections.

    0 讨论(0)
  • 2020-11-29 01:31

    it is pretty fast to populate json into recordset (postgresql 9.3+)

    big_list_of_tuples = [
        (1, "123.45"),
        ...
        (100000, "678.90"),
    ]
    
    connection.execute("""
        UPDATE mytable
        SET myvalue = Q.myvalue
        FROM (
            SELECT (value->>0)::integer AS id, (value->>1)::decimal AS myvalue 
            FROM json_array_elements(%s)
        ) Q
        WHERE mytable.id = Q.id
        """, 
        [json.dumps(big_list_of_tuples)]
    )
    
    0 讨论(0)
提交回复
热议问题