postgresql: how to get primary keys of rows inserted with a bulk copy_from?

Submitted by 痞子三分冷 on 2019-12-22 10:27:54

Question


The goal is this: I have a set of values to go into table A, and a set of values to go into table B. The values going into B reference values in A (via a foreign key), so after inserting the A values I need to know how to reference them when inserting the B values. I need this to be as fast as possible.

I made the B values insert with a bulk copy from:

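from io import StringIO

def bulk_insert_copyfrom(cursor, table_name, field_names, values):
    if not values:
        return

    print("bulk copy from prepare...")
    # COPY text format: tab-separated columns, one row per line, \N for NULL.
    # Values containing tabs, newlines, or backslashes would need escaping.
    str_vals = "\n".join(
        "\t".join(r"\N" if val is None else str(val) for val in cur_vals)
        for cur_vals in values
    )
    strf = StringIO(str_vals)
    print("bulk copy from execute...")
    cursor.copy_from(strf, table_name, columns=tuple(field_names))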

This was far faster than doing an INSERT VALUES ... RETURNING id query. I'd like to do the same for the A values, but I need to know the ids of the inserted rows.

Is there any way to execute a bulk copy from in this fashion, but to get the id field (primary key) of the rows that are inserted, such that I know which id associates with which value?

If not, what would be the best way to accomplish my goal?

EDIT: Sample data on request:

a_val1 = [1, 2, 3]
a_val2 = [4, 5, 6]
a_vals = [a_val1, a_val2]

b_val1 = [a_val2, 5, 6, 7]
b_val2 = [a_val1, 100, 200, 300]
b_val3 = [a_val2, 9, 14, 6]
b_vals = [b_val1, b_val2, b_val3]

I want to insert the a_vals, then insert the b_vals, using foreign keys instead of references to the list objects.
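
For the sample data, swapping the list references for foreign keys is a plain lookup once the ids of the A rows are known. A minimal sketch, assuming the ids come back in the same order as a_vals; resolve_foreign_keys and the ids 101 and 102 are made up for illustration:

```python
a_val1 = [1, 2, 3]
a_val2 = [4, 5, 6]
a_vals = [a_val1, a_val2]

b_vals = [
    [a_val2, 5, 6, 7],
    [a_val1, 100, 200, 300],
    [a_val2, 9, 14, 6],
]


def resolve_foreign_keys(a_rows, a_ids, b_rows):
    """Replace the leading A-row reference in each B row with that A row's id."""
    # Key by object identity so two A rows with equal values stay distinct.
    id_of = {id(row): pk for row, pk in zip(a_rows, a_ids)}
    return [[id_of[id(ref)], *rest] for ref, *rest in b_rows]


# Hypothetical primary keys, as if the database had assigned them to a_vals.
a_ids = [101, 102]
b_rows_with_fk = resolve_foreign_keys(a_vals, a_ids, b_vals)
```

The resulting rows contain only scalar values, so they can go straight into the bulk copy for table B.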


Answer 1:


Generate the IDs yourself.

  1. BEGIN transaction
  2. Lock table a
  3. call nextval() - that's your first ID
  4. generate your COPY with IDs in place
  5. same for table b
  6. call setval() with your final ID + 1
  7. COMMIT transaction

At step 2 you probably want to lock the sequence's relation too. Otherwise other code can call nextval() and stash that ID somewhere, and the ID might already be in use by the time that code inserts with it.
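
The steps above could be sketched roughly as follows for one table. This is a sketch under assumptions, not a definitive implementation: it assumes a psycopg2-style cursor, a key column named id, and values with no tabs, newlines, or backslashes. The helper names and the sequence name passed in are hypothetical, and the caller is expected to own the transaction (steps 1 and 7).

```python
from io import StringIO


def assign_ids(first_id, rows):
    """Prefix each row with a consecutive id, starting at first_id."""
    return [(first_id + i, *row) for i, row in enumerate(rows)]


def to_copy_text(rows):
    """Serialize rows as COPY text: tab-separated, one row per line."""
    # NULL is written as \N; values are assumed to need no escaping.
    return "\n".join(
        "\t".join(r"\N" if v is None else str(v) for v in row) for row in rows
    )


def copy_with_own_ids(cursor, table, sequence, columns, rows):
    """Steps 2-6 for one table; the caller owns BEGIN and COMMIT."""
    if not rows:
        return []
    # The table name is interpolated, so it must come from trusted code.
    cursor.execute(f"LOCK TABLE {table} IN EXCLUSIVE MODE")
    cursor.execute("SELECT nextval(%s)", (sequence,))
    first_id = cursor.fetchone()[0]
    rows_with_ids = assign_ids(first_id, rows)
    cursor.copy_from(
        StringIO(to_copy_text(rows_with_ids)), table, columns=("id", *columns)
    )
    last_id = rows_with_ids[-1][0]
    # setval(seq, n) makes the next nextval() return n + 1, so the last used
    # id is enough; "final ID + 1" from step 6 also works and skips one value.
    cursor.execute("SELECT setval(%s, %s)", (sequence, last_id))
    return [row[0] for row in rows_with_ids]
```

The returned list is in the same order as the input rows, so it can be zipped with them to build the foreign keys for table b.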

Slightly off-topic fact: there is a CACHE setting on sequences that you can set if you have lots of backends doing lots of inserts. With it, each backend preallocates sequence values in blocks instead of fetching them one at a time.

http://www.postgresql.org/docs/9.1/static/sql-createsequence.html




Answer 2:


Actually, you can do it differently. What you need is:

  • Start transaction
  • Create temp table with same (or almost same) schema
  • COPY data to that temp table
  • Perform a regular INSERT INTO ... SELECT ... FROM temp_table ... RETURNING id, other_columns
  • Commit

Taken from here (in C#, but the algorithm is the same).
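
In Python, the temp-table approach might look like the sketch below. It assumes a psycopg2-style cursor inside an open transaction, a key column named id, and values with no tabs, newlines, or backslashes. The function name and the temp table name tmp_load are hypothetical.

```python
from io import StringIO


def copy_returning_ids(cursor, table, columns, rows):
    """COPY rows into a temp table, then INSERT ... SELECT ... RETURNING id."""
    if not rows:
        return {}
    cols = ", ".join(columns)
    # Identifiers are interpolated, so they must come from trusted code.
    # CREATE TABLE AS ... WITH NO DATA copies column names and types only.
    cursor.execute(
        "CREATE TEMP TABLE tmp_load ON COMMIT DROP AS "
        f"SELECT {cols} FROM {table} WITH NO DATA"
    )
    # COPY text format: tab-separated columns, \N for NULL.
    payload = "\n".join(
        "\t".join(r"\N" if v is None else str(v) for v in row) for row in rows
    )
    cursor.copy_from(StringIO(payload), "tmp_load", columns=tuple(columns))
    cursor.execute(
        f"INSERT INTO {table} ({cols}) SELECT {cols} FROM tmp_load "
        f"RETURNING id, {cols}"
    )
    # RETURNING gives no ordering guarantee, so key the result by the values.
    return {tuple(values): pk for pk, *values in cursor.fetchall()}
```

The result maps each inserted value tuple to its new id. If two input rows carry identical values, the mapping keeps only one of their ids, so this sketch suits data whose rows are distinct.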



Source: https://stackoverflow.com/questions/8000689/postgresql-how-to-get-primary-keys-of-rows-inserted-with-a-bulk-copy-from
