postgresql: \copy method enter valid entries and discard exceptions

谁都会走 提交于 2019-12-20 03:25:12

问题


When entering the following command:

\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;

I get the following error:

ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"

I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?


回答1:


The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.

It is possible to create a new subtransaction for each row but this is very slow and defeats the purpose of using COPY in the first place, so it isn't supported by PostgreSQL in COPY at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION block inside a LOOP over a select from the data copied into a temporary table. This works fairly well but can be slow.

It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:

CREATE TEMPORARY TABLE stagingtable(...);

\copy stagingtable FROM 'somefile.csv'

INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;

Do keep concurrency issues in mind though. If you're trying to do a merge/upsert via COPY you must LOCK TABLE realtable; at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do - a copy if not exists. If so, skipping errors is absolutely the wrong approach. See:

  • How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
  • Insert, on duplicate update in PostgreSQL?
  • Postgresql - Clean way to insert records if they don't exist, update if they do
  • Can COPY be used with a function?
  • Postgresql csv importation that skips rows

... this is a much-discussed issue.




回答2:


One way to handle the constraint violations is to define triggers on the target table to handle the errors. This is not ideal as there can still be race conditions (if concurrently loading), and triggers have pretty high overhead.

Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.

Additionally, another useful method is to use pgloader



来源:https://stackoverflow.com/questions/20169372/postgresql-copy-method-enter-valid-entries-and-discard-exceptions

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!