Continuing a transaction after primary key violation error

前端 未结 4 1029
旧巷少年郎
旧巷少年郎 2020-11-30 07:56

I am doing a bulk insert of records into a database from a log file. Occasionally (~1 row out of every thousand) one of the rows violates the primary key and causes the tran

相关标签:
4条回答
  • 2020-11-30 08:33

    You can also use SAVEPOINTs in a transaction.

    Pythonish pseudocode is illustrate from the application side:

    database.execute("BEGIN")
    foreach data_row in input_data_dictionary:
        database.execute("SAVEPOINT bulk_savepoint")
        try:
            database.execute("INSERT", table, data_row)
        except:
            database.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
            log_error(data_row)
            error_count = error_count + 1
        else:
            database.execute("RELEASE SAVEPOINT bulk_savepoint")
    
    if error_count > error_threshold:
        database.execute("ROLLBACK")
    else:
        database.execute("COMMIT")
    

    Edit: Here's an actual example of this in action in psql based on a slight variation of the example in the documentation (SQL statements prefixed by ">"):

    > CREATE TABLE table1 (test_field INTEGER NOT NULL PRIMARY KEY);
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "table1_pkey" for table "table1"
    CREATE TABLE
    
    > BEGIN;
    BEGIN
    > INSERT INTO table1 VALUES (1);
    INSERT 0 1
    > SAVEPOINT my_savepoint;
    SAVEPOINT
    > INSERT INTO table1 VALUES (1);
    ERROR:  duplicate key value violates unique constraint "table1_pkey"
    > ROLLBACK TO SAVEPOINT my_savepoint;
    ROLLBACK
    > INSERT INTO table1 VALUES (3);
    INSERT 0 1
    > COMMIT;
    COMMIT
    > SELECT * FROM table1;  
     test_field 
    ------------
              1
              3
    (2 rows)
    

    Note that the value 3 was inserted after the error, but still inside the same transaction!

    The documentation for SAVEPOINT is at http://www.postgresql.org/docs/8.4/static/sql-savepoint.html.

    0 讨论(0)
  • 2020-11-30 08:38

    I would use a stored procedure to catch the exceptions on your unique violations. Example:

    CREATE OR REPLACE FUNCTION my_insert(i_foo text, i_bar text)
      RETURNS boolean LANGUAGE plpgsql AS
    $BODY$
    begin   
        insert into foo(x, y) values(i_foo, i_bar);
        exception
            when unique_violation THEN -- nothing
    
        return true;
    end;
    $BODY$;
    
    SELECT my_insert('value 1','another value');
    
    0 讨论(0)
  • 2020-11-30 08:40

    Or you can use SSIS and have the failed rows take a differnt path than the successful ones.

    SInce you are usinga differnt database can you bulk insert the files to a staging table and then use SQL code to select only those records which do not have an exisitng id?

    0 讨论(0)
  • 2020-11-30 08:50

    You can do a rollback to the transaction or a rollback to a save point just before the code that raises the exception (cr is the cursor):

    name = uuid.uuid1().hex
    cr.execute('SAVEPOINT "%s"' % name)
    try:
        # your failing query goes here
    except Exception:
        cr.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
        # your alternative code goes here 
    else:
        cr.execute('RELEASE SAVEPOINT "%s"' % name)
    

    This code assumes there is running transaction, otherwise you would not receive that error message.

    Django postgresql backend creates cursors directly from psycopg. Maybe in the future they make a proxy class for the Django cursor, similar to the cursor of odoo. They extend the cursor with the following code (self is the cursor):

    @contextmanager
    @check
    def savepoint(self):
        """context manager entering in a new savepoint"""
        name = uuid.uuid1().hex
        self.execute('SAVEPOINT "%s"' % name)
        try:
            yield
        except Exception:
            self.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
            raise
        else:
            self.execute('RELEASE SAVEPOINT "%s"' % name)
    

    That way the context makes your code easier, it will be:

    try:
        with cr.savepoint():
            # your failing query goes here
    except Exception:
        # your alternative code goes here 
    

    and the code is more readable, because the transaction stuff is not there.

    0 讨论(0)
提交回复
热议问题