I am doing a bulk insert of records into a database from a log file. Occasionally (~1 row out of every thousand) one of the rows violates the primary key and causes the tran
You can also use SAVEPOINTs in a transaction.
Pythonish pseudocode is illustrate from the application side:
database.execute("BEGIN")
foreach data_row in input_data_dictionary:
database.execute("SAVEPOINT bulk_savepoint")
try:
database.execute("INSERT", table, data_row)
except:
database.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
log_error(data_row)
error_count = error_count + 1
else:
database.execute("RELEASE SAVEPOINT bulk_savepoint")
if error_count > error_threshold:
database.execute("ROLLBACK")
else:
database.execute("COMMIT")
Edit: Here's an actual example of this in action in psql based on a slight variation of the example in the documentation (SQL statements prefixed by ">"):
> CREATE TABLE table1 (test_field INTEGER NOT NULL PRIMARY KEY);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "table1_pkey" for table "table1"
CREATE TABLE
> BEGIN;
BEGIN
> INSERT INTO table1 VALUES (1);
INSERT 0 1
> SAVEPOINT my_savepoint;
SAVEPOINT
> INSERT INTO table1 VALUES (1);
ERROR: duplicate key value violates unique constraint "table1_pkey"
> ROLLBACK TO SAVEPOINT my_savepoint;
ROLLBACK
> INSERT INTO table1 VALUES (3);
INSERT 0 1
> COMMIT;
COMMIT
> SELECT * FROM table1;
test_field
------------
1
3
(2 rows)
Note that the value 3 was inserted after the error, but still inside the same transaction!
The documentation for SAVEPOINT is at http://www.postgresql.org/docs/8.4/static/sql-savepoint.html.
I would use a stored procedure to catch the exceptions on your unique violations. Example:
CREATE OR REPLACE FUNCTION my_insert(i_foo text, i_bar text)
RETURNS boolean LANGUAGE plpgsql AS
$BODY$
begin
insert into foo(x, y) values(i_foo, i_bar);
exception
when unique_violation THEN -- nothing
return true;
end;
$BODY$;
SELECT my_insert('value 1','another value');
Or you can use SSIS and have the failed rows take a differnt path than the successful ones.
SInce you are usinga differnt database can you bulk insert the files to a staging table and then use SQL code to select only those records which do not have an exisitng id?
You can do a rollback
to the transaction or a rollback to a save point just before the code that raises the exception (cr is the cursor):
name = uuid.uuid1().hex
cr.execute('SAVEPOINT "%s"' % name)
try:
# your failing query goes here
except Exception:
cr.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
# your alternative code goes here
else:
cr.execute('RELEASE SAVEPOINT "%s"' % name)
This code assumes there is running transaction, otherwise you would not receive that error message.
Django postgresql backend creates cursors directly from psycopg. Maybe in the future they make a proxy class for the Django cursor, similar to the cursor of odoo. They extend the cursor with the following code (self is the cursor):
@contextmanager
@check
def savepoint(self):
"""context manager entering in a new savepoint"""
name = uuid.uuid1().hex
self.execute('SAVEPOINT "%s"' % name)
try:
yield
except Exception:
self.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
raise
else:
self.execute('RELEASE SAVEPOINT "%s"' % name)
That way the context makes your code easier, it will be:
try:
with cr.savepoint():
# your failing query goes here
except Exception:
# your alternative code goes here
and the code is more readable, because the transaction stuff is not there.