Consider the following code in Python, using a psycopg2 cursor
object (some column names were changed or omitted for clarity):
filename = 'data.csv'
file_columns = ('id', 'node_id', 'segment_id', 'elevated',
                'approximation', 'the_geom', 'azimuth')
self._cur.copy_from(file=open(filename),
                    table=self.new_table_name,
                    columns=file_columns)
- The database is located on a remote machine on a fast LAN.
- Using \COPY from bash works very fast, even for large (~1,000,000 lines) files.
This code is ultra-fast for 5,000 lines, but when data.csv grows beyond 10,000 lines, the program freezes completely.
Any thoughts / solutions?
Adam
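For reference, the closest in-process equivalent of psql's \COPY is psycopg2's copy_expert, which streams the open file to the server in fixed-size chunks. This is only a minimal sketch, assuming the table and columns from the question and a tab-delimited file (add a WITH (FORMAT csv) clause if the file is really comma-separated; the connection string is a placeholder):

# Minimal sketch: stream the file with copy_expert instead of copy_from.
import psycopg2

conn = psycopg2.connect("host=dbhost dbname=mydb user=me")  # placeholder connection details
with conn, conn.cursor() as cur, open('data.csv') as f:
    cur.copy_expert(
        "COPY new_table (id, node_id, segment_id, elevated, "
        "approximation, the_geom, azimuth) FROM STDIN",
        f)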
This is just a workaround, but you can simply pipe the data into psql. I use this recipe sometimes when I am too lazy to bust out psycopg2.
import subprocess

def psql_copy_from(filename, tablename, columns=None):
    """Warning: this does not properly quote things."""
    coltxt = ' (%s)' % ', '.join(columns) if columns else ''
    with open(filename) as f:
        subprocess.check_call([
            'psql',
            '-c', 'COPY %s%s FROM STDIN' % (tablename, coltxt),
            '--set=ON_ERROR_STOP=true',  # to be safe
            # add your connection args here
        ], stdin=f)
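For example, with the table and columns from the question (the connection arguments still need to be filled in above):

psql_copy_from('data.csv', 'new_table',
               columns=('id', 'node_id', 'segment_id', 'elevated',
                        'approximation', 'the_geom', 'azimuth'))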
As far as your locking up is concerned, are you using multiple threads or anything like that?
Is your postgres logging anything such as a closed connection or a deadlock? Can you see disk activity after it locks up?
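One way to answer that from a second connection is to ask the server whether the COPY's backend is waiting on an ungranted lock. A rough sketch against pg_stat_activity and pg_locks (needs a reasonably recent PostgreSQL; the connection string is a placeholder):

# Run this from a *separate* session while the COPY appears frozen.
import psycopg2

conn = psycopg2.connect("host=dbhost dbname=mydb user=me")  # placeholder connection details
with conn.cursor() as cur:
    cur.execute("""
        SELECT a.pid, a.state, a.query
        FROM pg_stat_activity AS a
        JOIN pg_locks AS l ON l.pid = a.pid
        WHERE NOT l.granted
    """)
    for pid, state, query in cur.fetchall():
        print(pid, state, query)  # any rows here mean a session is blocked on a lock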
It's a memory limitation that makes copy_from crash, since open(filename) returns the whole file in one shot. It's a psycopg2 problem, not a PostgreSQL one, so Mike's solution is the best one.
There is a solution if you want to use "copy_from" with regular commits and manage duplicate keys at the same time: https://stackoverflow.com/a/11059350/1431079
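The linked answer isn't reproduced here, but the general shape of that approach is to COPY the file in batches into a staging table, merge each batch into the real table while skipping duplicates, and commit after every batch. A rough sketch, where the staging/target table names, the conflict column, and the batch size are illustrative assumptions (ON CONFLICT needs PostgreSQL 9.5+):

# Sketch of batched copy_from with regular commits and duplicate-key handling.
import io
import itertools
import psycopg2

def copy_in_batches(conn, filename, batch_size=5000):
    with open(filename) as f, conn.cursor() as cur:
        # temp staging table mirrors the target table's structure
        cur.execute("CREATE TEMP TABLE staging (LIKE target_table INCLUDING DEFAULTS)")
        while True:
            lines = list(itertools.islice(f, batch_size))
            if not lines:
                break
            cur.copy_from(io.StringIO(''.join(lines)), 'staging')
            cur.execute("""
                INSERT INTO target_table
                SELECT * FROM staging
                ON CONFLICT (id) DO NOTHING
            """)
            cur.execute("TRUNCATE staging")
            conn.commit()  # regular commits keep each batch's work small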
Source: https://stackoverflow.com/questions/3491864/psycopg2-copy-using-cursor-copy-from-freezes-with-large-inputs