Consider the following code in Python, using a psycopg2 cursor
object (some column names were changed or omitted for clarity):
filename = 'data.csv'
file_columns = ('id', 'node_id', 'segment_id', 'elevated',
                'approximation', 'the_geom', 'azimuth')
self._cur.copy_from(file=open(filename),
                    table=self.new_table_name,
                    columns=file_columns)
- The database is located on a remote machine on a fast LAN.
- Using \COPY from bash works very fast, even for large (~1,000,000 lines) files.
This code is ultra-fast for 5,000 lines, but when data.csv grows beyond 10,000 lines, the program freezes completely.
Any thoughts / solutions?
Adam
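For reference, the closest in-process equivalent of psql's \COPY is psycopg2's copy_expert, which streams the open file to the server in fixed-size chunks. This is only a minimal sketch, assuming the table and columns from the question and a tab-delimited file (add a WITH (FORMAT csv) clause if the file is really comma-separated; the connection string is a placeholder):

# Minimal sketch: stream the file with copy_expert instead of copy_from.
import psycopg2

conn = psycopg2.connect("host=dbhost dbname=mydb user=me")  # placeholder connection details
with conn, conn.cursor() as cur, open('data.csv') as f:
    cur.copy_expert(
        "COPY new_table (id, node_id, segment_id, elevated, "
        "approximation, the_geom, azimuth) FROM STDIN",
        f)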
This is just a workaround, but you can simply pipe the data into psql. I use this recipe sometimes when I am too lazy to bust out psycopg2.
import subprocess

def psql_copy_from(filename, tablename, columns=None):
    """Warning: this does not properly quote things."""
    coltxt = ' (%s)' % ', '.join(columns) if columns else ''
    with open(filename) as f:
        subprocess.check_call([
            'psql',
            '-c', 'COPY %s%s FROM STDIN' % (tablename, coltxt),
            '--set=ON_ERROR_STOP=true',  # to be safe
            # add your connection args here
        ], stdin=f)
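For example, with the table and columns from the question (the connection arguments still need to be filled in above):

psql_copy_from('data.csv', 'new_table',
               columns=('id', 'node_id', 'segment_id', 'elevated',
                        'approximation', 'the_geom', 'azimuth'))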
As far as your locking up is concerned, are you using multiple threads or anything like that?
Is your postgres logging anything such as a closed connection or a deadlock? Can you see disk activity after it locks up?
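One way to answer that from a second connection is to ask the server whether the COPY's backend is waiting on an ungranted lock. A rough sketch against pg_stat_activity and pg_locks (needs a reasonably recent PostgreSQL; the connection string is a placeholder):

# Run this from a *separate* session while the COPY appears frozen.
import psycopg2

conn = psycopg2.connect("host=dbhost dbname=mydb user=me")  # placeholder connection details
with conn.cursor() as cur:
    cur.execute("""
        SELECT a.pid, a.state, a.query
        FROM pg_stat_activity AS a
        JOIN pg_locks AS l ON l.pid = a.pid
        WHERE NOT l.granted
    """)
    for pid, state, query in cur.fetchall():
        print(pid, state, query)  # any rows here mean a session is blocked on a lock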
It's a memory limitation that makes copy_from crash, since open(filename) returns the whole file in one shot. It's a psycopg2 problem, not a PostgreSQL one, so Mike's solution is the best one.
There is a solution if you want to use "copy_from" with regular commits and manage duplicate keys at the same time: https://stackoverflow.com/a/11059350/1431079
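The linked answer isn't reproduced here, but the general shape of that approach is to COPY the file in batches into a staging table, merge each batch into the real table while skipping duplicates, and commit after every batch. A rough sketch, where the staging/target table names, the conflict column, and the batch size are illustrative assumptions (ON CONFLICT needs PostgreSQL 9.5+):

# Sketch of batched copy_from with regular commits and duplicate-key handling.
import io
import itertools
import psycopg2

def copy_in_batches(conn, filename, batch_size=5000):
    with open(filename) as f, conn.cursor() as cur:
        # temp staging table mirrors the target table's structure
        cur.execute("CREATE TEMP TABLE staging (LIKE target_table INCLUDING DEFAULTS)")
        while True:
            lines = list(itertools.islice(f, batch_size))
            if not lines:
                break
            cur.copy_from(io.StringIO(''.join(lines)), 'staging')
            cur.execute("""
                INSERT INTO target_table
                SELECT * FROM staging
                ON CONFLICT (id) DO NOTHING
            """)
            cur.execute("TRUNCATE staging")
            conn.commit()  # regular commits keep each batch's work small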
Source: https://stackoverflow.com/questions/3491864/psycopg2-copy-using-cursor-copy-from-freezes-with-large-inputs