Question
I am trying to port a MongoDB database over to Postgres. I am using COPY FROM (via psycopg2's copy_expert) to insert the bulk data faster, but I keep getting the same error:
psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error in .read() call
My original code tries to populate the table in one go. If I split the data into batches, there is a threshold below which the error is not thrown, which is around 3 million records per batch:
# Base case (only 1 iteration): throws the .read() call error
MAX_BATCH_SIZE = total_bulk_size
iterations = math.ceil(total_bulk_size / MAX_BATCH_SIZE)

n_batch = 0
while n_batch < iterations:
    for record in batch:                      # batch = current slice of the source records
        string_io_writer.writerow(record[...])
    string_io.seek(0)
    sql = 'COPY table1 FROM stdin WITH (FORMAT csv)'
    cursor.copy_expert(sql, string_io)
    postgres.commit()
    n_batch += 1

# Case 2: MAX_BATCH_SIZE = 5_000_000 (batches of 5M records):
#   throws the .read() call error
# Case 3: MAX_BATCH_SIZE = 3_000_000 (batches of 3M records):
#   table is populated successfully and no error is thrown
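In case it helps, here is a self-contained sketch of roughly how the whole pipeline is wired up. The connection strings, collection name and column list are placeholders rather than my real schema; only table1 and the batching logic match the snippet above.

import csv
import io
import math

import psycopg2
from pymongo import MongoClient

# Placeholder connection details and schema -- not the real setup.
mongo_coll = MongoClient("mongodb://localhost:27017")["source_db"]["source_collection"]
pg = psycopg2.connect("dbname=target_db user=postgres")
cursor = pg.cursor()

records = list(mongo_coll.find({}))      # all documents pulled into memory
total_bulk_size = len(records)

MAX_BATCH_SIZE = 3_000_000               # largest batch size that does not error for me
iterations = math.ceil(total_bulk_size / MAX_BATCH_SIZE)

for n_batch in range(iterations):
    batch = records[n_batch * MAX_BATCH_SIZE:(n_batch + 1) * MAX_BATCH_SIZE]

    string_io = io.StringIO()            # fresh CSV buffer for each batch
    writer = csv.writer(string_io)
    for record in batch:
        # column order must match table1's definition; field names are placeholders
        writer.writerow([record.get("field_a"), record.get("field_b")])

    string_io.seek(0)
    cursor.copy_expert("COPY table1 FROM STDIN WITH (FORMAT csv)", string_io)
    pg.commit()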
My goal is to populate the table as fast as possible, so splitting the data into multiple batches defeats the purpose of using COPY FROM.
I thought this could be an out-of-memory issue, but then I would expect a different exception. Also, I have been tracking memory consumption and it is not at 100% when it crashes.
Any ideas?
PS: I have tried COPY FROM on the Postgres command line and it works for the whole dataset in a single batch.
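For reference, the command-line test was along these lines, run in psql against a CSV export of the same data (the file path is a placeholder):

-- server-side COPY; '/tmp/mongo_dump.csv' is a placeholder for the exported file
COPY table1 FROM '/tmp/mongo_dump.csv' WITH (FORMAT csv);
-- psql's \copy table1 FROM '/tmp/mongo_dump.csv' WITH (FORMAT csv) is the client-side equivalent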
Source: https://stackoverflow.com/questions/56650538/copy-from-throws-read-call-error-on-bulk-data-on-psycopg2