COPY_FROM throws .read() call error on bulk data on psycopg2

Posted by 寵の児 on 2019-12-13 03:09:20

Question


I am trying to port a MongoDB database over to Postgres. I am using COPY_FROM to insert bulk data faster, but I keep getting the same error:

psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error in .read() call

My original code tries to populate the table in one go, but if I split the data into batches there is a threshold below which the error isn't thrown, which is around 3 million records per batch:

# Base case (only 1 iteration): throws .read() call error

MAX_BATCH_SIZE = total_bulk_size
iterations = math.ceil(total_bulk_size / MAX_BATCH_SIZE)
while n_batch < iterations:
    for record in batch:
        string_io_writer.writerow(record[...])
    string_io.seek(0)
    sql = 'COPY table1 FROM stdin WITH (FORMAT csv)'
    cursor.copy_expert(sql, string_io)
    postgres.commit()
    n_batch += 1

# Case 2: MAX_BATCH_SIZE = 5_000_000 (processing batches of 5M records)
# throws .read() call error

# Case 3: MAX_BATCH_SIZE = 3_000_000 (processing batches of 3M records)
# Table is successfully populated and no error is thrown
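For reference, this is roughly what the pattern looks like written out end to end. The connection string, the records list, and the column names are placeholders I'm filling in for illustration, not my exact code:

import csv
import io
import math

import psycopg2

# Placeholders: the real code connects to the target database and pulls the
# documents out of MongoDB before this point.
conn = psycopg2.connect("dbname=target user=postgres")
records = []                      # full list of documents to copy
total_bulk_size = len(records)

MAX_BATCH_SIZE = 3_000_000
iterations = math.ceil(total_bulk_size / MAX_BATCH_SIZE)

with conn.cursor() as cursor:
    for n_batch in range(iterations):
        batch = records[n_batch * MAX_BATCH_SIZE:(n_batch + 1) * MAX_BATCH_SIZE]

        # Build one CSV buffer per batch.
        string_io = io.StringIO()
        string_io_writer = csv.writer(string_io)
        for record in batch:
            string_io_writer.writerow([record['field_a'], record['field_b']])  # placeholder columns
        string_io.seek(0)

        # copy_expert streams the buffer to the server through repeated .read() calls.
        cursor.copy_expert('COPY table1 FROM stdin WITH (FORMAT csv)', string_io)
        conn.commit()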

My goal is to populate the table as fast as possible, so separating the data into multiple batches defeats the purpose of using COPY_FROM.

I thought this could be an out-of-memory issue, but I would expect a different exception in that case. Also, I have been tracking memory consumption and it is not at 100% when the crash happens.
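For what it's worth, since copy_expert pulls the data through repeated .read(size) calls (which is where the error message points), I assume handing it a file-like object that builds the CSV lazily would keep memory flat while still doing a single COPY. A minimal sketch of that idea; LazyCSV and the column names are hypothetical, not part of my actual code:

import csv
import io

class LazyCSV:
    """Hypothetical adapter: exposes an iterator of records as a readable
    text stream, so CSV lines are generated only as copy_expert asks for them."""

    def __init__(self, records):
        self._records = iter(records)
        self._buffer = ''

    def read(self, size=-1):
        # copy_expert calls .read(size) repeatedly; returning '' ends the COPY.
        while size < 0 or len(self._buffer) < size:
            try:
                record = next(self._records)
            except StopIteration:
                break
            line = io.StringIO()
            csv.writer(line).writerow([record['field_a'], record['field_b']])  # placeholder columns
            self._buffer += line.getvalue()
        if size < 0:
            data, self._buffer = self._buffer, ''
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

# usage sketch:
# cursor.copy_expert('COPY table1 FROM stdin WITH (FORMAT csv)', LazyCSV(records))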

Any idea?

PS: I have tried running the COPY from Postgres's command line and it works for the whole dataset in a single batch.
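(Roughly what I ran for the command-line test, using psql's \copy with a placeholder file path:)

\copy table1 FROM 'full_dump.csv' WITH (FORMAT csv)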

Source: https://stackoverflow.com/questions/56650538/copy-from-throws-read-call-error-on-bulk-data-on-psycopg2
