Importing chunks of CSV rows with Sidekiq, Resque, etc


Question


I'm writing an importer that imports data from a CSV file into a DB table. To avoid loading the whole file into memory, I'm using Smarter CSV to parse the file into chunks of 100 to load each chunk one at a time.

I'll be passing each chunk of 100 to a background job processor such as Resque or Sidekiq to import those rows in bulk.
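Roughly, that pipeline might look like the sketch below (assuming Sidekiq; the ImportChunkJob class name, the import.csv path, and the User.insert_all call are illustrative stand-ins, not from the original question):

```ruby
require 'smarter_csv'
require 'sidekiq'

# Hypothetical worker: receives one chunk (an array of row hashes) and imports it in bulk.
class ImportChunkJob
  include Sidekiq::Worker

  def perform(rows)
    # Sidekiq serializes job arguments to JSON, so the hashes arrive with string keys.
    User.insert_all(rows) # illustrative bulk insert; replace with your own import logic
  end
end

# Parse the CSV in chunks of 100 rows and enqueue one job per chunk,
# so the whole file never has to be loaded into memory at once.
SmarterCSV.process('import.csv', chunk_size: 100) do |chunk|
  ImportChunkJob.perform_async(chunk)
end
```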

  1. Passing 100 rows as a job argument results in a string that's roughly 5,000 characters long. Does this cause any problems in general, or particularly with the back-end store (e.g. Sidekiq uses Redis - does Redis allow storing a key of that length)? I don't want to import one row at a time because it creates 50,000 jobs for a 50,000-row file.

  2. I want to know the progress of the overall import, so I planned to have each job (chunk of 100) update a DB field, incrementing the count by 1 when it's done (is there a better approach?). Since these jobs process in parallel, is there any danger of two jobs trying to update the same field by 1 and overwriting each other? Or do DB writes lock the table so only one can write at a time?

Thanks!


Answer 1:


Passing 100 rows as a job argument results in a string that's about ~5000 characters long.

Redis can handle that without problems; a single Redis string value can be up to 512 MB.

Since these jobs process in parallel, is there any danger of two jobs trying to update the same field by 1 and overwriting each other?

If you do read + set, then yes, it's subject to race conditions. You can leverage Redis for the task and use its atomic INCR command.
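For example, a minimal sketch of such a counter (assuming Sidekiq's Sidekiq.redis connection helper; the import_id argument, the key names, and the import_rows helper are hypothetical):

```ruby
class ImportChunkJob
  include Sidekiq::Worker

  def perform(import_id, rows)
    import_rows(rows) # hypothetical helper that does the actual bulk insert

    Sidekiq.redis do |conn|
      # INCR is atomic, so parallel jobs can't overwrite each other's updates.
      done  = conn.incr("import:#{import_id}:chunks_done")
      total = conn.get("import:#{import_id}:chunks_total").to_i
      puts "import #{import_id}: #{done}/#{total} chunks done"
    end
  end
end
```

The chunks_total key would be set once up front, when the file is split into chunks and the jobs are enqueued; progress is then just done divided by total.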

To avoid loading the whole file into memory, I'm using Smarter CSV to parse the file into chunks of 100

Depends on what you're doing with those rows, but 50k rows by themselves are not a great strain on memory, I'd say.



Source: https://stackoverflow.com/questions/34014770/importing-chunks-of-csv-rows-with-sidekiq-resque-etc
