Question
I'm writing an importer that imports data from a CSV file into a DB table. To avoid loading the whole file into memory, I'm using Smarter CSV to parse the file into chunks of 100 and load each chunk one at a time.
I'll be passing each chunk of 100 to a background job processor such as Resque or Sidekiq to import those rows in bulk.
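Roughly what I have in mind (a simplified sketch, not my actual code; ImportChunkJob is a placeholder worker name, the file path is just an example, and the bulk insert is only indicated by a comment):

    require 'sidekiq'
    require 'smarter_csv'

    class ImportChunkJob
      include Sidekiq::Worker

      def perform(rows)
        # rows is an array of ~100 row hashes; Sidekiq round-trips job arguments
        # through JSON, so the hash keys arrive here as strings.
        # ... bulk-insert the rows into the DB table ...
      end
    end

    # Enqueue one job per chunk of 100 rows rather than one job per row.
    SmarterCSV.process('import.csv', chunk_size: 100) do |chunk|
      ImportChunkJob.perform_async(chunk)
    end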
Passing 100 rows as a job argument results in a string that's about 5,000 characters long. Does this cause any problems in general, or particularly with the back-end store (e.g. Sidekiq uses Redis; does Redis allow storing a key of that length?). I don't want to import one row at a time because that creates 50,000 jobs for a 50,000-row file.
I want to know the progress of the overall import, so I planned to have each job (chunk of 100) update a DB field, increasing the count by 1 when it's done (not sure if there's a better approach?). Since these jobs process in parallel, is there any danger of two jobs trying to update the same field by 1 and overwriting each other? Or do DB writes lock the table so only one can write at a time?
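For reference, the naive progress update I was planning is basically a read followed by a write back (the Import model and completed_chunks column are made-up names):

    # At the end of each chunk job: read the current count, add 1, write it back.
    import = Import.find(import_id)
    import.update(completed_chunks: import.completed_chunks + 1)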
Thanks!
Answer 1:
Passing 100 rows as a job argument results in a string that's about 5,000 characters long.
Redis can handle that without problems.
Since these jobs process in parallel, is there any danger of two jobs trying to update the same field by 1 and overwriting each other?
If you do read + set, then yes, it's subject to race conditions. You can leverage Redis for this and use its atomic INCR.
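For example, each chunk job can bump a counter atomically when it finishes. Something along these lines, where the key names and the total_chunks bookkeeping are only illustrative and Sidekiq.redis is used so the job reuses Sidekiq's own Redis connection pool:

    require 'sidekiq'

    class ImportChunkJob
      include Sidekiq::Worker

      def perform(import_id, rows)
        # ... bulk-insert the rows ...

        Sidekiq.redis do |conn|
          done  = conn.incr("import:#{import_id}:completed_chunks") # atomic, no lost updates
          total = conn.get("import:#{import_id}:total_chunks").to_i # set once when enqueuing the chunks
          Sidekiq.logger.info "import #{import_id}: #{done}/#{total} chunks done"
        end
      end
    end

Reading the same key back at any time gives you the overall progress, and because INCR is atomic there is no read-modify-write window for two jobs to clobber each other.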
To avoid loading the whole file into memory, I'm using Smarter CSV to parse the file into chunks of 100
Depends on what you're doing with those rows, but 50k rows by themselves are not a great strain on memory, I'd say.
Source: https://stackoverflow.com/questions/34014770/importing-chunks-of-csv-rows-with-sidekiq-resque-etc