How can I run updates in batches in Rails 3/4?


I need to mass-update many thousands of records, and I would like to process the updates in batches. First, I tried:

Foo.where(bar: 'bar').find_in_batches.update_all(bar: 'baz')

6 Answers
  • 2021-02-05 02:00

    I'm surprised, too, that there isn't an easier way to do this... but I did come up with this approach:

    batch_size = 1000
    0.step(Foo.count, batch_size).each do |offset|
      Foo.where(bar: 'bar').order(:id)
                           .offset(offset)
                           .limit(batch_size)
                           .update_all(bar: 'baz')
    end
    

    Basically this will:

    1. Create an array of offsets between 0 and Foo.count stepping by batch_size each time. For example, if Foo.count == 10500 you'd get: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
    2. Loop through these numbers and use them as an OFFSET in the SQL query, being sure to order by id, and limiting to the batch_size.
    3. Update at most batch_size records whose "index" is greater than offset.

    This is basically the manual way to perform what you said you were hoping for in the generated SQL. Too bad it can't just be done this way already by a standard library method... though I'm sure you could create one of your own.
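
    For example, a minimal sketch of such a helper (a hypothetical update_in_batches method, not part of Rails, using the same offset/limit stepping as above) could look like:

    # Hypothetical helper: applies update_all to any scope in fixed-size slices,
    # stepping through offsets 0, batch_size, 2 * batch_size, ... up to scope.count.
    def update_in_batches(scope, updates, batch_size = 1000)
      0.step(scope.count, batch_size).each do |offset|
        scope.order(:id).offset(offset).limit(batch_size).update_all(updates)
      end
    end

    # e.g. update_in_batches(Foo.where(bar: 'bar'), bar: 'baz')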

  • 2021-02-05 02:02

    Haven't had a chance to test this yet, but you might be able to use ARel and a subquery.

    Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
      # find_in_batches yields arrays of records, so collect their ids for the IN clause
      Foo.where(Foo.arel_table[:id].in(foos.map(&:id))).update_all(bar: 'baz')
    end
    
  • 2021-02-05 02:05

    I've written a small method to invoke update_all in batches:

    https://gist.github.com/VarunNatraaj/420c638d544be59eef85

    Hope it is useful! :)

  • 2021-02-05 02:06

    This is 2 years late, but the answers here are a) very slow for large data sets and b) ignore the built-in Rails batching capabilities (http://api.rubyonrails.org/classes/ActiveRecord/Batches.html).

    As the offset value increases, depending on your DB server, the database will do a sequential scan until it reaches your block, and then fetch the data for processing. As your offset gets into the millions, this will be extremely slow.

    use the "find_each" iterator method:

    Foo.where(a: b).find_each do |bar|
       bar.x = y
       bar.save
    end
    

    This has the added benefit of running the model callbacks with each save. If you don't care for the callbacks, then try:

    Foo.where(a: b).find_in_batches do |array_of_foo|
      ids = array_of_foo.collect(&:id)
      Foo.where(id: ids).update_all(x: y)
    end
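
    Both iterators also take a batch_size option if the default of 1000 records per batch doesn't fit your data, for example:

    Foo.where(a: b).find_in_batches(batch_size: 500) do |array_of_foo|
      Foo.where(id: array_of_foo.collect(&:id)).update_all(x: y)
    end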
    
  • 2021-02-05 02:12

    In Rails 5, there's a handy new method, ActiveRecord::Relation#in_batches, to solve this problem:

    Foo.in_batches.update_all(bar: 'baz')
    

    Check the documentation for details.
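
    in_batches also takes a block and an "of:" option for the batch size, so the scope from the question could be handled like this (Rails 5+):

    Foo.where(bar: 'bar').in_batches(of: 500) do |relation|
      relation.update_all(bar: 'baz')
    end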

  • 2021-02-05 02:13

    pdobb's answer is on the right track, but it didn't work for me in Rails 3.2.21 because of this issue where ActiveRecord doesn't apply OFFSET to UPDATE calls:

    https://github.com/rails/rails/issues/10849

    I modified the code accordingly and it worked fine for concurrently setting the default value on my Postgres table:

    batch_size = 1000
    0.step(Foo.count, batch_size).each do |offset|
      Foo.where('id > ? AND id <= ?', offset, offset + batch_size).
          order(:id).
          update_all(foo: 'bar')
    end
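
    One caveat: stepping up to Foo.count only covers every row if the ids run roughly from 1 with few gaps; if many rows have been deleted, stepping up to the maximum id is safer. A small variation on the same idea:

    batch_size = 1000
    0.step(Foo.maximum(:id).to_i, batch_size).each do |offset|
      Foo.where('id > ? AND id <= ?', offset, offset + batch_size).
          update_all(foo: 'bar')
    end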
    