I need to mass-update many thousands of records, and I would like to process the updates in batches. First, I tried:
Foo.where(bar: 'bar').find_in_batches.upda
I'm surprised, too, that there isn't an easier way to do this... but I did come up with this approach:
batch_size = 1000
0.step(Foo.count, batch_size).each do |offset|
  Foo.where(bar: 'bar').order(:id)
     .offset(offset)
     .limit(batch_size)
     .update_all(bar: 'baz')
end
Basically this will:
1. Create an array of offsets between 0 and Foo.count, stepping by batch_size each time. For example, if Foo.count == 10500 you'd get: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
2. Loop through these offsets and use each one as the OFFSET in the query, ordering by id and limiting to the batch_size.
3. Update at most batch_size records whose "index" is greater than offset.
This is basically the manual way to perform what you said you were hoping for in the generated SQL. Too bad it can't just be done this way already by a standard library method... though I'm sure you could create one of your own.
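To make the offset arithmetic concrete, here is a small plain-Ruby sketch with no database involved. The names total and offsets are only for illustration; total stands in for Foo.count.

```ruby
batch_size = 1000
total = 10_500 # stands in for Foo.count in this sketch

# Integer#step with a limit and a step size yields 0, 1000, 2000, ...
# and stops once the next value would exceed the limit.
offsets = 0.step(total, batch_size).to_a

puts offsets.inspect
```

Note that when the count is an exact multiple of batch_size, the last offset equals the count, so the final batch is simply empty.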
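Haven't had a chance to test this yet, but you might be able to use Arel with the ids from each batch (find_in_batches yields plain arrays, so there is no relation to turn into a subquery):
Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  # foos is an Array of records, so collect the ids for the IN clause
  Foo.where(Foo.arel_table[:id].in(foos.map(&:id))).update_all(bar: 'baz')
end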
I've written a small method to invoke update_all in batches:
https://gist.github.com/VarunNatraaj/420c638d544be59eef85
Hope it is useful! :)
This is 2 years late, but the answers here are a) very slow for large data sets and b) ignore the built-in Rails capabilities (http://api.rubyonrails.org/classes/ActiveRecord/Batches.html).
As the offset value increases, the database (depending on your DB server) has to scan sequentially past all the skipped rows before it reaches your block and can fetch the data for processing. As your offset gets into the millions, this will be extremely slow.
Use the find_each iterator method instead:
Foo.where(a: b).find_each do |bar|
  bar.x = y
  bar.save
end
This has the added benefit of running the model callbacks with each save. If you don't care for the callbacks, then try:
Foo.where(a: b).find_in_batches do |array_of_foo|
  ids = array_of_foo.collect(&:id)
  Foo.where(id: ids).update_all(x: y)
end
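The batching helpers avoid the OFFSET problem by remembering the last id they saw and asking for rows with a larger id. Here is a minimal plain-Ruby sketch of that idea, using an in-memory sorted array in place of a table. each_id_batch is a hypothetical helper written for this illustration, not a Rails method.

```ruby
# Hypothetical helper: walks a sorted list of ids in batches by remembering
# the last id seen, instead of skipping rows with an offset.
def each_id_batch(sorted_ids, batch_size)
  last_id = nil
  loop do
    # Equivalent in spirit to: WHERE id > last_id ORDER BY id LIMIT batch_size
    # (a database would use the index on id here rather than scanning the list).
    batch = sorted_ids.select { |id| last_id.nil? || id > last_id }.first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last
  end
end

ids = [1, 2, 5, 8, 13, 21, 34] # ids with gaps, as in a real table
batches = []
each_id_batch(ids, 3) { |batch| batches << batch }

puts batches.inspect
```

Because each step starts from the last id rather than a row position, the cost of fetching a batch does not grow with how far into the table you are.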
In Rails 5, there's a handy new method, ActiveRecord::Relation#in_batches, to solve this problem:
Foo.in_batches.update_all(bar: 'baz')
Check documentation for details.
pdobb's answer is on the right track, but didn't work for me in Rails 3.2.21 because of this issue of ActiveRecord not parsing OFFSET with UPDATE calls:
https://github.com/rails/rails/issues/10849
I modified the code accordingly and it worked fine for concurrently setting the default value on my Postgres table:
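batch_size = 1000
# Step up to the largest id rather than the row count, so gaps in the ids don't leave rows unprocessed
0.step(Foo.maximum(:id).to_i, batch_size).each do |offset|
  Foo.where('id > ? AND id <= ?', offset, offset + batch_size)
     .order(:id)
     .update_all(foo: 'bar')
end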
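One caveat with id windows: they are driven by id values, not row positions, so the loop bound matters when ids have gaps. The plain-Ruby sketch below compares a bound taken from the row count with one taken from the largest id. id_windows is a hypothetical helper and the ids are made up for the example.

```ruby
# Hypothetical helper: returns [lower, upper] pairs matching
# the condition "id > lower AND id <= upper".
def id_windows(upper_bound, batch_size)
  0.step(upper_bound, batch_size).map { |lower| [lower, lower + batch_size] }
end

ids = [1, 2, 3, 2500, 2501] # five rows, but the largest id is 2501
batch_size = 1000

# Returns the ids that fall inside at least one of the given windows.
covered = lambda do |windows|
  ids.select { |id| windows.any? { |lo, hi| id > lo && id <= hi } }
end

by_count  = covered.call(id_windows(ids.size, batch_size)) # bound = 5
by_max_id = covered.call(id_windows(ids.max, batch_size))  # bound = 2501

puts by_count.inspect
puts by_max_id.inspect
```

With the count as the bound only the first window is generated, so the rows with ids 2500 and 2501 are never touched; with the largest id as the bound every row falls into some window.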