I have a Rails app that processes a large number of records (millions) in a MySQL database. Once it starts working, its memory use quickly grows at a rate of 50MB per second.
find_each calls find_in_batches with a batch size of 1000 under the hood.
All the records in a batch are instantiated and retained in memory for as long as that batch is being processed. If your records are large, or if they hold on to a lot of memory through proxy collections (e.g. has_many caches all of its items any time you use it), you can also try a smaller batch size:
Person.find_each(batch_size: 100) do |person|
  # whatever operation
end
You can also try calling GC.start manually at regular intervals (e.g. every 300 items).
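A minimal sketch of what that could look like inside the loop; the counter name and the interval of 300 are illustrative, not anything find_each provides for you:

counter = 0
Person.find_each(batch_size: 100) do |person|
  # whatever operation
  counter += 1
  GC.start if (counter % 300).zero?  # force a GC cycle every 300 records
end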
I was able to figure this out myself. There are two places to change.
First, disable IdentityMap in config/application.rb:
config.active_record.identity_map = false
Second, wrap the loop in uncached:
class MemoryTestController < ApplicationController
  def go
    ActiveRecord::Base.uncached do
      Person.find_each do |person|
        # whatever operation
      end
    end
  end
end
Now my memory use is under control. Hope this helps other people.
As nice as ActiveRecord is, it is not the best tool for all problems. I recommend dropping down to your native database adapter and doing the work at that level.
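For example, a rough sketch using the mysql2 gem directly with result streaming, so rows are fetched one at a time instead of being buffered in Ruby; the connection details and column list are placeholders you would adapt to your own schema:

require "mysql2"

client = Mysql2::Client.new(host: "localhost", username: "app", database: "app_production")

# stream: true pulls rows from the server as you iterate rather than loading
# the whole result set; cache_rows: false lets each row be garbage collected
# once your block is done with it.
client.query("SELECT id, name FROM people", stream: true, cache_rows: false).each do |row|
  # whatever operation, e.g. row["name"]
end

You lose ActiveRecord conveniences like model objects and callbacks, but for a one-off pass over millions of rows the memory footprint stays roughly constant.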