I'm looking for a Ruby ORM to replace ActiveRecord. I've been looking at Sequel and DataMapper. They look pretty good, however neither of them seems to do the basics: avoid loading the entire result set into memory when iterating over a large table.
This code runs faster than ActiveRecord's find_in_batches:
id_max = table.max(:id)
id_min = table.min(:id)
n = 1000
(0..(id_max - id_min) / n).each do |i|
  table.where { (id >= id_min + n * i) & (id < id_min + n * (i + 1)) }.each { |row| }
end
You might also consider Ohm, which is built on top of the Redis NoSQL store.
ActiveRecord actually has an almost transparent batch mode:
User.find_each do |user|
  NewsLetter.weekly_deliver(user)
end
Sequel's Dataset#each does yield rows one at a time, but most database drivers load the entire result set into memory first.
If you are using Sequel's Postgres adapter, you can choose to use real cursors:
posts.use_cursor.each{|p| puts p}
This fetches 1000 rows at a time by default, but you can pass an option to specify the number of rows to grab per cursor fetch:
posts.use_cursor(:rows_per_fetch=>100).each{|p| puts p}
If you aren't using Sequel's Postgres adapter, you can use Sequel's pagination extension:
Sequel.extension :pagination
posts.order(:id).each_page(1000){|ds| ds.each{|p| puts p}}
However, like ActiveRecord's find_in_batches/find_each, this issues separate queries, so you need to be careful if there are concurrent modifications to the dataset you are retrieving.
The reason this isn't the default in Sequel is probably the same reason it isn't the default in ActiveRecord, which is that it isn't a good default in the general case. Only queries with large result sets really need to worry about it, and most queries don't return large result sets.
At least with the Postgres adapter cursor support, it's fairly easy to make it the default for your model:
Post.dataset = Post.dataset.use_cursor
For the pagination extension, you can't really do that, but you can wrap it in a method that makes it mostly transparent.
Sequel.extension :pagination
posts.order(:id).each_page(1000) do |ds|
ds.each { |p| puts p }
end
It is very slow on large tables! The reason becomes clear when you look at the method body: http://sequel.rubyforge.org/rdoc-plugins/classes/Sequel/Dataset.html#method-i-paginate
Each page is fetched with LIMIT/OFFSET, so the database has to scan past every previously returned row on each new page, and paginate issues a COUNT query whenever no record_count is supplied.
# File lib/sequel/extensions/pagination.rb, line 11
def paginate(page_no, page_size, record_count=nil)
  raise(Error, "You cannot paginate a dataset that already has a limit") if @opts[:limit]
  paginated = limit(page_size, (page_no - 1) * page_size)
  paginated.extend(Pagination)
  paginated.set_pagination_info(page_no, page_size, record_count || count)
end