Rails: How to get objects with at least one child?

星月不相逢 2021-01-03 21:10

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one child...

6 answers

孤街浪徒 (OP)

2021-01-03 21:22

The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT, but that query can be slow. For better performance, write the SQL with EXISTS instead:

    Parent.where <<-SQL
      EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
    SQL
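
To see why both queries return the same parents while doing different amounts of work, here is a plain-Ruby sketch of each query's semantics. It uses no Rails or database: the rows are hypothetical hashes, and the loops only mimic what the SQL asks for.

```ruby
# Plain-Ruby model of the two queries; the rows below are hypothetical.
# Parent 1 has two children, parent 2 has one, parent 3 has none.
parents  = [{ id: 1 }, { id: 2 }, { id: 3 }]
children = [
  { id: 10, parent_id: 1 },
  { id: 11, parent_id: 1 },
  { id: 12, parent_id: 2 }
]

# joins(:children): one output row per matching (parent, child) pair.
joined = parents.flat_map do |parent|
  children.select { |child| child[:parent_id] == parent[:id] }.map { parent }
end
# .distinct: drop the duplicate parent rows afterwards.
join_distinct = joined.uniq

# WHERE EXISTS (...): test each parent once; any? stops at the first match.
exists = parents.select do |parent|
  children.any? { |child| child[:parent_id] == parent[:id] }
end

p joined.size                     # => 3 (parent 1 appears twice before DISTINCT)
p exists.map { |row| row[:id] }   # => [1, 2]
p join_distinct == exists         # => true
```

Parent 3 appears in neither result, which is exactly the "at least one child" condition; the difference is only in the intermediate rows.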

EXISTS can be much faster than DISTINCT. For example, here is a Post model that has comments and likes:

    class Post < ApplicationRecord
      has_many :comments
      has_many :likes
    end

    class Comment < ApplicationRecord
      belongs_to :post
    end

    class Like < ApplicationRecord
      belongs_to :post
    end

The database contains 100 posts, each with 50 comments and 50 likes, plus one post that has no comments or likes:

    # Create posts with comments and likes
    100.times do |i|
      post = Post.create!(title: "Post #{i}")
      50.times do |j|
        post.comments.create!(content: "Comment #{j} for #{post.title}")
        post.likes.create!(user_name: "User #{j} for #{post.title}")
      end
    end

    # Create a post without comments or likes
    Post.create!(title: 'Hidden post')

If you want the posts that have at least one comment and at least one like, you might write:

    # NOTE: the uniq method is removed in Rails 5.1, so use distinct
    Post.joins(:comments, :likes).distinct

The query above generates SQL like this:

    SELECT DISTINCT "posts".*
    FROM "posts"
    INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
    INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"

But this SQL produces 250,000 rows (100 posts * 50 comments * 50 likes) and then filters out the duplicates, so it can be slow.
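
The row count is plain arithmetic: with two inner joins, each post contributes (its comments) * (its likes) rows before DISTINCT runs, and a post missing either kind of child contributes none. A small Ruby sketch of that count, using the seed numbers above:

```ruby
# Per-post child counts from the seed script: 100 posts with 50 comments and
# 50 likes each, plus the "Hidden post" with none.
counts = Array.new(100) { { comments: 50, likes: 50 } } + [{ comments: 0, likes: 0 }]

# Two INNER JOINs produce comments * likes rows per post (zero if either is empty).
join_rows = counts.sum { |c| c[:comments] * c[:likes] }

# DISTINCT then collapses those rows to one per post that survived the joins.
distinct_rows = counts.count { |c| c[:comments] * c[:likes] > 0 }

puts join_rows      # prints 250000
puts distinct_rows  # prints 100
```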

In this case, you should write it like this instead:

    Post.where <<-SQL
      EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
      AND EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
    SQL

This query generates SQL like this:

    SELECT "posts".*
    FROM "posts"
    WHERE (
      EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
      AND EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
    )

This query does not produce redundant duplicate rows, so it can be faster.
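
An EXISTS condition only asks whether a matching row exists, so each post is checked once per table and kept at most once. The plain-Ruby sketch below mimics that two-condition check; the data is hypothetical, and the Set of post_ids is an assumed stand-in for what an index on post_id gives the database.

```ruby
require "set"

# Hypothetical rows: post 3 has a comment but no likes, post 4 has neither.
posts    = [1, 2, 3, 4].map { |id| { id: id } }
comments = [{ post_id: 1 }, { post_id: 1 }, { post_id: 2 }, { post_id: 3 }]
likes    = [{ post_id: 1 }, { post_id: 2 }, { post_id: 2 }]

# Only existence matters, so a set of foreign keys is enough (roughly the role
# an index on post_id plays for the database).
commented = comments.map { |c| c[:post_id] }.to_set
liked     = likes.map { |l| l[:post_id] }.to_set

# EXISTS (comments) AND EXISTS (likes): each post is tested once, kept at most once.
result = posts.select { |post| commented.include?(post[:id]) && liked.include?(post[:id]) }

p result.map { |post| post[:id] } # => [1, 2]
```

Post 1 has two comments yet appears only once, without any deduplication step.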

Here is the benchmark:

                  user     system      total        real
    Uniq:     0.010000   0.000000   0.010000 (  0.074396)
    Exists:   0.000000   0.000000   0.000000 (  0.003711)

It shows that EXISTS was about 20 times (20.047661x) faster than DISTINCT in this run.
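
As a quick check of that figure, dividing the two real (wall-clock) times printed above gives the same ratio, up to rounding of the printed values:

```ruby
# Real (wall-clock) seconds from the benchmark output above.
uniq_real   = 0.074396
exists_real = 0.003711

speedup = uniq_real / exists_real
puts format("EXISTS was %.2fx faster in this run", speedup) # prints "EXISTS was 20.05x faster in this run"
```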

I pushed the sample application to GitHub, so you can confirm the difference yourself:

https://github.com/JunichiIto/exists-query-sandbox
