Rails: How to get objects with at least one child?

星月不相逢 2021-01-03 21:10

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at leas

6 answers
  •  孤街浪徒
    2021-01-03 21:22

    The accepted answer (Parent.joins(:children).uniq) generates SQL that uses DISTINCT, which can make the query slow. For better performance, write the condition with EXISTS instead:

    Parent.where(<<-SQL)
    EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
    SQL
    

    EXISTS can be much faster than a join followed by DISTINCT. For example, here is a Post model that has comments and likes:

    class Post < ApplicationRecord
      has_many :comments
      has_many :likes
    end
    
    class Comment < ApplicationRecord
      belongs_to :post
    end
    
    class Like < ApplicationRecord
      belongs_to :post
    end
    

    The database contains 100 posts that each have 50 comments and 50 likes, plus one post that has no comments or likes:

    # Create posts with comments and likes
    100.times do |i|
      post = Post.create!(title: "Post #{i}")
      50.times do |j|
        post.comments.create!(content: "Comment #{j} for #{post.title}")
        post.likes.create!(user_name: "User #{j} for #{post.title}")
      end
    end
    
    # Create a post with no comments or likes
    Post.create!(title: 'Hidden post')
    

    If you want to get the posts that have at least one comment and at least one like, you might write it like this:

    # NOTE: the uniq query method was deprecated in Rails 5.0 and removed in Rails 5.1, so use distinct
    Post.joins(:comments, :likes).distinct
    

    The query above generates SQL like this:

    SELECT DISTINCT "posts".* 
    FROM "posts" 
    INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" 
    INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"
    

    But this SQL produces 250,000 joined rows (100 posts * 50 comments * 50 likes) and only then filters out the duplicates, so it can be slow.
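
The row count above can be reproduced without a database. The following is a minimal sketch in plain Ruby that imitates the join with in-memory arrays; the constants and variable names are hypothetical stand-ins that mirror the seed script, not part of the sample application:

```ruby
# Hypothetical in-memory stand-ins for the tables, mirroring the seed script.
POSTS_WITH_CHILDREN = 100
CHILDREN_PER_POST = 50

# Post 101 plays the role of the post with no comments or likes.
post_ids = (1..POSTS_WITH_CHILDREN + 1).to_a

# Each array holds the post_id column of one child table.
comment_post_ids = (1..POSTS_WITH_CHILDREN).flat_map { |pid| [pid] * CHILDREN_PER_POST }
like_post_ids    = (1..POSTS_WITH_CHILDREN).flat_map { |pid| [pid] * CHILDREN_PER_POST }

comments_by_post = comment_post_ids.group_by(&:itself)
likes_by_post    = like_post_ids.group_by(&:itself)

# INNER JOIN semantics: one row per (post, comment, like) combination.
joined_rows = post_ids.flat_map do |pid|
  comments = comments_by_post.fetch(pid, [])
  likes    = likes_by_post.fetch(pid, [])
  comments.product(likes).map { pid }
end

# DISTINCT has to collapse all of those rows back down to one per post.
distinct_posts = joined_rows.uniq

# EXISTS semantics: one membership check per child table for each post.
posts_with_both = post_ids.select do |pid|
  comments_by_post.key?(pid) && likes_by_post.key?(pid)
end

puts "joined rows:     #{joined_rows.size}"
puts "distinct posts:  #{distinct_posts.size}"
puts "exists matches:  #{posts_with_both.size}"
```

Both approaches select the same 100 posts; the difference is the size of the intermediate result that has to be built and de-duplicated. A real database planner may execute either query differently, so this only illustrates the row arithmetic.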

    In this case, write the query like this instead:

    Post.where <<-SQL
    EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
    AND
    EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
    SQL
    

    This query generates SQL like this:

    SELECT "posts".* 
    FROM "posts" 
    WHERE (
    EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id) 
    AND 
    EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
    )
    

    This query does not produce the redundant duplicated rows, so it can be faster.

    Here is a benchmark:

                  user     system      total        real
    Uniq:     0.010000   0.000000   0.010000 (  0.074396)
    Exists:   0.000000   0.000000   0.000000 (  0.003711)
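
A timing comparison of this kind could be set up with Ruby's standard Benchmark module. This is only a rough sketch, not the script used for the numbers above: `time_queries` is a hypothetical helper, and the two lambdas are stand-in workloads so the example runs without a database.

```ruby
require "benchmark"

# Hypothetical helper: runs each labelled callable once and returns
# a hash of { label => wall-clock seconds }.
def time_queries(queries)
  queries.transform_values { |query| Benchmark.realtime { query.call } }
end

# Stand-in workloads (assumptions, not real queries). In a Rails console
# they could be replaced with the two queries from this answer, e.g.
#   "Uniq"   => -> { Post.joins(:comments, :likes).distinct.to_a }
#   "Exists" => -> { Post.where(exists_sql).to_a }
# where exists_sql is a hypothetical variable holding the EXISTS condition.
timings = time_queries({
  "Uniq"   => -> { (1..250_000).map { |i| i % 100 }.uniq },
  "Exists" => -> { (1..100).select(&:positive?) }
})

timings.each do |label, seconds|
  puts format("%-8s %.6f s", "#{label}:", seconds)
end
```

Calling `to_a` on each relation matters in the real version, because a relation is lazy and no SQL runs until the records are loaded. A single run is also noisy, so repeating each query several times would give a steadier comparison.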
    

    In this benchmark, the EXISTS query is about 20 times faster than the DISTINCT query in real (wall-clock) time.

    I pushed the sample application to GitHub, so you can confirm the difference yourself:

    https://github.com/JunichiIto/exists-query-sandbox
