Find rows with multiple duplicate fields with Active Record, Rails & Postgres

有刺的猬 2020-12-04 06:07

What is the best way to find records with duplicate values across multiple columns using Postgres and Active Record?

I found this solution here:

User.f

5 Answers
  • 2020-12-04 06:48

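    That error occurs because PostgreSQL requires every column in the SELECT list to either appear in the GROUP BY clause or be wrapped in an aggregate function, so select only the grouping columns.

    Try: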

    User.select(:first,:email).group(:first,:email).having("count(*) > 1").all
    

    (Note: not tested; you may need to tweak it.)

    Edited to remove the id column.
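    To make the GROUP BY / HAVING semantics concrete, here is a plain-Ruby sketch (no database, no Active Record) that performs the same grouping on an in-memory array. The `USERS` rows and the `duplicate_combinations` helper are hypothetical illustrations; the real query runs inside Postgres.

    ```ruby
    # Plain-Ruby sketch of: SELECT first, email FROM users
    #                       GROUP BY first, email HAVING count(*) > 1
    # USERS is hypothetical sample data, not a real users table.
    USERS = [
      { id: 1, first: "Joe", email: "test@test.com" },
      { id: 2, first: "Joe", email: "test@test.com" },
      { id: 3, first: "Jim", email: "email2@gmail.com" },
      { id: 4, first: "Ann", email: "ann@example.com" },
      { id: 5, first: "Jim", email: "email2@gmail.com" },
      { id: 6, first: "Joe", email: "test@test.com" }
    ].freeze

    # Returns the distinct [first, email] combinations that occur more than once.
    def duplicate_combinations(rows)
      rows
        .group_by { |row| [row[:first], row[:email]] } # GROUP BY first, email
        .select { |_key, group| group.size > 1 }       # HAVING count(*) > 1
        .keys                                          # SELECT first, email
    end

    p duplicate_combinations(USERS)
    ```

    Only the grouping columns survive in the result, which is why Postgres rejects selecting an ungrouped column such as `id` here.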

  • 2020-12-04 06:57

    Tested & Working Version

    User.select(:first,:email).group(:first,:email).having("count(*) > 1")
    

    Also, this is a little unrelated but handy. If you want to see how many times each combination was found, put .size at the end:

    User.select(:first,:email).group(:first,:email).having("count(*) > 1").size
    

    and you'll get a result set back that looks like this:

    {[nil, nil]=>512,
     ["Joe", "test@test.com"]=>23,
     ["Jim", "email2@gmail.com"]=>36,
     ["John", "email3@gmail.com"]=>21}
    

    Thought that was pretty cool and hadn't seen it before.
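    The hash returned by `.size` maps each `[first, email]` combination to its row count. A rough plain-Ruby analogue, using hypothetical sample rows and a hypothetical `duplicate_counts` helper, is sketched below. Rows with `nil` in both columns land in a single group, much as `GROUP BY` puts NULLs together, which is why `[nil, nil]` appears in the output above.

    ```ruby
    # Plain-Ruby analogue of .group(:first, :email).having("count(*) > 1").size
    # ROWS is hypothetical sample data standing in for the users table.
    ROWS = [
      { first: nil,   email: nil },
      { first: nil,   email: nil },
      { first: "Joe", email: "test@test.com" },
      { first: "Joe", email: "test@test.com" },
      { first: "Joe", email: "test@test.com" },
      { first: "Ann", email: "ann@example.com" }
    ].freeze

    # Returns { [first, email] => count } for combinations seen more than once.
    def duplicate_counts(rows)
      rows
        .group_by { |row| [row[:first], row[:email]] } # GROUP BY first, email
        .transform_values(&:size)                      # count(*) per group
        .select { |_key, count| count > 1 }            # HAVING count(*) > 1
    end

    p duplicate_counts(ROWS)
    ```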

    Credit to Taryn, this is just a tweaked version of her answer.

  • 2020-12-04 06:58

    If you use PostgreSQL, you can fetch the duplicates (every row in a duplicate group except one) with a single query:

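    def duplicated_users
      # For each (first, email) group with more than one row, aggregate the ids
      # in ascending order and unnest all but the first one, so the lowest id
      # in every group is left out of the result.
      duplicated_ids = User
        .group(:first, :email)
        .having("COUNT(*) > 1")
        .select('unnest((array_agg("id" ORDER BY "id"))[2:])')
    
      User.where(id: duplicated_ids)
    end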
    
    irb> duplicated_users
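    The `unnest((array_agg("id"))[2:])` trick collects the ids of each duplicate group into an array, drops the first element, and returns the rest, so one row per group is excluded. Here is a plain-Ruby sketch of that step with hypothetical data and a hypothetical `extra_duplicate_ids` helper. Sorting the ids mirrors `array_agg("id" ORDER BY "id")`, so the lowest id in each group is the one kept; without an `ORDER BY`, Postgres does not guarantee which id comes first.

    ```ruby
    # Plain-Ruby sketch of unnest((array_agg("id" ORDER BY "id"))[2:])
    # grouped by first and email. SAMPLE_USERS is hypothetical data.
    SAMPLE_USERS = [
      { id: 1, first: "Joe", email: "test@test.com" },
      { id: 2, first: "Joe", email: "test@test.com" },
      { id: 3, first: "Jim", email: "email2@gmail.com" },
      { id: 4, first: "Ann", email: "ann@example.com" },
      { id: 5, first: "Jim", email: "email2@gmail.com" },
      { id: 6, first: "Joe", email: "test@test.com" }
    ].freeze

    # Returns the ids of every duplicate row except the lowest id in each group.
    def extra_duplicate_ids(rows)
      rows
        .group_by { |row| [row[:first], row[:email]] } # GROUP BY first, email
        .values
        .select { |group| group.size > 1 }             # HAVING COUNT(*) > 1
        .flat_map { |group| group.map { |row| row[:id] }.sort.drop(1) } # [2:]
    end

    p extra_duplicate_ids(SAMPLE_USERS)
    ```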
    
  • 2020-12-04 07:02

    If you need the full models, try the following (based on @newUserNameHere's answer).

    User.where(email: User.select(:email).group(:email).having("count(*) > 1"))
    

    This returns the rows whose email address is not unique.

    I'm not aware of a way to do this over multiple attributes.
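    For comparison, once the rows are in memory the same filtering is easy to express in plain Ruby over any number of columns. This sketch (hypothetical `rows_with_duplicate_key` helper and `PEOPLE` data, not a database query) returns full rows whose key is repeated, and shows how the result changes when the key is `email` alone versus `first` plus `email`. It loads every row, so it is only practical for small tables.

    ```ruby
    # In-memory sketch: keep full rows whose values for the given columns repeat.
    # PEOPLE and rows_with_duplicate_key are hypothetical, for illustration only.
    PEOPLE = [
      { id: 1, first: "Joe",   email: "test@test.com" },
      { id: 2, first: "Joe",   email: "test@test.com" },
      { id: 3, first: "Ann",   email: "ann@example.com" },
      { id: 4, first: "Annie", email: "ann@example.com" },
      { id: 5, first: "Jim",   email: "jim@example.com" }
    ].freeze

    def rows_with_duplicate_key(rows, *columns)
      counts = Hash.new(0)
      # First pass: count each combination of the key columns.
      rows.each { |row| counts[row.values_at(*columns)] += 1 }
      # Second pass: keep rows whose combination occurs more than once.
      rows.select { |row| counts[row.values_at(*columns)] > 1 }
    end

    p rows_with_duplicate_key(PEOPLE, :email).map { |row| row[:id] }
    # prints [1, 2, 3, 4]
    p rows_with_duplicate_key(PEOPLE, :first, :email).map { |row| row[:id] }
    # prints [1, 2]
    ```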

  • 2020-12-04 07:04

    Based on the answer above by @newUserNameHere, I believe the right way to show the count for each combination is:

    res = User.select('first, email, count(1)').group(:first, :email).having('count(1) > 1')
    
    # The trailing nil stops irb from echoing the return value of each.
    res.each { |r| puts r.attributes }; nil
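    Each record printed this way carries the selected `first` and `email` plus a `count` attribute (Postgres names an unaliased `count(1)` column `count`); Active Record may also include `"id" => nil` because `id` was not selected. A plain-Ruby sketch of the same per-group rows, built from hypothetical `CONTACTS` data by a hypothetical `duplicate_groups_with_count` helper, with string keys to resemble `attributes`:

    ```ruby
    # Plain-Ruby sketch of SELECT first, email, count(1) ... GROUP BY first, email
    # HAVING count(1) > 1, returning one hash per group. CONTACTS is hypothetical.
    CONTACTS = [
      { first: "Joe", email: "test@test.com" },
      { first: "Joe", email: "test@test.com" },
      { first: "Jim", email: "email2@gmail.com" },
      { first: "Jim", email: "email2@gmail.com" },
      { first: "Jim", email: "email2@gmail.com" },
      { first: "Ann", email: "ann@example.com" }
    ].freeze

    def duplicate_groups_with_count(rows)
      rows
        .group_by { |row| [row[:first], row[:email]] } # GROUP BY first, email
        .select { |_key, group| group.size > 1 }       # HAVING count(1) > 1
        .map do |(first, email), group|
          { "first" => first, "email" => email, "count" => group.size }
        end
    end

    duplicate_groups_with_count(CONTACTS).each { |attrs| puts attrs }
    ```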
    