What is the best way to find records with duplicate values across multiple columns using Postgres, and Activerecord?
I found this solution here:
User.f
That error occurs because POSTGRES requires you to put grouping columns in the SELECT clause.
try:
User.select(:first,:email).group(:first,:email).having("count(*) > 1").all
(note: not tested, you may need to tweak it)
EDITED to remove id column
Tested & Working Version
User.select(:first,:email).group(:first,:email).having("count(*) > 1")
Also, this is a little unrelated but handy. If you want to see how times each combination was found, put .size at the end:
User.select(:first,:email).group(:first,:email).having("count(*) > 1").size
and you'll get a result set back that looks like this:
{[nil, nil]=>512,
["Joe", "test@test.com"]=>23,
["Jim", "email2@gmail.com"]=>36,
["John", "email3@gmail.com"]=>21}
Thought that was pretty cool and hadn't seen it before.
Credit to Taryn, this is just a tweaked version of her answer.
Get all duplicates with a single query if you use PostgreSQL:
def duplicated_users
duplicated_ids = User
.group(:first, :email)
.having("COUNT(*) > 1")
.select('unnest((array_agg("id"))[2:])')
User.where(id: duplicated_ids)
end
irb> duplicated_users
If you need the full models, try the following (based on @newUserNameHere's answer).
User.where(email: User.select(:email).group(:email).having("count(*) > 1").select(:email))
This will return the rows where the email address of the row is not unique.
I'm not aware of a way to do this over multiple attributes.
Based on the answer above by @newUserNameHere I believe the right way to show the count for each is
res = User.select('first, email, count(1)').group(:first,:email).having('count(1) > 1')
res.each {|r| puts r.attributes } ; nil