I\'m trying to do something that I thought it would be simple but it seems not to be.
I have a project model that has many vacancies.
class Project &
Without much Rails magic, you can do:
Project.where('(SELECT COUNT(*) FROM vacancies WHERE vacancies.project_id = projects.id) > 0')
This type of conditions will work in all Rails versions as much of the work is done directly on the DB side. Plus, chaining .count
method will work nicely too. I've been burned by queries like Project.joins(:vacancies)
before. Of course, there are pros and cons as it's not DB agnostic.
I think there's a simpler solution:
Project.joins(:vacancies).distinct
Performing an inner join to the has_many table combined with a group
or uniq
is potentially very inefficient, and in SQL this would be better implemented as a semi-join that uses EXISTS
with a correlated subquery.
This allows the query optimiser to probe the vacancies table to check for the existence of a row with the correct project_id. It doesn't matter whether there is one row or a million that have that project_id.
That's not as straightforward in Rails, but can be achieved with:
Project.where(Vacancies.where("vacancies.project_id = projects.id").exists)
Similarly, find all projects that have no vacancies:
Project.where.not(Vacancies.where("vacancies.project_id = projects.id").exists)
Edit: in recent Rails versions you get a deprecation warning telling you to not to rely on exists
being delegated to arel. Fix this with:
Project.where.not(Vacancies.where("vacancies.project_id = projects.id").arel.exists)
Edit: if you're uncomfortable with raw SQL, try:
Project.where.not(Vacancies.where(Vacancy.arel_table[:project_id].eq(Project.arel_table[:id])).arel.exists)
You can make this less messy by adding class methods to hide the use of arel_table
, for example:
class Project
def self.id_column
arel_table[:id]
end
end
... so ...
Project.where.not(
Vacancies.where(
Vacancy.project_id_column.eq(Project.id_column)
).arel.exists
)
The error is telling you that vacancies is not a column in projects, basically.
This should work
Project.joins(:vacancies).where('COUNT(vacancies.project_id) > 0')
1) To get Projects with at least 1 vacancy:
Project.joins(:vacancies).group('projects.id')
2) To get Projects with more than 1 vacancy:
Project.joins(:vacancies).group('projects.id').having('count(project_id) > 1')
3) Or, if Vacancy
model sets counter cache:
belongs_to :project, counter_cache: true
then this will work, too:
Project.where('vacancies_count > ?', 1)
Inflection rule for vacancy
may need to be specified manually?
joins
uses an inner join by default so using Project.joins(:vacancies)
will in effect only return projects that have an associated vacancy.
UPDATE:
As pointed out by @mackskatz in the comment, without a group
clause, the code above will return duplicate projects for projects with more than one vacancies. To remove the duplicates, use
Project.joins(:vacancies).group('projects.id')
UPDATE:
As pointed out by @Tolsee, you can also use distinct
.
Project.joins(:vacancies).distinct
As an example
[10] pry(main)> Comment.distinct.pluck :article_id
=> [43, 34, 45, 55, 17, 19, 1, 3, 4, 18, 44, 5, 13, 22, 16, 6, 53]
[11] pry(main)> _.size
=> 17
[12] pry(main)> Article.joins(:comments).size
=> 45
[13] pry(main)> Article.joins(:comments).distinct.size
=> 17
[14] pry(main)> Article.joins(:comments).distinct.to_sql
=> "SELECT DISTINCT \"articles\".* FROM \"articles\" INNER JOIN \"comments\" ON \"comments\".\"article_id\" = \"articles\".\"id\""