I'm struggling to get Ruby on Rails to do this query right... in short: to join on a has_many relation, but only via the most recent record in that relation.
The simplest solution (based on code complexity) I can think of is first fetching the employee ids with their maximum created_at values, then composing a new query with the result.
attributes = %i[employee_id created_at]
employments = Employment.group(:employee_id).maximum(:created_at)
.map { |values| Employment.where(attributes.zip(values).to_h) }
.reduce(Employment.none, :or)
.where(status: :inactive)
employees = Employee.where(id: employments.select(:employee_id))
This should produce the following SQL:
SELECT employments.employee_id, MAX(employments.created_at)
FROM employments
GROUP BY employments.employee_id
With the result, the following query is built:
SELECT employees.*
FROM employees
WHERE employees.id IN (
SELECT employments.employee_id
FROM employments
WHERE (
employments.employee_id = ? AND employments.created_at = ?
OR employments.employee_id = ? AND employments.created_at = ?
OR employments.employee_id = ? AND employments.created_at = ?
-- ...
) AND employments.status = 'inactive'
)
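In case the attributes.zip(values).to_h step looks cryptic: a grouped maximum returns a hash of employee_id => latest created_at, and mapping over a hash yields [key, value] pairs. Here is a minimal plain-Ruby sketch of just that step, with no database involved. The helper name conditions_for and the sample hash are made up for illustration.

```ruby
ATTRIBUTES = %i[employee_id created_at].freeze

# Turns the hash returned by a grouped maximum (employee_id => created_at)
# into one condition hash per employee.
def conditions_for(latest_by_employee)
  # Each hash entry is yielded as an [employee_id, created_at] pair.
  latest_by_employee.map { |values| ATTRIBUTES.zip(values).to_h }
end

# Made-up stand-in for Employment.group(:employee_id).maximum(:created_at)
sample = { 1 => Time.utc(2020, 1, 15), 2 => Time.utc(2020, 3, 1) }

conditions_for(sample).each { |condition| p condition }
```

Each of those condition hashes is what gets passed to Employment.where before the relations are OR-ed together.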
The above method doesn't hold up well for large numbers of records, since the query grows with each additional employee. It becomes a lot easier when we can assume the record with the higher id was created last. In that scenario the following would do the trick:
employment_ids = Employment.select(Employment.arel_table[:id].maximum).group(:employee_id)
employee_ids = Employment.select(:employee_id).where(id: employment_ids, status: :inactive)
employees = Employee.where(id: employee_ids)
This should produce a single query when employees is loaded:
SELECT employees.*
FROM employees
WHERE employees.id IN (
SELECT employments.employee_id
FROM employments
WHERE employments.id IN (
SELECT MAX(employments.id)
FROM employments
GROUP BY employments.employee_id
) AND employments.status = 'inactive'
)
This solution works a lot better with larger datasets, but you might want to look into max's answer for better lookup performance.
One alternative is to use a LATERAL JOIN, a Postgres 9.3+ feature that can be described as something like a SQL foreach loop.
class Employee < ApplicationRecord
has_many :employments
def self.in_active_employment
lat_query = Employment.select(:status)
.where('employee_id = employees.id') # lateral reference
.order(created_at: :desc)
.limit(1)
joins("JOIN LATERAL(#{lat_query.to_sql}) ce ON true")
.where(ce: { status: 'active' })
end
end
This fetches the latest row from employments and then uses this in the WHERE clause to filter the rows from employees.
SELECT "employees".* FROM "employees"
JOIN LATERAL(
SELECT "employments"."status"
FROM "employments"
WHERE (employee_id = employees.id)
ORDER BY "employments"."created_at" DESC
LIMIT 1
) ce ON true
WHERE "ce"."status" = $1 LIMIT $2
This is going to be extremely fast in comparison to a WHERE id IN subquery
if the data set is large. Of course the cost is limited portability.
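To make the "foreach loop" description concrete, here is roughly what the lateral join computes, written as plain Ruby over in-memory rows. This is only a sketch of the semantics, not of how Postgres executes the query. The EmploymentRow struct, the method name and the sample data are made up for illustration.

```ruby
EmploymentRow = Struct.new(:employee_id, :status, :created_at)

# For each employee id, look up that employee's most recent employment
# and keep the id only if that row has the wanted status.
def employee_ids_with_latest_status(employee_ids, employments, status)
  employee_ids.select do |employee_id|
    latest = employments
             .select { |row| row.employee_id == employee_id } # lateral reference
             .max_by(&:created_at)                            # ORDER BY created_at DESC LIMIT 1
    # Employees without any employment drop out, as with an inner join.
    !latest.nil? && latest.status == status
  end
end

employments = [
  EmploymentRow.new(1, "active",   Time.utc(2019, 1, 1)),
  EmploymentRow.new(1, "inactive", Time.utc(2020, 1, 1)), # latest for employee 1
  EmploymentRow.new(2, "inactive", Time.utc(2019, 6, 1)),
  EmploymentRow.new(2, "active",   Time.utc(2021, 1, 1))  # latest for employee 2
]

p employee_ids_with_latest_status([1, 2, 3], employments, "active") # => [2]
```

Employee 1 has an older active row but is excluded because only the latest row counts, and employee 3 is excluded because there is no employment at all.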
+1 to @max's answer.
An alternative, though, is to add start_date and end_date attributes to Employment. To get active employees, you can do:
Employee
.joins(:employments)
.where('end_date is NULL OR ? BETWEEN start_date AND end_date', Date.today)
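The condition in that where can be read as a predicate on a single employment row. Here is a plain-Ruby sketch of it, assuming start_date is always set. The method name current_employment? is made up, and the date is fixed only so the example is repeatable.

```ruby
require "date"

# Mirrors: end_date IS NULL OR ? BETWEEN start_date AND end_date
# (BETWEEN is inclusive on both ends, like Date#between?).
# Assumes start_date is never nil.
def current_employment?(start_date, end_date, on = Date.today)
  end_date.nil? || on.between?(start_date, end_date)
end

today = Date.new(2024, 6, 1) # fixed date for a repeatable example

p current_employment?(Date.new(2024, 1, 1), nil, today)                    # open-ended
p current_employment?(Date.new(2023, 1, 1), Date.new(2023, 12, 31), today) # already ended
```

One thing to watch, as far as I can tell: joins returns one row per matching employment, so adding .distinct may be needed if an employee can have more than one matching employment.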
In my opinion, you can get those max dates first to make sure you aren't getting old records, and then just filter for the required status. Here is an example of doing the first part:
https://stackoverflow.com/a/18222124/10057981
After fiddling for a while (and trying all these suggestions you all came up with, plus some others), I came up with this. It works, but maybe isn't the most elegant.
inner_query = Employment.select('distinct on(employee_id) *').order('employee_id').order('created_at DESC')
employee_ids = Employee.from("(#{inner_query.to_sql}) as unique_employments").select("unique_employments.employee_id").where("unique_employments.status='inactive'")
employees = Employee.where(id: employee_ids)
The inner query returns a collection of unique employments: the latest for each employee. Then, based on that, I pull the employee IDs that match the status. And last, I find those employee records from the IDs.
I don't love it, but it's understandable and does work.
I really appreciate all the input.
One big take-away for me (and anyone else who comes across this same/similar problem): max's answer helped me realize the struggle I was having with this code is a "smell" that the data isn't modeled in an ideal way. Per max's suggestion, if the Employee table has a reference to the latest Employment, and that's kept up-to-date and accurate, then this becomes trivially easy and fast.
Food for thought.
Since the title includes ARel, the following should work for your example:
employees = Employee.arel_table
employments = Employment.arel_table
max_employments = Arel::Table.new('max_employments')
e2 = employments.project(
employments['employee_id'],
employments['id'].maximum.as('max_id')
).group(employments['employee_id'])
me_alias = Arel::Nodes::As.new(e2,max_employments)
res = employees.project(Arel.star)
.join(me_alias).on(max_employments['employee_id'].eq(employees['id']))
.join(employments).on(employments['id'].eq(max_employments['max_id']))
Employee.joins(*res.join_sources)
.where(employments: {status: :inactive})
This should result in the following SQL:
SELECT employees.*
FROM employees
INNER JOIN (
SELECT
employments.employee_id,
MAX(employments.id) AS max_id
FROM employments
GROUP BY employments.employee_id
) AS max_employments ON max_employments.employee_id = employees.id
INNER JOIN employments ON employments.id = max_employments.max_id
WHERE
employments.status = 'inactive'