How do I build a query in Ruby on Rails that joins on the max of a has_many relation only and includes a select filter on that relation?

I'm struggling to get Ruby on Rails to do this query right... in short: to join on a has_many relation but only via the most recent record in that r

7 answers

北荒

2021-01-06 13:16

The simplest solution (in terms of code complexity) I can think of is to first fetch the employee ids with their maximum created_at values, then compose a new query from the result.

attributes = %i[employee_id created_at]
employments = Employment.group(:employee_id).maximum(:created_at)
              .map { |values| Employment.where(attributes.zip(values).to_h) }
              .reduce(Employment.none, :or)
              .where(status: :inactive)

employees = Employee.where(id: employments.select(:employee_id))

This should produce the following SQL:

SELECT employments.employee_id, MAX(employments.created_at)
FROM employments
GROUP BY employments.employee_id

With the result, the following query is built:

SELECT employees.*
FROM employees
WHERE employees.id IN (
  SELECT employments.employee_id 
  FROM employments
  WHERE (
    employments.employee_id = ? AND employments.created_at = ?
    OR employments.employee_id = ? AND employments.created_at = ?
    OR employments.employee_id = ? AND employments.created_at = ?
    -- ...
  ) AND employments.status = 'inactive'
)
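
The OR branches in this query come from the hash that the grouped `maximum` call returns. As a rough sketch in plain Ruby, with made-up values standing in for the database result and `latest_conditions` as a hypothetical helper name, each key/value pair becomes one condition hash:

```ruby
# Plain-Ruby sketch: no ActiveRecord involved. The sample hash below only
# imitates the shape returned by
# Employment.group(:employee_id).maximum(:created_at).
ATTRIBUTES = %i[employee_id created_at].freeze

# Turns { employee_id => max_created_at } into one condition hash per employee.
def latest_conditions(maximums)
  maximums.map { |values| ATTRIBUTES.zip(values).to_h }
end

# Made-up sample data: employee id => latest created_at (strings for brevity).
SAMPLE_MAXIMUMS = { 1 => '2021-01-05', 2 => '2021-01-03' }.freeze

latest_conditions(SAMPLE_MAXIMUMS).each { |condition| p condition }
```

Each of these hashes turns into one `where`, and `reduce(Employment.none, :or)` chains them together, so the SQL gains one branch per employee.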

The above method doesn't hold up well for large numbers of records, since the query grows with each additional employee. It becomes a lot easier if we can assume that a higher id always means a later record. In that scenario the following would do the trick:

employment_ids = Employment.select(Employment.arel_table[:id].maximum).group(:employee_id)
employee_ids = Employment.select(:employee_id).where(id: employment_ids, status: :inactive)
employees = Employee.where(id: employee_ids)

This should produce a single query when employees is loaded.

SELECT employees.*
FROM employees
WHERE employees.id IN (
  SELECT employments.employee_id 
  FROM employments
  WHERE employments.id IN (
    SELECT MAX(employments.id)
    FROM employments
    GROUP BY employments.employee_id
  ) AND employments.status = 'inactive'
)

This solution works a lot better with larger datasets, but you might want to look into max's answer for better lookup performance.


伪装坚强ぢ

2021-01-06 13:18

One alternative is a LATERAL JOIN, a feature available in Postgres 9.3+ that can be thought of as something like a SQL foreach loop.

class Employee < ApplicationRecord
  has_many :employments
  def self.in_active_employment
    lat_query = Employment.select(:status)
                      .where('employee_id = employees.id') # lateral reference
                      .order(created_at: :desc)
                      .limit(1)
    joins("JOIN LATERAL(#{lat_query.to_sql}) ce ON true")
      .where(ce: { status: 'active' })
  end
end

This fetches the latest row from employments and then uses this in the WHERE clause to filter the rows from employees.

SELECT "employees".* FROM "employees" 
JOIN LATERAL(
  SELECT "employments"."status" 
  FROM "employments" 
  WHERE (employee_id = employees.id) 
  ORDER BY "employments"."created_at" DESC 
  LIMIT 1
) ce  ON true 
WHERE "ce"."status" = $1 LIMIT $2
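
To make the "foreach loop" comparison concrete, here is a plain-Ruby model of what the lateral join computes. It is only an illustration of the semantics, not of how Postgres executes the query; the structs and the `with_latest_status` helper are hypothetical names, and `created_at` is an ISO date string for brevity.

```ruby
# In-memory stand-ins for rows of the employees and employments tables.
EmployeeRow   = Struct.new(:id, :name)
EmploymentRow = Struct.new(:employee_id, :status, :created_at)

# For each employee (the outer loop), run the "subquery": that employee's
# employments, newest first, limited to one row. The employee is kept only
# if that single row has the wanted status. Employees without employments
# are dropped, as with an inner JOIN LATERAL ... ON true.
def with_latest_status(employees, employments, status)
  employees.select do |employee|
    latest = employments
             .select { |row| row.employee_id == employee.id }
             .max_by(&:created_at)
    !latest.nil? && latest.status == status
  end
end
```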

On a large data set, the lateral join is likely to be much faster than a WHERE id IN subquery. The cost is limited portability.


忘掉有多难

2021-01-06 13:25
+1 to @max's answer.

An alternative though is to add a start_date and end_date attribute to Employment. To get active employees, you can do
```
Employee
  .joins(:employments)
  .where('end_date is NULL OR ? BETWEEN start_date AND end_date', Date.today)
```
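
As a rough sketch of what that condition checks, here is the same predicate in plain Ruby. `DatedEmployment` and `active_on?` are hypothetical names, and the handling of a missing `start_date` is an assumption that mirrors SQL's NULL behaviour.

```ruby
require 'date'

# Hypothetical in-memory stand-in for an employments row.
DatedEmployment = Struct.new(:employee_id, :start_date, :end_date)

# Mirrors: end_date IS NULL OR date BETWEEN start_date AND end_date.
# BETWEEN is inclusive on both ends.
def active_on?(employment, date)
  return true if employment.end_date.nil?

  # In SQL a NULL start_date makes BETWEEN evaluate to NULL, which is not
  # true, so the row would not match.
  return false if employment.start_date.nil?

  date >= employment.start_date && date <= employment.end_date
end
```

Since an employee can have several matching employments, the joined query may return the same employee more than once; adding `.distinct` is one way to avoid that.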
执笔经年

2021-01-06 13:29

In my opinion, you can get those max dates first to make sure you're not getting old records, and then just filter for the required status. Here is an example of doing the first part:

https://stackoverflow.com/a/18222124/10057981

悲&欢浪女

2021-01-06 13:33
After fiddling for a while (and trying all these suggestions you all came up with, plus some others), I came up with this. It works, but maybe isn't the most elegant.
```
inner_query = Employment.select('distinct on(employee_id) *').order('employee_id').order('created_at DESC')
employee_ids = Employee.from("(#{inner_query.to_sql}) as unique_employments").select("unique_employments.employee_id").where("unique_employments.status='inactive'")
employees = Employee.where(id: employee_ids)
```
The inner query returns a collection of unique employments: the latest one for each employee. Based on that, I pull the employee IDs that match the status. Last, I find the employee records from those IDs.

I don't love it, but it's understandable and does work.
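
For readers unfamiliar with `DISTINCT ON`, the inner query can be pictured in plain Ruby roughly as follows. This is only a model of the semantics on in-memory rows; `UniqueEmployment`, `distinct_on_employee` and `inactive_employee_ids` are hypothetical names, and `created_at` is an ISO date string for brevity.

```ruby
# In-memory stand-in for an employments row.
UniqueEmployment = Struct.new(:employee_id, :status, :created_at)

# Imitates SELECT DISTINCT ON (employee_id) * ... ORDER BY employee_id,
# created_at DESC: sort so each employee's newest row comes first, then
# keep only the first row seen for each employee_id.
def distinct_on_employee(rows)
  rows
    .sort { |a, b| [a.employee_id, b.created_at] <=> [b.employee_id, a.created_at] }
    .uniq(&:employee_id)
end

# Filters the per-employee latest rows by status and returns the ids.
def inactive_employee_ids(rows)
  distinct_on_employee(rows)
    .select { |row| row.status == 'inactive' }
    .map(&:employee_id)
end
```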

I really appreciate all the input.

One big take-away for me (and for anyone else who comes across this or a similar problem): max's answer helped me realize that the struggle I was having with this code is a "smell" that the data isn't modeled in an ideal way. Per max's suggestion, if the Employee table has a reference to the latest Employment, and that's kept up-to-date and accurate, then this becomes trivially easy and fast.

Food for thought.
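
A minimal plain-Ruby model of that idea, assuming each employee keeps a pointer to its latest employment that is refreshed on every insert. In Rails this would more likely be a foreign key column on employees kept in sync by a callback, which is an assumption here; `CachedEmployee`, `CachedEmployment` and `with_latest_status_cached` are hypothetical names.

```ruby
# In-memory stand-in for an employments row.
CachedEmployment = Struct.new(:status, :created_at)

class CachedEmployee
  attr_reader :name, :latest_employment

  def initialize(name)
    @name = name
    @employments = []
    @latest_employment = nil
  end

  # Records an employment and refreshes the cached "latest" pointer.
  def add_employment(status:, created_at:)
    employment = CachedEmployment.new(status, created_at)
    @employments << employment
    if @latest_employment.nil? || created_at >= @latest_employment.created_at
      @latest_employment = employment
    end
    employment
  end
end

# With the pointer in place, the filter is one comparison per employee.
def with_latest_status_cached(employees, status)
  employees.select do |employee|
    employee.latest_employment && employee.latest_employment.status == status
  end
end
```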

青春惊慌失措

2021-01-06 13:34

Since the title includes Arel, the following should work for your example:

employees = Employee.arel_table
employments = Employment.arel_table
max_employments = Arel::Table.new('max_employments')
e2 = employments.project(
      employments['employee_id'], 
      employments['id'].maximum.as('max_id')
     ).group(employments['employee_id'])
me_alias = Arel::Nodes::As.new(e2,max_employments)

res = employees.project(Arel.star)
      .join(me_alias).on(max_employments['employee_id'].eq(employees['id']))
      .join(employments).on(employments['id'].eq(max_employments['max_id']))


Employee.joins(*res.join_sources)
  .where(employments: {status: :inactive})

This should result in the following SQL:

SELECT employees.* 
FROM employees 
INNER JOIN (
    SELECT 
       employments.employee_id, 
       MAX(employments.id) AS max_id 
    FROM employments 
    GROUP BY employments.employee_id
    ) AS max_employments ON max_employments.employee_id = employees.id 
INNER JOIN employments ON employments.id = max_employments.max_id
WHERE 
  employments.status = 'inactive'
