How do I build a query in Ruby on Rails that joins on the max of a has_many relation only and includes a select filter on that relation?

走了就别回头了 2021-01-06 12:56

I'm struggling to get Ruby on Rails to do this query right... in short: to join on a has_many relation but only via the most recent record in that r

7 answers
  • 2021-01-06 13:16

    The simplest solution (in terms of code complexity) I can think of is to first fetch the employee ids with their maximum created_at values, then compose a new query from the result.

    attributes = %i[employee_id created_at]
    employments = Employment.group(:employee_id).maximum(:created_at)
                  .map { |values| Employment.where(attributes.zip(values).to_h) }
                  .reduce(Employment.none, :or)
                  .where(status: :inactive)
    
    employees = Employee.where(id: employments.select(:employee_id))
    

    This should produce the following SQL:

    SELECT employments.employee_id, MAX(employments.created_at)
    FROM employments
    GROUP BY employments.employee_id
    

    With the result, the following query is built:

    SELECT employees.*
    FROM employees
    WHERE employees.id IN (
      SELECT employments.employee_id 
      FROM employments
      WHERE (
        employments.employee_id = ? AND employments.created_at = ?
        OR employments.employee_id = ? AND employments.created_at = ?
        OR employments.employee_id = ? AND employments.created_at = ?
        -- ...
      ) AND employments.status = 'inactive'
    )
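
    To make the intent of those two queries concrete, here is a minimal plain-Ruby sketch of the same logic run against an in-memory array instead of a database. The `EmploymentRow` struct, the sample data and both method names are made up for illustration; this is not ActiveRecord code, only a model of what the SQL is expected to compute.

    ```ruby
    # Hypothetical in-memory stand-in for rows of the employments table.
    EmploymentRow = Struct.new(:id, :employee_id, :status, :created_at, keyword_init: true)

    SAMPLE_EMPLOYMENTS = [
      EmploymentRow.new(id: 1, employee_id: 1, status: "active",   created_at: Time.utc(2020, 1, 1)),
      EmploymentRow.new(id: 2, employee_id: 1, status: "inactive", created_at: Time.utc(2020, 6, 1)),
      EmploymentRow.new(id: 3, employee_id: 2, status: "inactive", created_at: Time.utc(2020, 1, 1)),
      EmploymentRow.new(id: 4, employee_id: 2, status: "active",   created_at: Time.utc(2020, 6, 1))
    ].freeze

    # Step 1: mirrors GROUP BY employee_id with MAX(created_at).
    def latest_created_at_by_employee(rows)
      rows.group_by(&:employee_id)
          .transform_values { |group| group.map(&:created_at).max }
    end

    # Step 2: keep rows that are the latest for their employee AND have the
    # wanted status, then return the employee ids of those rows.
    def employee_ids_with_latest_status(rows, status)
      latest = latest_created_at_by_employee(rows)
      rows.select { |row| row.created_at == latest[row.employee_id] && row.status == status }
          .map(&:employee_id)
    end
    ```

    With this sample data, employee 2 has an older inactive row but a newer active one, so only employee 1 should come back for "inactive". That is the case a plain join with a status filter would get wrong.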
    

    The first approach doesn't hold up well for large numbers of records, since the query grows with each additional employee. Things become a lot easier if we can assume that a higher id means a more recently created record. In that scenario the following would do the trick:

    employment_ids = Employment.select(Employment.arel_table[:id].maximum).group(:employee_id)
    employee_ids = Employment.select(:employee_id).where(id: employment_ids, status: :inactive)
    employees = Employee.where(id: employee_ids)
    

    This should produce a single query when employees is loaded.

    SELECT employees.*
    FROM employees
    WHERE employees.id IN (
      SELECT employments.employee_id 
      FROM employments
      WHERE employments.id IN (
        SELECT MAX(employments.id)
        FROM employments
        GROUP BY employments.employee_id
      ) AND employments.status = 'inactive'
    )
    

    This solution works a lot better with larger datasets, but you might want to look into max's answer for better lookup performance.

  • 2021-01-06 13:18

    One alternative is to use a LATERAL JOIN, a feature available in Postgres 9.3+ that can be described as something like a SQL foreach loop.

    class Employee < ApplicationRecord
      has_many :employments
      def self.in_active_employment
        lat_query = Employment.select(:status)
                          .where('employee_id = employees.id') # lateral reference
                          .order(created_at: :desc)
                          .limit(1)
        joins("JOIN LATERAL(#{lat_query.to_sql}) ce ON true")
          .where(ce: { status: 'active' })
      end
    end
    

    This fetches the latest row from employments and then uses this in the WHERE clause to filter the rows from employees.

    SELECT "employees".* FROM "employees" 
    JOIN LATERAL(
      SELECT "employments"."status" 
      FROM "employments" 
      WHERE (employee_id = employees.id) 
      ORDER BY "employments"."created_at" DESC 
      LIMIT 1
    ) ce  ON true 
    WHERE "ce"."status" = $1 LIMIT $2 
    

    This is going to be extremely fast in comparison to a WHERE id IN subquery if the data set is large. Of course the cost is limited portability.
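
    For readers who have not used lateral joins, the "foreach loop" comparison can be sketched in plain Ruby. The constants, the hash-based rows and the method name below are hypothetical; the sketch only imitates what the lateral subquery does for each row of employees and is not how Postgres executes it.

    ```ruby
    # Hypothetical in-memory rows; the hash keys mirror the column names used above.
    LATERAL_EMPLOYEES = [{ id: 1 }, { id: 2 }, { id: 3 }].freeze

    LATERAL_EMPLOYMENTS = [
      { employee_id: 1, status: "active",   created_at: Time.utc(2020, 1, 1) },
      { employee_id: 1, status: "inactive", created_at: Time.utc(2020, 6, 1) },
      { employee_id: 2, status: "inactive", created_at: Time.utc(2020, 1, 1) },
      { employee_id: 2, status: "active",   created_at: Time.utc(2020, 6, 1) }
    ].freeze

    def employees_with_latest_status(employees, employments, status)
      employees.select do |employee|
        # The "lateral" part: this lookup runs once per employee row and is
        # allowed to reference that row.
        latest = employments
                 .select { |e| e[:employee_id] == employee[:id] } # WHERE employee_id = employees.id
                 .max_by { |e| e[:created_at] }                   # ORDER BY created_at DESC LIMIT 1

        # JOIN ... ON true drops employees whose subquery returned no row;
        # the outer WHERE then filters on the status of the latest row.
        !latest.nil? && latest[:status] == status
      end
    end
    ```

    Calling `employees_with_latest_status(LATERAL_EMPLOYEES, LATERAL_EMPLOYMENTS, "active")` should keep only employee 2. Employee 3 has no employments at all and is dropped, just as the inner lateral join would drop it.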

  • 2021-01-06 13:25

    +1 to @max's answer.

    An alternative, though, is to add start_date and end_date attributes to Employment. To get active employees, you can do

    Employee
      .joins(:employments)
      .where('end_date is NULL OR ? BETWEEN start_date AND end_date', Date.today)
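
    As a rough plain-Ruby illustration of that condition, the predicate below mirrors the SQL fragment for a single employment. The method name `active_on?` is made up, and the sketch assumes start_date is never nil (a NULL start_date would behave differently in SQL).

    ```ruby
    require "date"

    # Mirrors: end_date IS NULL OR ? BETWEEN start_date AND end_date
    # BETWEEN is inclusive on both ends, and so is Range#cover?.
    # Assumption: start_date is always present.
    def active_on?(start_date, end_date, day)
      end_date.nil? || (start_date..end_date).cover?(day)
    end
    ```

    One thing the sketch makes visible: an employment with no end_date matches even when its start_date lies in the future, because the first half of the OR never looks at start_date. Whether that is desired depends on how the data is entered.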
    
  • 2021-01-06 13:29

    In my opinion, you can first get the max dates to make sure you are not picking up old records, and then just filter for the required status. Here is an example of doing the first part:

    https://stackoverflow.com/a/18222124/10057981

  • 2021-01-06 13:33

    After fiddling for a while (and trying all these suggestions you all came up with, plus some others), I came up with this. It works, but maybe isn't the most elegant.

    inner_query = Employment.select('distinct on(employee_id) *').order('employee_id').order('created_at DESC')
    employee_ids = Employee.from("(#{inner_query.to_sql}) as unique_employments").select("unique_employments.employee_id").where("unique_employments.status='inactive'")
    employees = Employee.where(id: employee_ids)
    

    The inner query returns a collection of unique employments: the latest one for each employee. Based on that, I pull the employee IDs whose latest employment matches the status. Finally, I find the employee records for those IDs.

    I don't love it, but it's understandable and does work.

    I really appreciate all the input.

    One big takeaway for me (and for anyone else who comes across the same or a similar problem): max's answer helped me realize that the struggle I was having with this code is a "smell" that the data isn't modeled in an ideal way. Per max's suggestion, if the Employee table has a reference to the latest Employment, and that's kept up-to-date and accurate, then this becomes trivially easy and fast.

    Food for thought.
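
    To illustrate that last idea, here is a small plain-Ruby sketch of keeping a "latest employment" reference in sync whenever an employment is added. The `EmployeeRecord` class and its methods are hypothetical and only model the idea in memory; in a Rails app this would presumably be a foreign key column on employees maintained by a callback or similar, and the details of max's suggestion may differ.

    ```ruby
    # Hypothetical in-memory model of an employee that caches its latest employment.
    class EmployeeRecord
      attr_reader :id, :employments, :latest_employment

      def initialize(id)
        @id = id
        @employments = []
        @latest_employment = nil
      end

      # Adding an employment also refreshes the cached "latest" reference,
      # but only if the new row is at least as recent as the current one.
      def add_employment(status:, created_at:)
        employment = { status: status, created_at: created_at }
        @employments << employment
        if @latest_employment.nil? || created_at >= @latest_employment[:created_at]
          @latest_employment = employment
        end
        employment
      end
    end

    # With the reference in place, the original question becomes a simple filter.
    def with_latest_status(employees, status)
      employees.select { |e| e.latest_employment && e.latest_employment[:status] == status }
    end
    ```

    The trade-off is the one mentioned above: the lookup becomes trivial, but only as long as every write path keeps the reference accurate.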

  • 2021-01-06 13:34

    Since the title includes Arel, the following should work for your example:

    employees = Employee.arel_table
    employments = Employment.arel_table
    max_employments = Arel::Table.new('max_employments')
    e2 = employments.project(
          employments['employee_id'], 
          employments['id'].maximum.as('max_id')
         ).group(employments['employee_id'])
    me_alias = Arel::Nodes::As.new(e2,max_employments)
    
    res = employees.project(Arel.star)
          .join(me_alias).on(max_employments['employee_id'].eq(employees['id']))
          .join(employments).on(employments['id'].eq(max_employments['max_id']))
    
    
    Employee.joins(*res.join_sources)
      .where(employments: {status: :inactive})
    

    This should result in the following

    SELECT employees.* 
    FROM employees 
    INNER JOIN (
        SELECT 
           employments.employee_id, 
           MAX(employments.id) AS max_id 
        FROM employments 
        GROUP BY employments.employee_id
        ) AS max_employments ON max_employments.employee_id = employees.id 
    INNER JOIN employments ON employments.id = max_employments.max_id
    WHERE 
      employments.status = 'inactive'
    