How do I build a query in Ruby on Rails that joins on the max of a has_many relation only and includes a select filter on that relation?

走了就别回头了 2021-01-06 12:56

I'm struggling to get Ruby on Rails to build this query correctly... in short: to join on a has_many relation but only via the most recent record in that r

7 answers
  •  北荒 (OP)
    2021-01-06 13:16

    The simplest solution I can think of (in terms of code complexity) is to first fetch each employee_id together with its maximum created_at value, then compose a new query from the result.

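    attributes = %i[employee_id created_at]
    # Query 1 (runs immediately): { employee_id => latest created_at } per employee
    latest = Employment.group(:employee_id).maximum(:created_at)

    # Turn each [employee_id, created_at] pair into an Employment condition,
    # OR them together, then keep only rows whose status is inactive
    employments = latest
                  .map { |values| Employment.where(attributes.zip(values).to_h) }
                  .reduce(Employment.none, :or)
                  .where(status: :inactive)
    
    # Query 2 (runs when loaded): employees whose latest employment is inactive
    employees = Employee.where(id: employments.select(:employee_id))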
    

    The first step should produce the following SQL:

    SELECT employments.employee_id, MAX(employments.created_at)
    FROM employments
    GROUP BY employments.employee_id
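
    To make the map/zip step concrete, here is a plain-Ruby sketch (no database or ActiveRecord involved; the ids and timestamps are hypothetical sample data) of the Hash that a grouped maximum returns and of how each entry becomes a where-style condition hash:

```ruby
require "time"

# Attribute names the answer zips with each grouped row.
attributes = %i[employee_id created_at]

# Stand-in for Employment.group(:employee_id).maximum(:created_at), which
# returns a Hash of { employee_id => latest created_at }.
# The ids and timestamps below are hypothetical sample data.
latest = {
  1 => Time.parse("2020-12-01 09:00:00 UTC"),
  2 => Time.parse("2021-01-05 17:30:00 UTC")
}

# Hash#map yields each entry as a single [key, value] array, so `values`
# is [employee_id, created_at]; zip + to_h turns it into a where-style hash.
conditions = latest.map { |values| attributes.zip(values).to_h }

conditions.each { |condition| p condition }
```

    In the real query, each of these hashes is passed to Employment.where(...), and the resulting relations are OR-ed together with reduce.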
    

    The result is then used to build the following query:

    SELECT employees.*
    FROM employees
    WHERE employees.id IN (
      SELECT employments.employee_id 
      FROM employments
      WHERE (
        employments.employee_id = ? AND employments.created_at = ?
        OR employments.employee_id = ? AND employments.created_at = ?
        OR employments.employee_id = ? AND employments.created_at = ?
        -- ...
      ) AND employments.status = 'inactive'
    )
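
    To check what these two queries compute, here is a plain-Ruby sketch of the same logic over hypothetical in-memory rows. Integers stand in for created_at timestamps, and no database is involved. Note that employee 2 is left out: one of their employments is inactive, but it is not the latest one.

```ruby
# Hypothetical in-memory stand-in for the employments table;
# integers stand in for created_at timestamps.
EmploymentRow = Struct.new(:id, :employee_id, :created_at, :status)

rows = [
  EmploymentRow.new(1, 1, 10, "active"),
  EmploymentRow.new(2, 1, 20, "inactive"), # employee 1: latest is inactive
  EmploymentRow.new(3, 2, 15, "inactive"),
  EmploymentRow.new(4, 2, 30, "active"),   # employee 2: latest is active
  EmploymentRow.new(5, 3, 12, "inactive")  # employee 3: single inactive record
]

# Step 1 (GROUP BY employee_id, MAX(created_at)): latest timestamp per employee.
latest = rows.group_by(&:employee_id)
             .transform_values { |group| group.map(&:created_at).max }

# Step 2 (the IN subquery): rows matching one of the (employee_id, created_at)
# pairs AND having status 'inactive'; uniq mirrors how IN ignores duplicates.
employee_ids = rows
               .select { |row| latest[row.employee_id] == row.created_at && row.status == "inactive" }
               .map(&:employee_id)
               .uniq

p employee_ids
```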
    

    This OR-based approach doesn't hold up well for large numbers of records, since the query grows by one condition for each additional employee. It becomes a lot easier if we can assume that a higher id means the record was created later. In that scenario the following does the trick:

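    # Subquery: the highest (assumed most recent) employment id per employee
    employment_ids = Employment.select(Employment.arel_table[:id].maximum).group(:employee_id)
    # Employee ids whose latest employment is inactive
    employee_ids = Employment.select(:employee_id).where(id: employment_ids, status: :inactive)
    employees = Employee.where(id: employee_ids)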
    

    This should produce a single query (with nested subqueries) once employees is loaded:

    SELECT employees.*
    FROM employees
    WHERE employees.id IN (
      SELECT employments.employee_id 
      FROM employments
      WHERE employments.id IN (
        SELECT MAX(employments.id)
        FROM employments
        GROUP BY employments.employee_id
      ) AND employments.status = 'inactive'
    )
    

    This solution works much better with larger datasets, but you might want to look into max's answer for better lookup performance.
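
    The id-based version only gives the same result as the created_at-based one when ids grow with creation time. The plain-Ruby sketch below shows where the two can disagree. It uses hypothetical in-memory rows, including an assumed backfilled record that received a higher id but an older created_at. The helper inactive_employee_ids is illustrative only.

```ruby
# Hypothetical in-memory stand-in for the employments table;
# integers stand in for created_at timestamps.
EmploymentRow = Struct.new(:id, :employee_id, :created_at, :status)

rows = [
  EmploymentRow.new(1, 1, 10, "active"),
  EmploymentRow.new(2, 1, 20, "inactive"),
  # Hypothetical backfill: highest id for employee 1, but the oldest created_at.
  EmploymentRow.new(3, 1, 5, "active"),
  EmploymentRow.new(4, 2, 15, "inactive")
]

# Hypothetical helper: employee ids whose "latest" employment, as chosen
# by the given key (created_at or id), is inactive.
def inactive_employee_ids(rows, &key)
  rows.group_by(&:employee_id)
      .map { |_, group| group.max_by(&key) }
      .select { |row| row.status == "inactive" }
      .map(&:employee_id)
end

by_created_at = inactive_employee_ids(rows, &:created_at)
by_id         = inactive_employee_ids(rows, &:id)

p by_created_at
p by_id
```

    Here, ordering by id picks the backfilled row as employee 1's "latest" employment, so the two versions return different employees. If your data can contain such rows, the created_at-based version is the safer choice, at the cost of the larger query.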
