SQL join: selecting the last records in a one-to-many relationship

深忆病人 2020-11-22 08:48

Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in

10 Answers
  • 2020-11-22 09:21

    Another approach is to use a NOT EXISTS condition in the join condition to test for later purchases (this assumes that a higher purchase id means a later purchase):

    SELECT *
    FROM customer c
    LEFT JOIN purchase p ON (
           c.id = p.customer_id
       AND NOT EXISTS (
         SELECT 1 FROM purchase p1
         WHERE p1.customer_id = c.id
         AND p1.id > p.id
       )
    )
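To see what the NOT EXISTS join returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and the `name` and `item` columns are made up for illustration; only `id`, `customer_id`, and `date` come from the thread.

```python
import sqlite3

# Hypothetical sample data: Ann has two purchases, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        date TEXT,
        item TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, '2020-01-05', 'pen'),
        (2, 1, '2020-03-09', 'ink'),
        (3, 2, '2020-02-01', 'pad');
""")

# Keep only the purchase row for which no purchase with a higher id
# exists for the same customer. "Latest" here means "highest id".
rows = conn.execute("""
    SELECT c.name, p.id, p.item
    FROM customer c
    LEFT JOIN purchase p ON (
           c.id = p.customer_id
       AND NOT EXISTS (
         SELECT 1 FROM purchase p1
         WHERE p1.customer_id = c.id
           AND p1.id > p.id
       )
    )
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Because the filter sits in the ON clause of a LEFT JOIN, a customer with no purchases (Cy in this sample) still appears, with NULLs in the purchase columns.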
    
  • 2020-11-22 09:24

    You haven't specified the database. If it supports analytic (window) functions, this approach may be faster than the GROUP BY one (definitely faster in Oracle, most likely faster in recent SQL Server editions; I don't know about others).

    Syntax in SQL Server would be:

    SELECT c.*, p.*
    FROM customer c INNER JOIN 
         (SELECT RANK() OVER (PARTITION BY customer_id ORDER BY date DESC) r, *
                 FROM purchase) p
    ON (c.id = p.customer_id)
    WHERE p.r = 1
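One detail worth checking with the window-function approach is how ties are handled. RANK() gives the same rank to rows that share a date, so a customer with two purchases on their latest date comes back twice; ROW_NUMBER() with an extra tiebreaker column returns exactly one row. The sketch below shows the difference using SQLite (3.25 or later, which supports window functions) instead of SQL Server; the sample rows and the `name` and `item` columns are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data: Ann has two purchases on the same date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        date TEXT,
        item TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES
        (1, 1, '2020-03-09', 'pen'),
        (2, 1, '2020-03-09', 'ink'),
        (3, 2, '2020-02-01', 'pad');
""")

# Same shape as the query above; only the ranking function and the
# ORDER BY inside the window are swapped in.
QUERY = """
    SELECT c.name, p.item
    FROM customer c
    INNER JOIN (
        SELECT {fn}() OVER (PARTITION BY customer_id ORDER BY {order}) AS r, *
        FROM purchase
    ) p ON c.id = p.customer_id
    WHERE p.r = 1
    ORDER BY c.id, p.id
"""

# RANK: both of Ann's purchases tie for rank 1.
with_rank = conn.execute(
    QUERY.format(fn="RANK", order="date DESC")
).fetchall()

# ROW_NUMBER with id as a tiebreaker: one row per customer.
with_row_number = conn.execute(
    QUERY.format(fn="ROW_NUMBER", order="date DESC, id DESC")
).fetchall()

print(with_rank)
print(with_row_number)
```

Which behavior is wanted depends on the data: if the date column can repeat within a customer and exactly one row is required, the ROW_NUMBER variant is probably the safer choice.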
    
  • 2020-11-22 09:27

    I found this thread while looking for a solution to my problem.

    But when I tried the suggestions, performance was poor. Below is my suggestion for better performance.

    WITH MaxDates AS (
        SELECT  customer_id,
                MAX(date) MaxDate
        FROM    purchase
        GROUP BY customer_id
    )
    SELECT  c.*, M.*
    FROM    customer c INNER JOIN
            MaxDates AS M ON c.id = M.customer_id
    

    Hope this will be helpful.

  • 2020-11-22 09:27

    Tested on SQLite:

    SELECT c.*, p.*, max(p.date)
    FROM customer c
    LEFT OUTER JOIN purchase p
    ON c.id = p.customer_id
    GROUP BY c.id
    

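    The max() aggregate function makes sure that the latest purchase is selected from each group (this assumes the date column is in a format for which max() gives the latest, which is normally the case). This relies on behavior specific to SQLite: when a query contains a single max() or min() aggregate, the non-aggregated columns take their values from the row that holds that maximum or minimum. Note that max(p.date, p.id) does not help with purchases on the same date: with two arguments, max() is a scalar function in SQLite rather than an aggregate, so it cannot be used to break ties here.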

    In terms of indexes, I would use an index on purchase with (customer_id, date, [any other purchase columns you want to return in your select]).

    The LEFT OUTER JOIN (as opposed to INNER JOIN) will make sure that customers that have never made a purchase are also included.
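The query above can be tried directly with Python's built-in sqlite3 module. This is a small sketch with made-up sample rows (the `name` and `item` columns are assumptions); the later purchase is deliberately inserted first to show that the row is picked by date, not by insertion order.

```python
import sqlite3

# Hypothetical sample data: Ann's latest purchase has the lower id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        date TEXT,
        item TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, '2020-03-09', 'ink'),
        (2, 1, '2020-01-05', 'pen'),
        (3, 2, '2020-02-01', 'pad');
""")

# p.item is a "bare" column: SQLite takes it from the row that
# supplies max(p.date) within each group.
rows = conn.execute("""
    SELECT c.name, p.item, max(p.date)
    FROM customer c
    LEFT OUTER JOIN purchase p ON c.id = p.customer_id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Cy has no purchases and still appears, with NULLs, because of the LEFT OUTER JOIN. ISO-8601 date strings such as '2020-03-09' compare correctly as text, which is the format assumption mentioned above.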

  • 2020-11-22 09:32

    You could also try doing this using a subselect:

    SELECT  c.*, p.*
    FROM    customer c INNER JOIN
            (
                SELECT  customer_id,
                        MAX(date) MaxDate
                FROM    purchase
                GROUP BY customer_id
            ) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
            purchase p ON   MaxDates.customer_id = p.customer_id
                        AND MaxDates.MaxDate = p.date
    

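    The select joins each customer to the purchase made on their last purchase date. Because both joins are INNER JOINs, customers without purchases are left out, and a customer with several purchases on that last date gets one row per purchase.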
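As a quick check of how the subselect behaves on edge cases, here is a sketch that runs it in SQLite from Python. The sample rows and the `name` and `item` columns are hypothetical; Ann has two purchases on the same date and Cy has none.

```python
import sqlite3

# Hypothetical sample data with a tie (Ann) and a customer
# without purchases (Cy).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        date TEXT,
        item TEXT
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, '2020-03-09', 'pen'),
        (2, 1, '2020-03-09', 'ink'),
        (3, 2, '2020-02-01', 'pad');
""")

# Find each customer's latest date, then join back to purchase
# to recover the full row(s) for that date.
rows = conn.execute("""
    SELECT c.name, p.id, p.item
    FROM customer c
    INNER JOIN (
        SELECT customer_id, MAX(date) AS MaxDate
        FROM purchase
        GROUP BY customer_id
    ) MaxDates ON c.id = MaxDates.customer_id
    INNER JOIN purchase p
            ON MaxDates.customer_id = p.customer_id
           AND MaxDates.MaxDate = p.date
    ORDER BY c.id, p.id
""").fetchall()

for row in rows:
    print(row)
```

Ann appears twice and Cy not at all. If the date column stores full timestamps, ties are unlikely in practice; if it stores calendar dates only, a tiebreaker such as the purchase id may be needed.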

  • 2020-11-22 09:33

    If you're using PostgreSQL, you can use DISTINCT ON to find the first row in each group:

    SELECT customer.*, purchase.*
    FROM customer
    JOIN (
       SELECT DISTINCT ON (customer_id) *
       FROM purchase
       ORDER BY customer_id, date DESC
    ) purchase ON purchase.customer_id = customer.id
    

    PostgreSQL Docs - Distinct On

    Note that the DISTINCT ON field(s), here customer_id, must match the leftmost field(s) in the ORDER BY clause.

    Caveat: This is a nonstandard clause.
