Optimize GROUP BY query to retrieve latest row per user

南笙 2020-11-21 11:12

I have the following log table for user messages (simplified form) in Postgres 9.2:

CREATE TABLE log (
    log_date DATE,
    user_id  INTEGER,
    payload           

3 Answers
  •  一个人的身影
    2020-11-21 11:41

    This is not a standalone answer but rather a comment on @Erwin's answer. For 2a, the lateral join example, the query can be improved by sorting the users table first to exploit the locality of the index on log.

    SELECT u.user_id, l.log_date, l.payload
      FROM (SELECT user_id FROM users ORDER BY user_id) u,
           LATERAL (SELECT log_date, payload
                      FROM log
                     WHERE user_id = u.user_id -- lateral reference
                       AND log_date <= :mydate
                  ORDER BY log_date DESC NULLS LAST
                     LIMIT 1) l;
    

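    The rationale is that index lookups are expensive when the user_id values arrive in random order, because each probe may land on a different, uncached index page. Sorting user_id first makes the subsequent lateral join behave much like a sequential scan over the index on log, since consecutive probes touch the same or neighboring index pages. Even though both query plans look alike, the running time can differ substantially, especially for large tables.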

    The cost of sorting is minimal, especially if there is an index on the user_id column of users.
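    The following toy simulation in Python makes the locality argument concrete. The answer itself uses only SQL, so this is an illustration rather than a Postgres measurement. It makes two hypothetical assumptions: the index on log is a flat run of leaf pages sorted by user_id, and the buffer cache is a small LRU. The helper `count_page_misses` and all parameter values are made up for illustration.

```python
"""Toy model: why probing an index in key order beats probing it in random order.

Assumptions (not from the answer): the index on log is modelled as a flat run of
leaf pages sorted by user_id, every user has the same number of index entries,
and the buffer cache is a small LRU. Real Postgres B-trees and shared_buffers
are more complicated.
"""
import random
from collections import OrderedDict


def count_page_misses(user_ids, rows_per_user, entries_per_page, cache_pages):
    """Count leaf-page reads that miss an LRU cache holding `cache_pages` pages."""
    cache = OrderedDict()  # page number -> None, least recently used first
    misses = 0
    for uid in user_ids:
        # Entries are sorted by user_id, so this user's entries start at
        # position uid * rows_per_user; assume the probe reads that page.
        page = (uid * rows_per_user) // entries_per_page
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            misses += 1
            cache[page] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return misses


if __name__ == "__main__":
    N_USERS = 10_000        # hypothetical number of users
    ROWS_PER_USER = 5       # hypothetical log rows per user
    ENTRIES_PER_PAGE = 100  # hypothetical leaf-page capacity -> 500 leaf pages
    CACHE_PAGES = 50        # hypothetical cache size, well below 500 pages

    # Sorted ids stand in for SELECT user_id FROM users ORDER BY user_id.
    sorted_ids = list(range(N_USERS))
    # Shuffled ids stand in for user_ids arriving in no particular order.
    shuffled_ids = sorted_ids[:]
    random.Random(42).shuffle(shuffled_ids)

    for label, ids in (("sorted", sorted_ids), ("shuffled", shuffled_ids)):
        misses = count_page_misses(ids, ROWS_PER_USER, ENTRIES_PER_PAGE, CACHE_PAGES)
        print(f"{label:>8} probes: {misses} page misses for {len(ids)} lookups")
```

    With these made-up parameters, the sorted order reads each of the 500 leaf pages exactly once. The shuffled order misses the cache on most of its 10,000 probes. The number of probes is identical in both cases and only the access order changes, which fits the observation that the two plans look alike while their running times differ.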
