Conditional lead/lag function in PostgreSQL?

一向 2020-12-10 17:41

I have a table like this:

Name   activity  time

user1  A1        12:00
user1  E3        12:01
user1  A2        12:02
user2  A1        10:05
user2  A2                


        
2 answers
  • 2020-12-10 18:17

    Test setup:

    CREATE TEMP TABLE t (name text, activity text, time time);
    INSERT INTO t values
     ('user1', 'A1', '12:00')
    ,('user1', 'E3', '12:01')
    ,('user1', 'A2', '12:02')
    ,('user2', 'A1', '10:05')
    ,('user2', 'A2', '10:06')
    ,('user2', 'A3', '10:07')
    ,('user2', 'M6', '10:07')
    ,('user2', 'B1', '10:08')
    ,('user3', 'A1', '14:15')
    ,('user3', 'B2', '14:20')
    ,('user3', 'D1', '14:25')
    ,('user3', 'D2', '14:30');
    

    Your definition:

    activity from group B always takes place after activity from group A.

    ... logically implies that, per user, there is 0 or 1 B activity after 1 or more A activities. Never more than 1 B activity in sequence.
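    If you are not sure the data actually follows that rule, it is cheap to check. A minimal sketch, assuming the same name, activity and time columns as the test setup; the rows are inlined so the snippet runs on its own, user4 is a hypothetical user that breaks the rule, and prev_activity is just an illustrative alias. It lists every B activity whose preceding A-or-B activity is missing or is itself a B:

```sql
-- Inline sample rows; user4 is hypothetical and violates the rule (B1 then B2).
WITH t(name, activity, time) AS (
   VALUES ('user3', 'A1', time '14:15')
        , ('user3', 'B2', time '14:20')
        , ('user4', 'A1', time '09:00')
        , ('user4', 'B1', time '09:05')
        , ('user4', 'B2', time '09:10')
   )
SELECT name, activity, time
FROM  (
   SELECT name, activity, time
          -- previous A-or-B activity of the same user (other groups are filtered out first)
        , lag(activity) OVER (PARTITION BY name ORDER BY time) AS prev_activity
   FROM   t
   WHERE  activity LIKE 'A%' OR activity LIKE 'B%'
   ) sub
WHERE  activity LIKE 'B%'
AND   (prev_activity IS NULL OR prev_activity LIKE 'B%');  -- B without an A right before it
-- returns one row: user4 | B2 | 09:10:00
```

    An empty result means the assumption behind the queries below holds for your data.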

    You can make it work with a single window function combined with DISTINCT ON and CASE. That should be the fastest way for few rows per user (also see Performance below):

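    SELECT name
         , CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity   -- latest A activity
         , CASE WHEN a2 LIKE 'B%' THEN a2 END AS next_activity      -- the B after it, if any
    FROM  (
       SELECT DISTINCT ON (name)  -- one row per user: the latest A or B activity
              name
              -- ordered by time DESC, lead() looks one step back in time,
              -- so a1 is the A or B activity right before a2
            , lead(activity) OVER (PARTITION BY name ORDER BY time DESC) AS a1
            , activity AS a2
       FROM   t
       WHERE (activity LIKE 'A%' OR activity LIKE 'B%')
       ORDER  BY name, time DESC
       ) sub;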
    

    An SQL CASE expression defaults to NULL if no ELSE branch is added, so I kept that short.
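    For example, with the values the subquery produces for user1 in the test setup (a1 = 'A1', a2 = 'A2', no B activity), the second CASE has no matching branch and no ELSE, so it yields NULL. The inline VALUES row stands in for the subquery:

```sql
-- a2 = 'A2' does not match 'B%' and there is no ELSE branch,
-- so next_activity comes out as NULL.
SELECT CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity
     , CASE WHEN a2 LIKE 'B%' THEN a2 END          AS next_activity
FROM  (VALUES ('A1', 'A2')) v(a1, a2);
-- activity = A2, next_activity = NULL
```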

    Also assuming time is defined NOT NULL. Otherwise, you might want to add NULLS LAST, because a descending sort puts NULL values first by default. Why?

    • Select first row in each GROUP BY group?
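    A small illustration of that pitfall, assuming a hypothetical user1 row with an unknown time: in PostgreSQL, NULL sorts as larger than any value, so with DESC it comes first and DISTINCT ON would pick it instead of the actual latest row.

```sql
WITH t(name, activity, time) AS (
   VALUES ('user1', 'A1', time '12:00')
        , ('user1', 'A2', NULL)          -- hypothetical row with unknown time
   )
SELECT DISTINCT ON (name) name, activity, time
FROM   t
ORDER  BY name, time DESC NULLS LAST;    -- without NULLS LAST, the A2 row (NULL time) would be picked
-- returns user1 | A1 | 12:00:00
```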

    (activity LIKE 'A%' OR activity LIKE 'B%') is more verbose than activity ~ '^[AB]', but typically faster in older versions of Postgres. About pattern matching:

    • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
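    For non-NULL values, the predicates used on this page are interchangeable; left(activity, 1) IN ('A', 'B') is the form the other answer uses. A quick side-by-side with a few activity codes from the test setup:

```sql
SELECT activity
     , activity LIKE 'A%' OR activity LIKE 'B%' AS like_or
     , activity ~ '^[AB]'                       AS regex
     , left(activity, 1) IN ('A', 'B')          AS left_in
FROM  (VALUES ('A1'), ('B2'), ('E3'), ('M6')) v(activity);
-- A1 and B2: true in all three columns; E3 and M6: false
```

    Which one is fastest depends on your Postgres version and indexes, so measure on your own setup.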

    Conditional window functions?

    That's actually possible. You can combine the aggregate FILTER clause with the OVER clause of window functions. However:

    1. The FILTER clause itself can only work with values from the current row.

    2. More importantly, FILTER is not implemented for pure window functions like lead() or lag() in Postgres 9.6 (yet), only for aggregate functions.
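    What does work (Postgres 9.4 or later) is an aggregate function with FILTER used as a window function. A minimal sketch with user2's A and B rows from the test setup inlined, computing a running count of A activities per user; a_count_so_far is an illustrative alias:

```sql
WITH t(name, activity, time) AS (
   VALUES ('user2', 'A1', time '10:05')
        , ('user2', 'A2', time '10:06')
        , ('user2', 'B1', time '10:08')
   )
SELECT name, activity, time
       -- aggregate + FILTER + OVER is allowed: only A rows are counted;
       -- the default frame runs from the start of the partition to the current row
     , count(*) FILTER (WHERE activity LIKE 'A%')
          OVER (PARTITION BY name ORDER BY time) AS a_count_so_far
FROM   t
ORDER  BY name, time;
-- a_count_so_far: 1, 2, 2
```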

    If you try:

    lead(activity) FILTER (WHERE activity LIKE 'A%') OVER () AS activity
    

    Postgres will tell you:

    FILTER is not implemented for non-aggregate window functions
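    A possible workaround is to emulate a "conditional lead" with an aggregate window function whose frame starts one row after the current row: array_agg() with FILTER collects the following B activities and [1] picks the first. This is a sketch, assuming Postgres 9.4 or later and relying on the aggregate receiving rows in the window's ORDER BY order; next_b is an illustrative alias. Since each row aggregates its whole remaining frame, it can get slow for long partitions.

```sql
WITH t(name, activity, time) AS (
   VALUES ('user2', 'A1', time '10:05')
        , ('user2', 'A2', time '10:06')
        , ('user2', 'M6', time '10:07')
        , ('user2', 'B1', time '10:08')
   )
SELECT name, activity, time
       -- first B activity strictly after the current row, or NULL if there is none
     , (array_agg(activity) FILTER (WHERE activity LIKE 'B%')
           OVER (PARTITION BY name ORDER BY time
                 ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1] AS next_b
FROM   t
ORDER  BY name, time;
-- next_b: B1, B1, B1, NULL
```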
    

    About FILTER:

    • How can I simplify this game statistics query?
    • Referencing current row in FILTER clause of window function

    Performance

    (For few users with few rows per user, pretty much any query is fast, even without an index.)

    For many users and few rows per user, the first query above should be fastest. See the linked answer above about indexes and performance.

    For many rows per user, there are (potentially much) faster techniques, depending on other details of your setup:

    • Optimize GROUP BY query to retrieve latest record per user
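    As one possible starting point for the many-users case, an index matching the sort order of the first query may help. This is an assumption to verify with EXPLAIN ANALYZE on your own data: the index name is hypothetical, the partial-index predicate simply repeats the first query's WHERE clause (the left() form in the other answer would not match it), and IF NOT EXISTS needs Postgres 9.5 or later.

```sql
-- Assumes the temp table t from the test setup above (created here if missing).
CREATE TEMP TABLE IF NOT EXISTS t (name text, activity text, time time);

-- Hypothetical index name. Columns and directions follow ORDER BY name, time DESC;
-- the predicate matches the query's WHERE clause, so only A and B rows are indexed.
CREATE INDEX IF NOT EXISTS t_name_time_ab_idx
ON t (name, time DESC)
WHERE activity LIKE 'A%' OR activity LIKE 'B%';
```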
  • 2020-12-10 18:27
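    select distinct on (name) name, activity, next_activity
    from  (select name, activity, time
                  -- next A-or-B activity of the same user, in time order
                , lead(activity) over (partition by name order by time) as next_activity
           from   t
           where  left(activity, 1) in ('A', 'B')
          ) t
    where  left(activity, 1) = 'A'   -- keep A rows only
    order  by name, time desc;       -- latest A row per user; its next A-or-B activity can only be a B (or NULL)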
    