Can the window function LAG reference the column whose value is being calculated?

北海茫月 2021-01-18 01:30

I need to calculate the value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Bas
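The answers below all query a table named event. Its schema is not shown in this excerpt, so the following is only a minimal reconstruction: the ten rows are taken from the result set printed in the second answer, while the integer column types and the primary key are assumptions.

```sql
-- Assumed schema: three integer columns, id as primary key.
CREATE TABLE event (
    id         int PRIMARY KEY,
    type       int NOT NULL,
    time_stamp int NOT NULL
);

-- Rows reconstructed from the result table shown in the second answer.
INSERT INTO event (id, type, time_stamp) VALUES
    (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 1, 3), (5, 1, 10),
    (6, 2, 10), (7, 1, 15), (8, 1, 21), (9, 2, 13), (10, 1, 40);
```

With this table in place, the queries in the answers can be run as written.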

3 answers
  • 2021-01-18 01:54

    This feels more like a recursive problem than a window function problem. The following query produces the desired results:

    WITH RECURSIVE base(type, time_stamp, next_time_stamp) AS (
    
      -- 3. base of recursive query
      SELECT x.type, x.time_stamp, y.next_time_stamp
        FROM 
             -- 1. start with the initial records of each type   
             ( SELECT type, min(time_stamp) AS time_stamp
                 FROM event
                 GROUP BY type
             ) x
             LEFT JOIN LATERAL
             -- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
             ( SELECT MIN(time_stamp) next_time_stamp
                 FROM event
                 WHERE type = x.type
                   AND time_stamp > (x.time_stamp + 10)
             ) y ON true
    
      UNION ALL
    
      -- 4. recursive join, same logic as base
      SELECT e.type, e.time_stamp, z.next_time_stamp
        FROM event e
        JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
        LEFT JOIN LATERAL
        ( SELECT MIN(time_stamp) next_time_stamp
           FROM event
           WHERE type = e.type
             AND time_stamp > (e.time_stamp + 10)
        ) z ON true
    
    )
    
    -- The actual query:
    
    -- 5a. All records from base are not duplicates
    SELECT time_stamp, type, false AS is_duplicate
      FROM base
    
    UNION
    
    -- 5b. All records from event that are not in base are duplicates
    SELECT time_stamp, type, true
      FROM event
      WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base) 
    
    ORDER BY type, time_stamp
    

    There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.

    This was my first time trying a LATERAL join, so there may be a way to simplify it further. What I really wanted was a recursive CTE whose recursive part computes MIN(time_stamp) under the condition time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed directly in the recursive term of a recursive CTE. Wrapping the aggregate in a lateral subquery does work, though.
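    The caveat about joining on a unique id could be addressed along the following lines. This is a sketch rather than a tested replacement: it assumes the event table has a unique id column (as the result set in the next answer suggests) and replaces MIN(time_stamp) with ORDER BY time_stamp, id LIMIT 1 inside the lateral subquery, so that ties on time_stamp are broken deterministically. The column alias is_duplicate is an illustrative name.

```sql
WITH RECURSIVE base(id, type, time_stamp) AS (
    -- Anchor: the earliest event of each type, ties broken by id.
    SELECT f.id, f.type, f.time_stamp
      FROM (SELECT DISTINCT type FROM event) t
      CROSS JOIN LATERAL (
          SELECT id, type, time_stamp
            FROM event
           WHERE type = t.type
           ORDER BY time_stamp, id
           LIMIT 1
      ) f

    UNION ALL

    -- Recursive step: the first event more than 10 units after the
    -- current leader. When none exists, the lateral subquery is empty
    -- and the recursion stops for that type.
    SELECT n.id, n.type, n.time_stamp
      FROM base b
      CROSS JOIN LATERAL (
          SELECT id, type, time_stamp
            FROM event
           WHERE type = b.type
             AND time_stamp > b.time_stamp + 10
           ORDER BY time_stamp, id
           LIMIT 1
      ) n
)
-- Every event that is not a leader in base is flagged as a duplicate.
SELECT e.id, e.type, e.time_stamp,
       b.id IS NULL AS is_duplicate
  FROM event e
  LEFT JOIN base b ON b.id = e.id
 ORDER BY e.type, e.time_stamp, e.id;
```

    Because the final join is on id, two events of the same type with the same time_stamp no longer collide: only the one picked as leader is marked as a non-duplicate.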

  • 2021-01-18 02:04

    An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.

    State transition function:

    create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
    returns int[] language plpgsql as $$
    begin
        if st is null or st[1] + timeframe <= time_stamp
        then 
            st[1] := time_stamp;
        end if;
        st[2] := time_stamp;
        return st;
    end $$;
    

    Final function:

    create or replace function is_duplicate_final(st int[])
    returns boolean language sql as $$
        select st[1] <> st[2];
    $$;
    

    Aggregate:

    create aggregate is_duplicate_agg(time_stamp int, timeframe int)
    (
        sfunc = is_duplicate,
        stype = int[],
        finalfunc = is_duplicate_final
    );
    

    Query:

    select *, is_duplicate_agg(time_stamp, 10) over w
    from event
    window w as (partition by type order by time_stamp asc)
    order by type, time_stamp;
    
     id | type | time_stamp | is_duplicate_agg 
    ----+------+------------+------------------
      1 |    1 |          1 | f
      2 |    1 |          2 | t
      4 |    1 |          3 | t
      5 |    1 |         10 | t
      7 |    1 |         15 | f
      8 |    1 |         21 | t
     10 |    1 |         40 | f
      3 |    2 |          2 | f
      6 |    2 |         10 | t
      9 |    2 |         13 | f
    (10 rows)   
    

    For details, see the PostgreSQL documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
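    To see how the state array evolves from row to row, one option is a second aggregate that reuses the same transition function but omits the final function, so the raw state is returned. The name is_duplicate_state is hypothetical; the sketch assumes the is_duplicate function above and the event table already exist.

```sql
-- Hypothetical debugging helper: same sfunc and stype as is_duplicate_agg,
-- but without a finalfunc, so the int[] state itself is the result.
create aggregate is_duplicate_state(time_stamp int, timeframe int)
(
    sfunc = is_duplicate,
    stype = int[]
);

-- state[1] is the time_stamp of the current group leader,
-- state[2] is the time_stamp of the current row.
select id, type, time_stamp,
       is_duplicate_state(time_stamp, 10) over w as state
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
```

    A row is reported as a duplicate exactly when the two elements of its state differ, which is what is_duplicate_final checks.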

  • 2021-01-18 02:15

    Naive recursive chain knitter:


            -- temp view to avoid nested CTE
    CREATE TEMP VIEW drag AS
            SELECT e.type,e.time_stamp
            , ROW_NUMBER() OVER www as rn                   -- number the records
            , FIRST_VALUE(e.time_stamp) OVER www as fst     -- the "group leader"
            , EXISTS (SELECT * FROM event x
                    WHERE x.type = e.type
                    AND x.time_stamp < e.time_stamp) AS is_dup
            FROM event e
            WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
            ;
    
    WITH RECURSIVE ttt AS (
            SELECT d0.*
            FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
        UNION ALL
            SELECT d1.type, d1.time_stamp, d1.rn
              , CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
                     ELSE ttt.fst END AS fst   -- new "group leader"
              , CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
                     ELSE True END AS is_dup
            FROM drag d1
            JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
            )
    SELECT * FROM ttt
    ORDER BY type, time_stamp
            ;
    

    Results:


    CREATE TABLE
    INSERT 0 10
    CREATE VIEW
     type | time_stamp | rn | fst | is_dup 
    ------+------------+----+-----+--------
        1 |          1 |  1 |   1 | f
        1 |          2 |  2 |   1 | t
        1 |          3 |  3 |   1 | t
        1 |         10 |  4 |   1 | t
        1 |         15 |  5 |   1 | t
        1 |         21 |  6 |   1 | t
        1 |         40 |  7 |  40 | f
        2 |          2 |  1 |   2 | f
        2 |         10 |  2 |   2 | t
        2 |         13 |  3 |   2 | t
    (10 rows)
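    The temp view is a convenience rather than a requirement: PostgreSQL accepts a non-recursive CTE followed by a recursive one in the same WITH RECURSIVE list. The following single-statement variant is a sketch under that assumption. It keeps the threshold of 20 used in the query above and selects the group leaders by rn = 1 instead of the EXISTS subquery.

```sql
WITH RECURSIVE drag AS (
    -- Number the records per type and remember the first time_stamp.
    SELECT e.type, e.time_stamp,
           ROW_NUMBER() OVER w AS rn,
           FIRST_VALUE(e.time_stamp) OVER w AS fst
      FROM event e
    WINDOW w AS (PARTITION BY e.type ORDER BY e.time_stamp)
),
ttt AS (
    -- Anchor: the first record of each type is never a duplicate.
    SELECT d0.type, d0.time_stamp, d0.rn, d0.fst, false AS is_dup
      FROM drag d0
     WHERE d0.rn = 1
    UNION ALL
    -- Step: walk to the next record and either keep the current
    -- group leader or promote this record to be the new one.
    SELECT d1.type, d1.time_stamp, d1.rn,
           CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
                ELSE ttt.fst END AS fst,
           NOT (d1.time_stamp - ttt.fst > 20) AS is_dup
      FROM drag d1
      JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn + 1
)
SELECT * FROM ttt
ORDER BY type, time_stamp;
```

    Like the view-based version, this walks the rows one at a time, so the recursion depth equals the number of rows in the largest partition.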
    