Hive query generating identifiers for a sequence of rows matching a condition

执笔经年 2021-01-23 11:54

Let's say I have the following Hive table as input, let's call it connections:

userid  | timestamp   
--------|-------------
1       | 1433258019          


        
3 answers
  •  离开以前
    2021-01-23 12:18

    This works:

    SELECT 
      userid,
      timestamp,
      timediff,
      CONCAT(
        'user',
         userid,
         '-',
         'session-',
         CAST(timediff / 60 AS INT) + 1
      ) AS session_state
    FROM (
        SELECT   
          userid,
          timestamp,
          timestamp - LAG(timestamp, 1, timestamp) OVER w AS timediff
        FROM connections
        WINDOW w AS (
          PARTITION BY userid
          ORDER BY timestamp ASC
        )
    ) a;
    

    OUTPUT:

    userid  timestamp   timediff    session_state
    1       1433258019  0.0         user1-session-1
    1       1433258020  1.0         user1-session-1
    2       1433258080  0.0         user2-session-1
    2       1433258083  3.0         user2-session-1
    2       1433258088  5.0         user2-session-1
    2       1433258170  82.0        user2-session-2
    3       1433258270  0.0         user3-session-1
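
    The query above derives the session number from each row's own gap, so a 130-second gap would give session-3, and a later row with a short gap would fall back to session-1. If the intent is instead a per-user counter that goes up by one at every gap of 60 seconds or more and then stays there, a running sum over a "new session" flag is one way to sketch it. The 60-second threshold is taken from the `/ 60` above; the `new_session` column and the `flagged` alias are names made up for this example. The backticks around `timestamp` are only there because it is a reserved keyword in newer Hive versions.

    ```sql
    -- Outer query: running count of session starts per user, plus 1 for the first session.
    SELECT
      userid,
      `timestamp`,
      CONCAT(
        'user', CAST(userid AS STRING), '-session-',
        CAST(
          SUM(new_session) OVER (
            PARTITION BY userid
            ORDER BY `timestamp`
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
          ) + 1 AS STRING
        )
      ) AS session_state
    FROM (
      -- Inner query: new_session is 1 when the gap to the user's previous row
      -- is at least 60 seconds (assumed threshold), otherwise 0.
      -- The first row per user is compared against itself, so its gap is 0.
      SELECT
        userid,
        `timestamp`,
        CASE
          WHEN `timestamp` - LAG(`timestamp`, 1, `timestamp`) OVER (
                 PARTITION BY userid
                 ORDER BY `timestamp`
               ) >= 60
          THEN 1
          ELSE 0
        END AS new_session
      FROM connections
    ) flagged;
    ```

    On the seven rows shown in the OUTPUT above, this should give the same labels as the first query. The two only diverge once a user has a gap of 120 seconds or more, or has further rows after a new session has started.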
    

    You can try something like this if timediff is not required:

    select userid, timestamp, session_count + concat('user', userid, '-', 'session-', cast(LAG(session_count-1, 1, 0) over w1 as string)) AS session_state
    --LAG(session_count-1,1,0) over w1 AS session_count_new
    FROM (select userid, timestamp, timediff, cast(timediff/60 as int) + 1 as session_count
