Let\'s say I have the following hive table as input, let\'s call it connections
:
userid | timestamp
--------|-------------
1 | 1433258019
Interesting question. Per your comment to @Madhu, I added the line 2 1433258172
to your example. What you need is to increment every time timediff > 60
is satisfied. The easiest way to do this is to flag it and then cumulatively sum over the window.
Query:
select userid
, timestamp
, concat('user', userid, '-session-', s_sum) sessionid
from (
select *
, sum( counter ) over (partition by userid
order by timestamp asc
rows between unbounded preceding and current row) s_sum
from (
select *
, case when timediff > 60 then 1 else 0 end as counter
from (
select userid
, timestamp
, timestamp - lag(timestamp, 1, 0) over (partition by userid
order by timestamp asc) timediff
from connections ) x ) y ) z
Output:
1 1433258019 user1-session-1
1 1433258020 user1-session-1
2 1433258080 user2-session-1
2 1433258083 user2-session-1
2 1433258088 user2-session-1
2 1433258170 user2-session-2
2 1433258172 user2-session-2
Use the following select concat_ws('-',name, city) from employee; the first parameter of concat_ws is separator. name and city are column names for employee table. See that they are of type strings. You can look here for more
This works:
SELECT
userid,
timestamp,
timediff,
CONCAT(
'user',
userid,
'-',
'session-',
CAST(timediff / 60 AS INT) + 1
) AS session_id
FROM (
SELECT
userid,
timestamp,
timestamp - LAG(timestamp, 1, timestamp) OVER w AS timediff
FROM connections
WINDOW w AS (
PARTITION BY userid
ORDER BY timestamp ASC
)
) a;
OUTPUT:
userid timestamp timediff session_state
1 1433258019 0.0 user1-session-1
1 1433258020 1.0 user1-session-1
2 1433258080 0.0 user2-session-1
2 1433258083 3.0 user2-session-1
2 1433258088 5.0 user2-session-1
2 1433258170 82.0 user2-session-2
3 1433258270 0.0 user3-session-1
you can try something like this if timediff is not required:
select userid,timestamp ,session_count+ concat('user',userid,'-','session-',cast(LAG(session_count-1,1,0) over w1 as string)) AS session_state
--LAG(session_count-1,1,0) over w1 AS session_count_new
FROM
(select
userid,
timestamp,
timediff,
cast (timediff/60 as int)+1 as session_count