Iterate over rows using SQL

蹲街弑〆低调 submitted on 2021-01-28 06:13:26

Question


I have a table in a Redshift database containing event data. Each row is one event. Every event has an eventid, but not the sessionid that I now need. I have extracted a sample of the table (a subset of columns, and only events from one userid):

time        userid          eventid     sessionstart    sessionstop
1498639773  101xnmnd1ohi62  504747459   t               f
1498639777  101xnmnd1ohi62  1479311450  f               f
1498639803  101xnmnd1ohi62  808610184   f               f
1498639816  101xnmnd1ohi62  335000637   f               f
1498639903  101xnmnd1ohi62  238269920   f               f
1498639906  101xnmnd1ohi62  990687838   f               f
1498639952  101xnmnd1ohi62  781472797   f               t
1498650109  101xnmnd1ohi62  1826568537  t               f
1498650124  101xnmnd1ohi62  2079795673  f               f
1498650365  101xnmnd1ohi62  578922176   f               t

The sample is ordered by userid and time, so the events appear in the order in which they occurred within each session. Every event has a boolean value for sessionstart and sessionstop. Looking at the list, I can identify a session as all events between (and including) the event with sessionstart=true and the next event with sessionstop=true. The events listed here form two sessions: the first starts with eventid 504747459 and ends with 781472797, and the second starts with eventid 1826568537 and ends with 578922176. What I want to do is mark these two sessions (and all other sessions) with a sessionid, using SQL. I haven't found any way to do this in SQL. It would be possible using e.g. Python, but I believe the performance would be very poor, so SQL is preferred.

Does anyone have a tip on how I can solve this?


Answer 1:


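I think it might be easier just to use sessionstart, assuming that there are no events between the end of one session and the start of the next. A running count of session starts then gives every event in the same session the same number.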

If so:

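-- Running count of session starts per user.
-- Redshift requires an explicit frame clause when an aggregate window function has an ORDER BY.
select e.*,
       sum(case when sessionstart then 1 else 0 end)
           over (partition by userid order by time rows unbounded preceding) as user_sessionid
from events e;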
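
To check the running-sum idea against the sample rows from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite stands in for Redshift purely for illustration (window functions need SQLite 3.25 or later). The table name events comes from the answer; storing the booleans as 0/1 integers and the helper name assign_user_sessions are assumptions made for this example.

```python
import sqlite3

# Sample rows from the question: (time, userid, eventid, sessionstart, sessionstop).
# Booleans are stored as 0/1 because SQLite has no native boolean type.
ROWS = [
    (1498639773, "101xnmnd1ohi62", 504747459, 1, 0),
    (1498639777, "101xnmnd1ohi62", 1479311450, 0, 0),
    (1498639803, "101xnmnd1ohi62", 808610184, 0, 0),
    (1498639816, "101xnmnd1ohi62", 335000637, 0, 0),
    (1498639903, "101xnmnd1ohi62", 238269920, 0, 0),
    (1498639906, "101xnmnd1ohi62", 990687838, 0, 0),
    (1498639952, "101xnmnd1ohi62", 781472797, 0, 1),
    (1498650109, "101xnmnd1ohi62", 1826568537, 1, 0),
    (1498650124, "101xnmnd1ohi62", 2079795673, 0, 0),
    (1498650365, "101xnmnd1ohi62", 578922176, 0, 1),
]


def assign_user_sessions(rows):
    """Return (eventid, user_sessionid) pairs ordered by userid and time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    # Same running sum as in the answer: each session start bumps the counter,
    # and every later event of that user inherits the current value.
    query = """
        select eventid,
               sum(case when sessionstart then 1 else 0 end)
                   over (partition by userid order by time
                         rows unbounded preceding) as user_sessionid
        from events
        order by userid, time
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for eventid, user_sessionid in assign_user_sessions(ROWS):
        print(eventid, user_sessionid)
```

On the sample data, the first seven events get user_sessionid 1 and the last three get 2, which matches the two sessions described in the question.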

This provides a sessionid "within" each user. If users always start with a new session (a reasonable assumption), then this is easily extended to a global session id:

-- Dropping the partition makes the running count continue across users.
select e.*,
       sum(case when sessionstart then 1 else 0 end)
           over (order by userid, time rows unbounded preceding) as sessionid
from events e;
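
The global variant can be tried the same way. The sketch below again uses an in-memory SQLite database as a stand-in for Redshift (SQLite 3.25 or later); the two users, their timestamps and event ids, and the helper name assign_global_sessions are all hypothetical and only serve to show the counter carrying over from one user to the next.

```python
import sqlite3

# Hypothetical rows: (time, userid, eventid, sessionstart, sessionstop).
# user_a has two sessions, user_b has one; booleans are stored as 0/1.
ROWS = [
    (100, "user_a", 1, 1, 0),
    (110, "user_a", 2, 0, 1),
    (200, "user_a", 3, 1, 0),
    (210, "user_a", 4, 0, 1),
    (50, "user_b", 5, 1, 0),
    (60, "user_b", 6, 0, 1),
]


def assign_global_sessions(rows):
    """Return (eventid, sessionid) pairs ordered by userid and time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    # No partition: the running sum walks all rows in (userid, time) order,
    # so the counter keeps increasing when the next user begins.
    query = """
        select eventid,
               sum(case when sessionstart then 1 else 0 end)
                   over (order by userid, time
                         rows unbounded preceding) as sessionid
        from events
        order by userid, time
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for eventid, sessionid in assign_global_sessions(ROWS):
        print(eventid, sessionid)
```

Here user_a's events get session ids 1 and 2, and user_b's events get 3. If a user's first event were not a session start, those rows would inherit the previous user's last id, which is why the assumption stated above matters.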


Source: https://stackoverflow.com/questions/45085461/iterate-over-rows-using-sql
