How to group timestamps into islands (based on arbitrary gap)?

我在风中等你 2021-01-22 16:08

Consider this list of dates as timestamptz:

I grouped the dates by hand using colors: every group is separated from the next by a gap of at least 2 minutes.

2 Answers
  • 2021-01-22 16:28

    Building on Erwin's answer, here is the full query for tallying up the time people spent in those sessions (islands):

    My data only shows when people finished reviewing something, not when they started, so we don't know when a session truly began, and some islands contain only one timestamp (leading to a 0-duration estimate). I account for both by calculating the average review time and adding it once per island to the total duration of the islands.

    This is likely very idiosyncratic to my use case, but I learned a thing or two in the process, so maybe this will help someone down the line.

    -- Returns estimated total study time and average time per review, both in seconds
    SELECT
        -- total logged time, plus one average review per island to cover the
        -- unseen first review of each island (and 1-review islands)
        (EXTRACT(EPOCH FROM logged) + countofislands * avgreviewtime) AS totalstudytime,
        avgreviewtime
    FROM
        (
        SELECT -- get the three key values that will let us calculate total time spent
            sum(duration) AS logged,
            count(island) AS countofislands,
            -- islands with non-zero duration only: total duration / (reviews - islands)
            EXTRACT(EPOCH FROM sum(duration) FILTER (WHERE duration != '00:00:00'::interval))
                / (  sum(reviews)   FILTER (WHERE duration != '00:00:00'::interval)
                   - count(reviews) FILTER (WHERE duration != '00:00:00'::interval)
                  ) AS avgreviewtime
        FROM
            (
            SELECT -- calculate the duration of islands
                island,
                age(max(done), min(done)) AS duration,
                count(island) AS reviews
            FROM
                (
                SELECT -- give a unique number to each island
                    done,
                    count(*) FILTER (WHERE step) OVER (ORDER BY done) AS island
                FROM
                    (
                    SELECT -- detect the beginning of islands
                        done,
                        (
                            lag(done) OVER (ORDER BY done) <= done - interval '2 min'
                        ) AS step
                    FROM review
                    WHERE clicker_id = 71
                      AND done > '2015-05-13'
                      AND done < '2015-05-13 15:00:00' -- keep the queries small and fast for now
                    ) sub
                ) grouped
            GROUP BY island
            ) sessions
        ) summary;
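
    To make the estimate easier to follow, here is a minimal sketch of the intermediate per-island result on made-up data. The six timestamps are hypothetical, and the CTE named review merely stands in for the real table (there is no clicker_id filter here):

    ```sql
    -- Hypothetical sample data standing in for the review table
    WITH review(done) AS (
        VALUES
            (timestamptz '2015-05-13 09:00:00+00'),
            (timestamptz '2015-05-13 09:00:40+00'),
            (timestamptz '2015-05-13 09:01:20+00'),
            (timestamptz '2015-05-13 09:05:00+00'),  -- gap >= 2 min: new island
            (timestamptz '2015-05-13 09:20:00+00'),  -- gap >= 2 min: new island
            (timestamptz '2015-05-13 09:21:00+00')
    ),
    grouped AS (
        -- number the islands with a running count of island starts
        SELECT done,
               count(*) FILTER (WHERE step) OVER (ORDER BY done) AS island
        FROM (
            SELECT done,
                   (lag(done) OVER (ORDER BY done) <= done - interval '2 min') AS step
            FROM review
        ) sub
    )
    -- one row per island: its duration and the number of reviews in it
    SELECT island,
           age(max(done), min(done)) AS duration,
           count(island) AS reviews
    FROM grouped
    GROUP BY island
    ORDER BY island;
    ```

    With this data, island 0 has 3 reviews over 80 seconds, island 1 has a single review (duration 0), and island 2 has 2 reviews over 60 seconds. The logged time is therefore 140 seconds, the average review time is 140 / (5 - 2), or roughly 46.7 seconds, and the estimated total is 140 + 3 * 46.7, or about 280 seconds.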
    
  • 2021-01-22 16:29

    This would do it:

    SELECT done, count(*) FILTER (WHERE step) OVER (ORDER BY done) AS grp
    FROM  (
       SELECT done
           , (lag(done) OVER (ORDER BY done) <= done - interval '2 min') AS step
       FROM   tbl
       ) sub
    ORDER  BY done;
    

    The subquery sub records step as true if the previous row is at least 2 min away, with rows sorted by the timestamp column done itself in this case.

    The outer query adds a rolling count of steps, which is effectively the group number (grp), by combining the aggregate FILTER clause with another window function.
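
    As an illustration of the two steps described above, the same query can be run against a handful of made-up rows. The five timestamps are hypothetical, and the CTE named tbl only stands in for a real table; step is added to the output so both stages are visible:

    ```sql
    -- Hypothetical sample data: five timestamps forming three islands
    WITH tbl(done) AS (
       VALUES
         (timestamptz '2015-05-13 10:00:00+00')
       , (timestamptz '2015-05-13 10:01:00+00')
       , (timestamptz '2015-05-13 10:03:00+00')  -- exactly 2 min after the previous row
       , (timestamptz '2015-05-13 10:03:30+00')
       , (timestamptz '2015-05-13 10:10:00+00')  -- 6.5 min after the previous row
       )
    SELECT done, step
         , count(*) FILTER (WHERE step) OVER (ORDER BY done) AS grp
    FROM  (
       SELECT done
            , (lag(done) OVER (ORDER BY done) <= done - interval '2 min') AS step
       FROM   tbl
       ) sub
    ORDER  BY done;
    ```

    For the first row lag() has no previous row, so step is NULL and the row is not counted, which makes the group numbers start at 0. In this sample, step comes out as NULL, false, true, false, true and grp as 0, 0, 1, 1, 2.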


    Related:

    • Query to find all timestamps more than a certain interval apart
    • How to label groups in postgresql when group belonging depends on the preceding line?
    • Select longest continuous sequence
    • Grouping or Window

    About the aggregate FILTER clause:

    • How can I simplify this game statistics query?
    • Conditional lead/lag function PostgreSQL?