Cumulative distinct count

北海茫月 2021-02-06 14:37

I am working on a query to get the cumulative distinct count of uids on a daily basis.

Example: say two uids (100, 200) appeared on 2016-11-01 and they also appea…
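To pin down the expected output, here is a minimal plain-Python sketch of a cumulative distinct count: group uids by date, then walk the dates in order while growing a set of the uids seen so far. The sample rows and the function name `cumulative_distinct` are hypothetical, invented for illustration, and dates are assumed to be ISO-8601 strings so that they sort chronologically.

```python
def cumulative_distinct(rows: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return (date, number of distinct uids seen on or before that date)."""
    # Collect the set of uids present on each date.
    by_date: dict[str, set[int]] = {}
    for day, uid in rows:
        by_date.setdefault(day, set()).add(uid)

    # Walk the dates in order, accumulating every uid seen so far.
    seen: set[int] = set()
    result: list[tuple[str, int]] = []
    for day in sorted(by_date):  # ISO-8601 strings sort chronologically
        seen |= by_date[day]
        result.append((day, len(seen)))
    return result


# Hypothetical sample rows (date, uid), invented for illustration.
rows = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-02", 100), ("2016-11-02", 200), ("2016-11-02", 300),
    ("2016-11-03", 200),
    ("2016-11-04", 300), ("2016-11-04", 400),
]

for day, count in cumulative_distinct(rows):
    print(day, count)
```

On these rows 2016-11-03 adds no new uid, so its count stays at 3; a correct SQL answer should still report that date.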

4 answers
  • 2021-02-06 15:12

    Please try the following...

    -- Pair each date with every uid seen on or before it,
    -- keep unique (date, uid) pairs, then count uids per date.
    SELECT date,
           COUNT( uid ) AS daily_cumulative_count
    FROM ( SELECT leftTable.date AS date,
                  rightTable.uid AS uid
           FROM sample_table AS leftTable
           JOIN sample_table AS rightTable ON leftTable.date >= rightTable.date
           GROUP BY leftTable.date,
                    rightTable.uid
         ) AS allUIDSForDateFinder
    GROUP BY date
    ORDER BY date;
    

    This statement starts by joining sample_table to itself so that each record in leftTable is paired with every record in rightTable whose date is earlier than or equal to its own. This effectively attaches to each date a list of all uid values that have occurred up to and including that date.

    GROUP BY then reduces the resulting dataset to unique (date, uid) combinations.

    The main body of the query then groups the refined dataset from the subquery allUIDSForDateFinder by date and performs a COUNT() of the uid values in each group.

    If you have any questions or comments, please feel free to post a comment.
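A minimal sketch that runs this self-join against an in-memory SQLite database through Python's standard `sqlite3` module. The rows are hypothetical sample data, and `sample_table(date, uid)` is assumed to be the question's schema. Note the comma after the first `date` column, which the query needs in order to parse.

```python
import sqlite3

# Hypothetical sample rows (date, uid), invented for illustration.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-02", 100), ("2016-11-02", 200), ("2016-11-02", 300),
    ("2016-11-03", 200),
    ("2016-11-04", 300), ("2016-11-04", 400),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table (date, uid) VALUES (?, ?)", ROWS)

# Self-join: pair each date with every uid seen on or before it,
# dedupe to (date, uid) pairs, then count uids per date.
QUERY = """
SELECT date,
       COUNT(uid) AS daily_cumulative_count
FROM ( SELECT leftTable.date AS date,
              rightTable.uid AS uid
       FROM sample_table AS leftTable
       JOIN sample_table AS rightTable ON leftTable.date >= rightTable.date
       GROUP BY leftTable.date, rightTable.uid
     ) AS allUIDSForDateFinder
GROUP BY date
ORDER BY date
"""

result = conn.execute(QUERY).fetchall()
for day, count in result:
    print(day, count)
```

Because the join pairs every row with every earlier-or-equal row before GROUP BY runs, the intermediate result can grow roughly quadratically with the table size, so this approach may be slow on large tables.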

  • 2021-02-06 15:15
    WITH firstseen AS (
      SELECT uid, MIN(date) date
      FROM sample_table
      GROUP BY 1
    )
    SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) daily_cumulative_count 
    FROM firstseen
    ORDER BY 1
    

    SELECT DISTINCT is used because each (date, daily_cumulative_count) pair is repeated once for every uid first seen on that date.

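    Explanation: for each date dt, the window counts every uid whose first-seen date is on or before dt. With ORDER BY date and no explicit frame, the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes all rows sharing the current date (its peers), so every row for a given date gets the same running count.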
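A minimal sketch, assuming Python's `sqlite3` module backed by SQLite 3.25 or later (the first release with window functions) and hypothetical sample data. It runs the `firstseen` query with the default frame and again with an explicit ROWS frame, to make the peer behavior visible.

```python
import sqlite3

# Hypothetical sample rows (date, uid), invented for illustration.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-02", 100), ("2016-11-02", 200), ("2016-11-02", 300),
    ("2016-11-03", 200),
    ("2016-11-04", 300), ("2016-11-04", 400),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table (date, uid) VALUES (?, ?)", ROWS)

# First date on which each uid appears.
FIRSTSEEN = """
WITH firstseen AS (
  SELECT uid, MIN(date) date
  FROM sample_table
  GROUP BY 1
)
"""

# No frame clause: with ORDER BY the default is RANGE BETWEEN UNBOUNDED
# PRECEDING AND CURRENT ROW, so rows sharing a date (peers) get the same count.
range_frame = conn.execute(FIRSTSEEN + """
SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) daily_cumulative_count
FROM firstseen
ORDER BY 1
""").fetchall()

# Explicit ROWS frame: peers are not included, so rows on the same date get
# different running counts and DISTINCT no longer collapses them.
rows_frame = conn.execute(FIRSTSEEN + """
SELECT DISTINCT date,
       COUNT(uid) OVER (ORDER BY date
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) c
FROM firstseen
ORDER BY 1, 2
""").fetchall()

print(range_frame)
print(rows_frame)
```

With the ROWS frame, the two uids first seen on 2016-11-01 get running counts 1 and 2, so DISTINCT leaves two rows for that date. Note also that 2016-11-03 is missing from both results: no uid was first seen that day, so `firstseen` has no row for it. If every date is needed, the result would have to be joined back to the distinct dates of `sample_table`.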

  • 2021-02-06 15:18

    The easiest way:

    SELECT *, count(*) over (order by fst_date) cum_uids
    FROM (
      SELECT uid, min(date) fst_date FROM t GROUP BY uid
    ) t
    

    Or something like this.
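A sketch of this query on hypothetical sample data, assuming SQLite 3.25+ through Python's `sqlite3` and a table `t(date, uid)` as in the answer. It returns one row per uid rather than one per date; wrapping it in a GROUP BY on `fst_date` that takes `MAX(cum_uids)` is one way to get a single count per date.

```python
import sqlite3

# Hypothetical sample rows (date, uid), invented for illustration.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-02", 100), ("2016-11-02", 200), ("2016-11-02", 300),
    ("2016-11-03", 200),
    ("2016-11-04", 300), ("2016-11-04", 400),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO t (date, uid) VALUES (?, ?)", ROWS)

# The answer's query: one row per uid with its first date and running count.
PER_UID = """
SELECT *, count(*) OVER (ORDER BY fst_date) cum_uids
FROM (
  SELECT uid, min(date) fst_date FROM t GROUP BY uid
) t
"""
per_uid = conn.execute(PER_UID + " ORDER BY uid").fetchall()

# Collapse to one row per first-seen date; peers share the same count.
per_date = conn.execute(
    f"SELECT fst_date, MAX(cum_uids) FROM ({PER_UID}) "
    "GROUP BY fst_date ORDER BY fst_date"
).fetchall()

print(per_uid)
print(per_date)
```

As with the `firstseen` answer, a date on which no uid is first seen (2016-11-03 here) does not appear in the output.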

  • 2021-02-06 15:24

    You can use NOT EXISTS to check whether a uid was present on any earlier date. Then take a running sum of those first appearances and pick the maximum value for each date, which gives the daily distinct cumulative count.

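    select dt, max(col) as daily_cumulative_count
    from (select t1.*,
                 -- 1 if this uid did not appear on any earlier date, else 0;
                 -- the running sum over dt counts first appearances so far
                 sum(case when not exists (select 1 from t where t1.dt > dt and uid = t1.uid) then 1 else 0 end) over(order by dt) col
          from t t1) x
    group by dt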
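A minimal sketch of this approach in SQLite via Python's `sqlite3` (3.25+ for window functions), with the correlated subquery matching on `uid` on both sides. The table `t(dt, uid)` and its rows are hypothetical. It assumes each (dt, uid) pair appears at most once: a duplicated row on a uid's first date would be counted twice, because neither copy has an earlier date.

```python
import sqlite3

# Hypothetical sample rows (dt, uid), invented for illustration.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-02", 100), ("2016-11-02", 200), ("2016-11-02", 300),
    ("2016-11-03", 200),
    ("2016-11-04", 300), ("2016-11-04", 400),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, uid INTEGER)")
conn.executemany("INSERT INTO t (dt, uid) VALUES (?, ?)", ROWS)

QUERY = """
select dt, max(col) as daily_cumulative_count
from (select t1.*,
             -- 1 if this uid did not appear on any earlier date, else 0;
             -- the running sum over dt counts first appearances so far
             sum(case when not exists (select 1 from t
                                       where t1.dt > dt and uid = t1.uid)
                      then 1 else 0 end) over (order by dt) col
      from t t1) x
group by dt
order by dt
"""

result = conn.execute(QUERY).fetchall()
for day, count in result:
    print(day, count)
```

Unlike the first-seen approaches, this keeps 2016-11-03 in the output, since every date in `t` contributes rows to the final GROUP BY.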
    