Partition Function COUNT() OVER possible using DISTINCT

臣服心动 2020-11-27 10:26

I'm trying to write the following in order to get a running total of distinct NumUsers, like so:

NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION         


        
4 answers
  • 2020-11-27 10:42

    There is a very simple solution using dense_rank():

    dense_rank() over (partition by [Mth] order by [UserAccountKey]) 
    + dense_rank() over (partition by [Mth] order by [UserAccountKey] desc) 
    - 1
    

    This will give you exactly what you were asking for: The number of distinct UserAccountKeys within each month.
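To see why adding the ascending and descending ranks and subtracting 1 yields the distinct count, here is a minimal sketch that runs the same expression through Python's built-in `sqlite3` module. The table name `logins` and the sample rows are hypothetical, and it assumes SQLite 3.25 or later (window function support) rather than SQL Server, so treat it as an illustration of the idea and not as a tested T-SQL script.

```python
import sqlite3

# Hypothetical sample data: (Mth, UserAccountKey). Not taken from the question.
ROWS = [
    (1, "a"), (1, "a"), (1, "b"),
    (2, "a"), (2, "b"), (2, "c"), (2, "c"),
]

# For any row, (rank from the bottom) + (rank from the top) - 1
# equals the number of distinct keys in its partition.
QUERY = """
SELECT Mth,
       UserAccountKey,
       DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey)
     + DENSE_RANK() OVER (PARTITION BY Mth ORDER BY UserAccountKey DESC)
     - 1 AS NumUsers
FROM   logins
"""

def distinct_users_per_month(rows):
    """Return {Mth: NumUsers} computed with the two-DENSE_RANK expression."""
    con = sqlite3.connect(":memory:")
    # NOT NULL mirrors the assumption that the key column has no NULLs.
    con.execute("CREATE TABLE logins (Mth INTEGER, UserAccountKey TEXT NOT NULL)")
    con.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    result = {}
    for mth, _key, num_users in con.execute(QUERY):
        # Every row of a partition carries the same value, so overwriting is safe.
        result[mth] = num_users
    con.close()
    return result

if __name__ == "__main__":
    print(distinct_users_per_month(ROWS))
```

Month 1 holds the keys a, a, b: the row for a ranks 1 ascending and 2 descending, so 1 + 2 - 1 = 2, and the row for b gives 2 + 1 - 1 = 2 as well.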

  • 2020-11-27 10:47

    Necromancing:

    It's relatively simple to emulate COUNT(DISTINCT ...) OVER (PARTITION BY ...) by taking the MAX of a DENSE_RANK:

    ;WITH baseTable AS
    (
        SELECT 'RM1' AS RM, 'ADR1' AS ADR
        UNION ALL SELECT 'RM1' AS RM, 'ADR1' AS ADR
        UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
        UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
        UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
        UNION ALL SELECT 'RM2' AS RM, 'ADR3' AS ADR
        UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
        UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
        UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
        UNION ALL SELECT 'RM3' AS RM, 'ADR2' AS ADR
    )
    ,CTE AS
    (
        SELECT RM, ADR, DENSE_RANK() OVER(PARTITION BY RM ORDER BY ADR) AS dr 
        FROM baseTable
    )
    SELECT
         RM
        ,ADR
    
        ,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY ADR) AS cnt1 
        ,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM) AS cnt2 
        -- Not supported
        --,COUNT(DISTINCT CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY CTE.ADR) AS cntDist
        ,MAX(CTE.dr) OVER (PARTITION BY CTE.RM ORDER BY CTE.RM) AS cntDistEmu 
    FROM CTE
    

    Note:
    This assumes the ranked column is non-nullable.
    DENSE_RANK gives NULL a rank of its own, while COUNT(DISTINCT ...) ignores NULLs, so for any partition that contains one or more NULL entries you need to subtract 1.
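The emulation and its NULL caveat can be checked with a small sketch using Python's built-in `sqlite3` module. It reuses the rows of the `baseTable` CTE from this answer. The helper `run` and the extra NULL row are hypothetical additions, and the window here omits the `ORDER BY` because every row of a partition shares the same RM. This assumes SQLite 3.25 or later, not SQL Server.

```python
import sqlite3

# Same rows as the baseTable CTE in the answer above.
SOURCE_ROWS = [
    ("RM1", "ADR1"), ("RM1", "ADR1"),
    ("RM2", "ADR1"), ("RM2", "ADR2"), ("RM2", "ADR2"), ("RM2", "ADR3"),
    ("RM3", "ADR1"), ("RM2", "ADR1"), ("RM3", "ADR1"), ("RM3", "ADR2"),
]

# Emulated distinct count: highest DENSE_RANK within each partition.
EMULATED = """
WITH ranked AS (
    SELECT RM, ADR,
           DENSE_RANK() OVER (PARTITION BY RM ORDER BY ADR) AS dr
    FROM   baseTable
)
SELECT RM, MAX(dr) OVER (PARTITION BY RM) AS cntDistEmu
FROM   ranked
"""

# Reference answer using an ordinary grouped COUNT(DISTINCT).
EXACT = "SELECT RM, COUNT(DISTINCT ADR) FROM baseTable GROUP BY RM"

def run(query, rows):
    """Load rows into an in-memory table and return {RM: count} for the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE baseTable (RM TEXT, ADR TEXT)")
    con.executemany("INSERT INTO baseTable VALUES (?, ?)", rows)
    result = dict(con.execute(query).fetchall())
    con.close()
    return result

if __name__ == "__main__":
    print(run(EMULATED, SOURCE_ROWS))
    print(run(EXACT, SOURCE_ROWS))
    # Hypothetical extra row: a NULL address makes the emulation overcount by 1.
    with_null = SOURCE_ROWS + [("RM1", None)]
    print(run(EMULATED, with_null)["RM1"], run(EXACT, with_null)["RM1"])
```

On the original rows both queries agree. Once a NULL address is added to RM1, the emulated value is one higher than the grouped count, which is the case where 1 has to be subtracted.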

  • 2020-11-27 10:48

    I think the only way of doing this in SQL Server 2008 R2 is to use a correlated subquery or an OUTER APPLY:

    SELECT  datekey,
            COALESCE(RunningTotal, 0) AS RunningTotal,
            COALESCE(RunningCount, 0) AS RunningCount,
            COALESCE(RunningDistinctCount, 0) AS RunningDistinctCount
    FROM    document
            OUTER APPLY
            (   SELECT  SUM(Amount) AS RunningTotal,
                        COUNT(1) AS RunningCount,
                        COUNT(DISTINCT d2.dateKey) AS RunningDistinctCount
                FROM    Document d2
                WHERE   d2.DateKey <= document.DateKey
            ) rt;
    

    In SQL Server 2012, a running aggregate can be written with the syntax you suggested:

    SELECT  datekey,
            SUM(Amount) OVER(ORDER BY DateKey) AS RunningTotal
    FROM    document
    

    However, DISTINCT is still not allowed inside a windowed aggregate, so if you need DISTINCT, or if upgrading isn't an option, I think OUTER APPLY is your best option.
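As a rough illustration of the correlated-subquery route, the sketch below computes a running total and a running distinct count with Python's built-in `sqlite3` module. SQLite has no OUTER APPLY, so scalar correlated subqueries stand in for it. The `UserKey` column and the sample rows are hypothetical and were added so that the distinct count has something meaningful to count.

```python
import sqlite3

# Hypothetical rows: (DateKey, UserKey, Amount).
ROWS = [
    (20200101, "u1", 10),
    (20200102, "u2", 5),
    (20200103, "u1", 2),
    (20200104, "u3", 1),
]

# Each subquery re-aggregates all rows up to and including the current DateKey.
QUERY = """
SELECT d.DateKey,
       (SELECT SUM(d2.Amount)
        FROM   document AS d2
        WHERE  d2.DateKey <= d.DateKey) AS RunningTotal,
       (SELECT COUNT(DISTINCT d2.UserKey)
        FROM   document AS d2
        WHERE  d2.DateKey <= d.DateKey) AS RunningDistinctCount
FROM   document AS d
ORDER  BY d.DateKey
"""

def running_distinct(rows):
    """Return [(DateKey, RunningTotal, RunningDistinctCount), ...] ordered by DateKey."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE document (DateKey INTEGER, UserKey TEXT, Amount INTEGER)")
    con.executemany("INSERT INTO document VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in running_distinct(ROWS):
        print(row)
```

The subqueries rescan the table for every outer row, so this trades speed for compatibility. That cost is the usual reason to prefer a window function where one is available.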

  • 2020-11-27 11:03

    I use a solution that is similar to that of David above, but with an additional twist if some rows should be excluded from the count. This assumes that [UserAccountKey] is never null.

    -- subtract an extra 1 if null was ranked within the partition,
    -- which only happens if there were rows where [Include] <> 'Y'
    dense_rank() over (
      partition by [Mth] 
      order by case when [Include] = 'Y' then [UserAccountKey] else null end asc
    ) 
    + dense_rank() over (
      partition by [Mth] 
      order by case when [Include] = 'Y' then [UserAccountKey] else null end desc
    )
    - max(case when [Include] = 'Y' then 0 else 1 end) over (partition by [Mth])
    - 1
    

    An SQL Fiddle with an extended example can be found here.
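The adjustment for excluded rows can be sketched the same way with Python's built-in `sqlite3` module. The table name `logins` and the sample rows are hypothetical, and the sketch assumes SQLite 3.25 or later. As in SQL Server, SQLite sorts NULLs first in ascending order, so all excluded rows collapse into a single extra rank that the MAX term removes again.

```python
import sqlite3

# Hypothetical rows: (Mth, UserAccountKey, Include).
ROWS = [
    (1, "a", "Y"), (1, "a", "Y"), (1, "b", "Y"), (1, "c", "N"),
    (2, "a", "Y"), (2, "b", "Y"),
    (3, "a", "N"),
]

# Excluded rows are ranked as NULL; the MAX(...) term is 1 only when the
# partition contains at least one excluded row, cancelling that extra rank.
QUERY = """
SELECT Mth,
       DENSE_RANK() OVER (
           PARTITION BY Mth
           ORDER BY CASE WHEN Include = 'Y' THEN UserAccountKey END ASC)
     + DENSE_RANK() OVER (
           PARTITION BY Mth
           ORDER BY CASE WHEN Include = 'Y' THEN UserAccountKey END DESC)
     - MAX(CASE WHEN Include = 'Y' THEN 0 ELSE 1 END) OVER (PARTITION BY Mth)
     - 1 AS NumUsers
FROM   logins
"""

def included_distinct_users(rows):
    """Return {Mth: distinct UserAccountKey count over rows with Include = 'Y'}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logins (Mth INTEGER, UserAccountKey TEXT NOT NULL, Include TEXT)"
    )
    con.executemany("INSERT INTO logins VALUES (?, ?, ?)", rows)
    # Every row of a month carries the same NumUsers, so dict() keeps one per month.
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result

if __name__ == "__main__":
    print(included_distinct_users(ROWS))
```

Month 1 has three distinct ordering values (NULL, a, b), so the two ranks sum to 4 on every row. Subtracting 1 for the NULL rank and the usual 1 leaves 2. Month 3 contains only an excluded row and comes out as 0.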
