SQL Server - conditional aggregation with correlation

前端 未结 2 1769
广开言路
广开言路 2021-02-07 07:02

Background:

The original case was very simple. Calculate running total per user from highest revenue to lowest:

CREATE TABLE t(Customer INTEGER  NOT NULL         


        
相关标签:
2条回答
  • 2021-02-07 07:43

    There is an easier solution:

    SELECT c.Customer, c."User", c."Revenue",
           1.0 * Revenue/ NULLIF(c2.sum_total, 0) AS percentage,
           1.0 * c2.sum_running / NULLIF(c2.sum_total, 0) AS running_percentage
    FROM t c CROSS APPLY
         (SELECT SUM(c2.Revenue) AS sum_total,
                 SUM(CASE WHEN c2.Revenue >= x.Revenue THEN c2.Revenue ELSE 0 END) 
                     as sum_running
          FROM t c2 CROSS JOIN
               (SELECT c.REVENUE) x
          WHERE c."User" = c2."User"
         ) c2
    ORDER BY "User", Revenue DESC;
    

    I am not sure why or if this limitation is in the SQL '92 standard. I did have it pretty well memorized 20 or so years ago, but I don't recall that particular limitation.

    I should note:

    • At the time of the SQL 92 standard, lateral joins were not really on the radar. Sybase definitely had no such concept.
    • Other databases do have problems with outer references. In particular, they often limit the scoping to one level deep.
    • The SQL Standard itself tends to highly political (that is, vendor-driven) rather than driven by actual database user requirements. Well, over time, it does move in the right direction.
    0 讨论(0)
  • 2021-02-07 08:03

    There is no such limitation in the SQL standard for LATERAL. CROSS APPLY is a vendor-specific extension from Microsoft (Oracle adopted it later for compatibility) and its limitations are obviously not owed to the ISO/IEC SQL standard, since the MS feature pre-dates the standard.

    LATERAL in standard SQL is basically just a modifier for joins to allow lateral references in the join tree. There is no limit to the number of columns that can be referenced.

    I wouldn't see a reason for the odd restriction to begin with. Maybe it's because CROSS APPLY was originally intended to allow table-valued functions, which was later extended to allow sub-SELECTs.

    The Postgres manual explains LATERAL like this:

    The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)

    The Postgres version of your query (without the more elegant window functions) can be simpler:

    SELECT c.*
         , round(revenue        / c2.sum_total, 2) END AS percentage 
         , round(c2.running_sum / c2.sum_total, 2) END AS running_percentage 
    FROM   t c, LATERAL (
       SELECT NULLIF(SUM(revenue), 0)::numeric AS sum_total  -- NULLIF, cast once
            , SUM(revenue) FILTER (WHERE revenue >= c.revenue) AS running_sum
       FROM   t
       WHERE  "User" = c."User"
       ) c2
    ORDER  BY c."User", c.revenue DESC;
    
    • Postgres 9.4+ has the more elegant aggregate FILTER for conditional aggregates.

    • NULLIF makes sense, I only suggest a minor simplification.

    • Cast sum_total to numeric once.

    • Round result to match your desired result.

    0 讨论(0)
提交回复
热议问题