Calculate running total / running balance

粉色の甜心 2020-11-22 10:01

I have a table:

create table Transactions(Tid int,amt int)

With 5 rows:

insert into Transactions values(1, 100)
insert into         


        
6 Answers
  • 2020-11-22 10:33

    We're on SQL Server 2008 R2, and I use a variable and a temp table. This also lets you apply custom logic to each row with a CASE expression (e.g. certain transactions may act differently, or you may only want a total for specific transaction types).

    DECLARE @RunningBalance int = 0
    SELECT Tid, Amt, 0 AS RunningBalance
    INTO #TxnTable
    FROM Transactions
    ORDER BY Tid
    
    UPDATE #TxnTable
    SET @RunningBalance = RunningBalance = @RunningBalance + Amt
    
    SELECT * FROM #TxnTable
    DROP TABLE #TxnTable
    

    We have a transaction table with 2.3 million rows with an item that has over 3,300 transactions, and running this type of query against that takes no time at all.
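The single-pass logic with a per-row rule can be sketched outside the database. This is a minimal Python sketch, not the answer's T-SQL: the `kind` column and the rule that "memo" rows leave the balance unchanged are hypothetical and only stand in for the CASE expression mentioned above.

```python
def running_balance(rows, skip_kinds=frozenset({"memo"})):
    """Return (tid, amt, balance) tuples, accumulating in Tid order."""
    balance = 0
    out = []
    for tid, amt, kind in sorted(rows, key=lambda r: r[0]):
        # Stand-in for a CASE expression: some kinds do not change the balance.
        if kind not in skip_kinds:
            balance += amt
        out.append((tid, amt, balance))
    return out


# Hypothetical sample rows: (Tid, amt, kind), deliberately out of order.
rows = [(2, -30, "sale"), (1, 100, "sale"), (3, 500, "memo"), (4, 20, "sale")]
result = running_balance(rows)
print(result)
```

The explicit sort is the important part: the accumulation only yields a running balance if rows are visited in Tid order.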

  • 2020-11-22 10:36

    As of SQL Server 2012, SUM ... OVER accepts an ORDER BY, so you can nest an aggregate such as COUNT inside a windowed SUM to get a running total of grouped counts.

    SELECT [date], SUM(COUNT(DISTINCT unique_id)) OVER (ORDER BY [date]) AS total_per_date
    FROM dbo.[table]  -- placeholder name; TABLE is a reserved word, hence the brackets
    GROUP BY [date]
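To see what the nested form computes, here is a hedged sketch that runs the unnested two-step equivalent (group first, then take a running sum of the per-group counts) in SQLite through Python's `sqlite3` module. SQLite is only a stand-in for SQL Server, it needs SQLite 3.25 or later for window functions, and the `visits` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (d TEXT, unique_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("2020-01-01", 1), ("2020-01-01", 1), ("2020-01-01", 2),
     ("2020-01-02", 3),
     ("2020-01-03", 1), ("2020-01-03", 4)],
)

# Inner query: distinct ids per day.
# Outer query: running total of those per-day counts.
rows = conn.execute(
    """
    SELECT d, per_day, SUM(per_day) OVER (ORDER BY d) AS total_per_date
    FROM (SELECT d, COUNT(DISTINCT unique_id) AS per_day
          FROM visits
          GROUP BY d)
    ORDER BY d
    """
).fetchall()
print(rows)
```

Note that id 1 appears on two different days and is counted on both, so the result is a running sum of per-day distinct counts, not a count of distinct ids seen so far.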
    
  • 2020-11-22 10:37
    SELECT v.ID,
           CONVERT(VARCHAR(10), v.EntryDate, 103) + ' ' + CONVERT(VARCHAR(8), v.EntryDate, 14) AS EntryDate,
           CASE WHEN v.CreditAmount < 0 THEN ISNULL(v.CreditAmount, 0) ELSE 0 END AS credit,
           CASE WHEN v.CreditAmount > 0 THEN v.CreditAmount ELSE 0 END AS debit,
           Balance = SUM(v.CreditAmount) OVER (ORDER BY v.ID ROWS UNBOUNDED PRECEDING)
    FROM VendorCredit v
    ORDER BY v.EntryDate DESC
    
  • 2020-11-22 10:40

    If you are on SQL Server 2012 or later, here is a solution:

    select *, sum(amt) over (order by Tid) as running_total from Transactions 
    

    For earlier versions, use a correlated subquery:

    select *,(select sum(amt) from Transactions where Tid<=t.Tid) as running_total from Transactions as t
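As a quick check that the two forms agree, the sketch below runs both against SQLite via Python's `sqlite3` module. SQLite (3.25 or later) is only a stand-in for SQL Server, and the sample amounts after the first row are illustrative assumptions, since the question's data is cut off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Tid INTEGER, amt INTEGER)")
# Only the first row comes from the question; the rest are made-up samples.
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [(1, 100), (2, -50), (3, 100), (4, -100), (5, 200)],
)

# Window-function form.
windowed = conn.execute(
    "SELECT Tid, amt, SUM(amt) OVER (ORDER BY Tid) AS running_total "
    "FROM Transactions ORDER BY Tid"
).fetchall()

# Correlated-subquery form: re-sums all rows up to the current Tid.
correlated = conn.execute(
    "SELECT Tid, amt, "
    "       (SELECT SUM(amt) FROM Transactions WHERE Tid <= t.Tid) AS running_total "
    "FROM Transactions AS t ORDER BY Tid"
).fetchall()

print(windowed)
print(windowed == correlated)
```

Both queries return the same rows here; they differ in cost, because the subquery form rescans the earlier rows for every output row.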
    
  • 2020-11-22 10:40

    In SQL Server 2008+

    SELECT  T1.* ,
            T2.RunningSum
    FROM    dbo.Transactions As T1
            CROSS APPLY ( SELECT    SUM(amt) AS RunningSum
                          FROM      dbo.Transactions AS CAT1
                          WHERE     ( CAT1.TId <= T1.TId )
                        ) AS T2
    

    In SQL Server 2012+

    SELECT  * ,
            SUM(T1.amt) OVER ( ORDER BY T1.TId 
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS RunningTotal
    FROM    dbo.Transactions AS T1
    
  • 2020-11-22 10:41

    For those not using SQL Server 2012 or above, a cursor is likely the most efficient method outside of CLR that is both supported and guaranteed to work. There are other approaches: the "quirky update", which can be marginally faster but is not guaranteed to work in the future; set-based approaches, whose cost grows steeply as the table gets larger; and recursive CTE methods, which often require direct tempdb I/O or result in spills that yield roughly the same impact.


    INNER JOIN - do not do this:

    The slow, set-based approach is of the form:

    SELECT t1.TID, t1.amt, RunningTotal = SUM(t2.amt)
    FROM dbo.Transactions AS t1
    INNER JOIN dbo.Transactions AS t2
      ON t1.TID >= t2.TID
    GROUP BY t1.TID, t1.amt
    ORDER BY t1.TID;
    

    The reason this is slow? As the table gets larger, each additional row requires reading every earlier row again, so row n reads n-1 other rows. Total reads therefore grow quadratically (about n²/2) rather than linearly, which leads to failures, timeouts, or just angry users.
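The growth rate is easy to put numbers on. The following Python sketch simply counts matches for the triangular join condition `t1.TID >= t2.TID`, assuming distinct TIDs; it is arithmetic, not a measurement of SQL Server.

```python
def rows_read_by_join(n):
    """Rows matched by the triangular self-join on n rows.

    The k-th row (in TID order) joins to itself and the k - 1 rows before it.
    """
    return sum(k for k in range(1, n + 1))


for n in (1_000, 10_000, 100_000):
    # The closed form n * (n + 1) / 2 gives the same count.
    print(n, rows_read_by_join(n), n * (n + 1) // 2)
```

Multiplying the table size by 10 multiplies the work by roughly 100, which is why this form degrades so quickly.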


    Correlated subquery - do not do this either:

    The subquery form is similarly painful for similarly painful reasons.

    SELECT TID, amt, RunningTotal = amt + COALESCE(
    (
      SELECT SUM(amt)
        FROM dbo.Transactions AS i
        WHERE i.TID < o.TID), 0
    )
    FROM dbo.Transactions AS o
    ORDER BY TID;
    

    Quirky update - do this at your own risk:

    The "quirky update" method is more efficient than the above, but the behavior is not documented, there are no guarantees about order, and the behavior might work today but could break in the future. I'm including this because it is a popular method and it is efficient, but that doesn't mean I endorse it. The primary reason I even answered this question instead of closing it as a duplicate is because the other question has a quirky update as the accepted answer.

    DECLARE @t TABLE
    (
      TID INT PRIMARY KEY,
      amt INT,
      RunningTotal INT
    );
     
    DECLARE @RunningTotal INT = 0;
     
    INSERT @t(TID, amt, RunningTotal)
      SELECT TID, amt, RunningTotal = 0
      FROM dbo.Transactions
      ORDER BY TID;
     
    UPDATE @t
      SET @RunningTotal = RunningTotal = @RunningTotal + amt
      FROM @t;
     
    SELECT TID, amt, RunningTotal
      FROM @t
      ORDER BY TID;
    

    Recursive CTEs

    This first one relies on TID being contiguous (no gaps) and starting at 1:

    ;WITH x AS
    (
      SELECT TID, amt, RunningTotal = amt
        FROM dbo.Transactions
        WHERE TID = 1
      UNION ALL
      SELECT y.TID, y.amt, x.RunningTotal + y.amt
       FROM x 
       INNER JOIN dbo.Transactions AS y
       ON y.TID = x.TID + 1
    )
    SELECT TID, amt, RunningTotal
      FROM x
      ORDER BY TID
      OPTION (MAXRECURSION 10000);
    

    If you can't rely on this, then you can use this variation, which simply builds a contiguous sequence using ROW_NUMBER():

    ;WITH y AS 
    (
      SELECT TID, amt, rn = ROW_NUMBER() OVER (ORDER BY TID)
        FROM dbo.Transactions
    ), x AS
    (
        SELECT TID, rn, amt, rt = amt
          FROM y
          WHERE rn = 1
        UNION ALL
        SELECT y.TID, y.rn, y.amt, x.rt + y.amt
          FROM x INNER JOIN y
          ON y.rn = x.rn + 1
    )
    SELECT TID, amt, RunningTotal = rt
      FROM x
      ORDER BY x.rn
      OPTION (MAXRECURSION 10000);
    

    Depending on the size of the data (e.g. columns we don't know about), you may find better overall performance by stuffing the relevant columns only in a #temp table first, and processing against that instead of the base table:

    CREATE TABLE #x
    (
      rn  INT PRIMARY KEY,
      TID INT,
      amt INT
    );
    
    INSERT INTO #x (rn, TID, amt)
    SELECT ROW_NUMBER() OVER (ORDER BY TID),
      TID, amt
    FROM dbo.Transactions;
    
    ;WITH x AS
    (
      SELECT TID, rn, amt, rt = amt
        FROM #x
        WHERE rn = 1
      UNION ALL
      SELECT y.TID, y.rn, y.amt, x.rt + y.amt
        FROM x INNER JOIN #x AS y
        ON y.rn = x.rn + 1
    )
    SELECT TID, amt, RunningTotal = rt
      FROM x
      ORDER BY TID
      OPTION (MAXRECURSION 10000);
    
    DROP TABLE #x;
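The effect of a gap is easy to demonstrate. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption, and it needs SQLite 3.25 or later for `ROW_NUMBER()`); the sample rows and the temp table name `numbered` are made up. The first query walks `TID + 1` and silently stops at the gap, while the second walks a gap-free row number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (TID INTEGER PRIMARY KEY, amt INTEGER)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [(1, 100), (2, -50), (4, 100), (5, 200)],  # TID 3 is missing
)

# Variant 1: assumes TID is contiguous, so recursion ends at the gap.
contiguous = conn.execute(
    """
    WITH RECURSIVE x(TID, amt, rt) AS (
        SELECT TID, amt, amt FROM Transactions WHERE TID = 1
        UNION ALL
        SELECT y.TID, y.amt, x.rt + y.amt
        FROM x JOIN Transactions AS y ON y.TID = x.TID + 1
    )
    SELECT TID, amt, rt FROM x ORDER BY TID
    """
).fetchall()

# Variant 2: number the rows into a temp table first, then recurse on rn.
conn.execute(
    """
    CREATE TEMP TABLE numbered AS
    SELECT ROW_NUMBER() OVER (ORDER BY TID) AS rn, TID, amt
    FROM Transactions
    """
)
by_row_number = conn.execute(
    """
    WITH RECURSIVE x(TID, rn, amt, rt) AS (
        SELECT TID, rn, amt, amt FROM numbered WHERE rn = 1
        UNION ALL
        SELECT y.TID, y.rn, y.amt, x.rt + y.amt
        FROM x JOIN numbered AS y ON y.rn = x.rn + 1
    )
    SELECT TID, amt, rt FROM x ORDER BY rn
    """
).fetchall()

print(contiguous)
print(by_row_number)
```

The first variant raises no error; it just returns fewer rows, so a gap can go unnoticed unless the row counts are compared.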
    

    Only the first CTE method will provide performance rivaling the quirky update, but it makes a big assumption about the nature of the data (no gaps). The other two methods are slower, and in those cases you may as well use a cursor (if you can't use CLR and you're not yet on SQL Server 2012 or above).


    Cursor

    Everybody is told that cursors are evil, and that they should be avoided at all costs, but this actually beats the performance of most other supported methods, and is safer than the quirky update. The only ones I prefer over the cursor solution are the 2012 and CLR methods (below):

    CREATE TABLE #x
    (
      TID INT PRIMARY KEY, 
      amt INT, 
      rt INT
    );
    
    INSERT #x(TID, amt) 
      SELECT TID, amt
      FROM dbo.Transactions
      ORDER BY TID;
    
    DECLARE @rt INT, @tid INT, @amt INT;
    SET @rt = 0;
    
    DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
      FOR SELECT TID, amt FROM #x ORDER BY TID;
    
    OPEN c;
    
    FETCH c INTO @tid, @amt;
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
      SET @rt = @rt + @amt;
      UPDATE #x SET rt = @rt WHERE TID = @tid;
      FETCH c INTO @tid, @amt;
    END
    
    CLOSE c; DEALLOCATE c;
    
    SELECT TID, amt, RunningTotal = rt 
      FROM #x 
      ORDER BY TID;
    
    DROP TABLE #x;
    

    SQL Server 2012 or above

    New window functions introduced in SQL Server 2012 make this task a lot easier (and it performs better than all of the above methods as well):

    SELECT TID, amt, 
      RunningTotal = SUM(amt) OVER (ORDER BY TID ROWS UNBOUNDED PRECEDING)
    FROM dbo.Transactions
    ORDER BY TID;
    

    Note that on larger data sets, you'll find that the above performs much better than either of the following two options, since RANGE uses an on-disk spool (and the default frame is RANGE). However, it is also important to note that the results can differ: when the ORDER BY column contains duplicate values, RANGE gives every tied row the same total (the sum through the last of the ties), while ROWS accumulates one row at a time. Be sure both return correct results for your data before deciding between them based on performance.

    SELECT TID, amt, 
      RunningTotal = SUM(amt) OVER (ORDER BY TID)
    FROM dbo.Transactions
    ORDER BY TID;
    
    SELECT TID, amt, 
      RunningTotal = SUM(amt) OVER (ORDER BY TID RANGE UNBOUNDED PRECEDING)
    FROM dbo.Transactions
    ORDER BY TID;
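The ROWS versus RANGE difference only shows up when the ordering column has ties. Here is a small sketch using SQLite through Python's `sqlite3` module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later, whose default frame is also RANGE). The table `t` and its rows are hypothetical, with two rows sharing the same ordering value `d`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 100), (2, 50), (2, 50), (3, 10)],  # two rows tie on d = 2
)


def running(frame):
    """Running total ordered by d, using the given frame clause ('' = default)."""
    sql = (
        f"SELECT d, SUM(amt) OVER (ORDER BY d {frame}) AS rt "
        "FROM t ORDER BY d, rt"
    )
    return conn.execute(sql).fetchall()


rows_frame = running("ROWS UNBOUNDED PRECEDING")
range_frame = running("RANGE UNBOUNDED PRECEDING")
default_frame = running("")

print(rows_frame)   # tied rows get different totals
print(range_frame)  # tied rows share one total
```

With a unique ordering column such as a primary key there are no ties, and both frames return identical totals.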
    

    CLR

    For completeness, I'm offering a link to Pavel Pawlowski's CLR method, which is by far the preferable method on versions prior to SQL Server 2012 (but not SQL Server 2000, which has no CLR integration).

    http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/


    Conclusion

    If you are on SQL Server 2012 or above, the choice is obvious: use the new SUM() OVER() construct (with ROWS rather than RANGE). For earlier versions, you'll want to compare the performance of the alternative approaches on your schema and data and, keeping non-performance-related factors in mind, determine which approach is right for you. It very well may be the CLR approach. Here are my recommendations, in order of preference:

    1. SUM() OVER() ... ROWS, if on 2012 or above
    2. CLR method, if possible
    3. First recursive CTE method, if possible
    4. Cursor
    5. The other recursive CTE methods
    6. Quirky update
    7. Join and/or correlated subquery

    For further information with performance comparisons of these methods, see this question on http://dba.stackexchange.com:

    https://dba.stackexchange.com/questions/19507/running-total-with-count


    I've also blogged more details about these comparisons here:

    http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals


    Also for grouped/partitioned running totals, see the following posts:

    http://sqlperformance.com/2014/01/t-sql-queries/grouped-running-totals

    Partitioning results in a running totals query

    Multiple Running Totals with Group By
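For the grouped case, the window-function form only needs a PARTITION BY. A minimal sketch follows, again using SQLite through Python's `sqlite3` module as a stand-in for SQL Server (SQLite 3.25 or later); the `AccountId` column and the sample rows are hypothetical and not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (AccountId INTEGER, TID INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [(1, 1, 100), (1, 2, -50), (2, 3, 30), (1, 4, 10), (2, 5, 70)],
)

# The running total restarts for each AccountId.
per_account = conn.execute(
    """
    SELECT AccountId, TID, amt,
           SUM(amt) OVER (PARTITION BY AccountId
                          ORDER BY TID
                          ROWS UNBOUNDED PRECEDING) AS RunningTotal
    FROM Transactions
    ORDER BY AccountId, TID
    """
).fetchall()
print(per_account)
```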
