I am trying to perform cumulative multiplication. I am trying two methods to do this
DECLARE @TEST TABLE
(
PAR_COLUMN
In pure T-SQL LOG
and EXP
operate with the float
type (8 bytes), which has only 15-17 significant digits. Even that last 15th digit can become inaccurate if you sum large enough values. Your data is numeric(22,6)
, so 15 significant digits is not enough.
POWER
can return numeric
type with potentially higher precision, but it is of little use for us, because both LOG
and LOG10
can return only float
anyway.
To demonstrate the problem I'll change the type in your example to numeric(15,0)
and use POWER
instead of EXP
:
DECLARE @TEST TABLE
(
PAR_COLUMN INT,
PERIOD INT,
VALUE NUMERIC(15, 0)
);
INSERT INTO @TEST VALUES
(1,601,10 ),
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600);
SELECT *,
POWER(CAST(10 AS numeric(15,0)),
Sum(LOG10(
Abs(NULLIF(VALUE, 0))
))
OVER(PARTITION BY PAR_COLUMN ORDER BY PERIOD)) AS Mul
FROM @TEST;
Result
+------------+--------+-------+-----------------+
| PAR_COLUMN | PERIOD | VALUE | Mul |
+------------+--------+-------+-----------------+
| 1 | 601 | 10 | 10 |
| 1 | 602 | 20 | 200 |
| 1 | 603 | 30 | 6000 |
| 1 | 604 | 40 | 240000 |
| 1 | 605 | 50 | 12000000 |
| 1 | 606 | 60 | 720000000 |
| 2 | 601 | 100 | 100 |
| 2 | 602 | 200 | 20000 |
| 2 | 603 | 300 | 6000000 |
| 2 | 604 | 400 | 2400000000 |
| 2 | 605 | 500 | 1200000000000 |
| 2 | 606 | 600 | 720000000000001 |
+------------+--------+-------+-----------------+
Each step here looses precision. Calculating LOG looses precision, SUM looses precision, EXP/POWER looses precision. With these built-in functions I don't think you can do much about it.
So, the answer is - use CLR with C# decimal type (not double
), which supports higher precision (28-29 significant digits). Your original SQL type numeric(22,6)
would fit into it. And you wouldn't need the trick with LOG/EXP
.
Oops. I tried to make a CLR aggregate that calculates Product. It works in my tests, but only as a simple aggregate, i.e.
This works:
SELECT T.PAR_COLUMN, [dbo].[Product](T.VALUE) AS P
FROM @TEST AS T
GROUP BY T.PAR_COLUMN;
And even OVER (PARTITION BY)
works:
SELECT *,
[dbo].[Product](T.VALUE)
OVER (PARTITION BY PAR_COLUMN) AS P
FROM @TEST AS T;
But, running product using OVER (PARTITION BY ... ORDER BY ...)
doesn't work (checked with SQL Server 2014 Express 12.0.2000.8):
SELECT *,
[dbo].[Product](T.VALUE)
OVER (PARTITION BY T.PAR_COLUMN ORDER BY T.PERIOD
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUM_MUL
FROM @TEST AS T;
Incorrect syntax near the keyword 'ORDER'.
A search found this connect item, which is closed as "Won't Fix" and this question.
The C# code:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Collections.Generic;
using System.Text;
namespace RunningProduct
{
[Serializable]
[SqlUserDefinedAggregate(
Format.UserDefined,
MaxByteSize = 17,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
IsNullIfEmpty = true)]
public struct Product : IBinarySerialize
{
private bool m_bIsNull; // 1 byte storage
private decimal m_Product; // 16 bytes storage
public void Init()
{
this.m_bIsNull = true;
this.m_Product = 1;
}
public void Accumulate(
[SqlFacet(Precision = 22, Scale = 6)] SqlDecimal ParamValue)
{
if (ParamValue.IsNull) return;
this.m_bIsNull = false;
this.m_Product *= ParamValue.Value;
}
public void Merge(Product other)
{
SqlDecimal otherValue = other.Terminate();
this.Accumulate(otherValue);
}
[return: SqlFacet(Precision = 22, Scale = 6)]
public SqlDecimal Terminate()
{
if (m_bIsNull)
{
return SqlDecimal.Null;
}
else
{
return m_Product;
}
}
public void Read(BinaryReader r)
{
this.m_bIsNull = r.ReadBoolean();
this.m_Product = r.ReadDecimal();
}
public void Write(BinaryWriter w)
{
w.Write(this.m_bIsNull);
w.Write(this.m_Product);
}
}
}
Install CLR assembly:
-- Turn advanced options on
EXEC sys.sp_configure @configname = 'show advanced options', @configvalue = 1 ;
GO
RECONFIGURE WITH OVERRIDE ;
GO
-- Enable CLR
EXEC sys.sp_configure @configname = 'clr enabled', @configvalue = 1 ;
GO
RECONFIGURE WITH OVERRIDE ;
GO
CREATE ASSEMBLY [RunningProduct]
AUTHORIZATION [dbo]
FROM 'C:\RunningProduct\RunningProduct.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE AGGREGATE [dbo].[Product](@ParamValue numeric(22,6))
RETURNS numeric(22,6)
EXTERNAL NAME [RunningProduct].[RunningProduct.Product];
GO
This question discusses calculation of a running SUM in great details and Paul White shows in his answer how to write a CLR function that calculates running SUM efficiently. It would be a good start for writing a function that calculates running Product.
Note, that he uses a different approach. Instead of making a custom aggregate function, Paul makes a function that returns a table. The function reads the original data into memory and performs all required calculations.
It may be easier to achieve the desired effect by implementing these calculations on your client side using the programming language of your choice. Just read the whole table and calculate running product on the client. Creating CLR function makes sense if the running product calculated on the server is an intermediary step in a more complex calculations that would aggregate data further.
One more idea that comes to mind.
Find a third-party .NET math library that offers Log
and Exp
functions with high precision. Make a CLR version of these scalar functions. And then use the EXP + LOG + SUM() Over (Order by)
approach, where SUM
is the built-in T-SQL function, which supports Over (Order by)
and Exp
and Log
are custom CLR functions that return not float
, but high-precision decimal
.
Note, that high precision calculations may also be slow. And using CLR scalar functions in the query may also make it slow.