Rounding issue in LOG and EXP functions

花落未央 2020-12-16 21:31

I am trying to perform cumulative multiplication, and I am trying two methods to do it.

Sample data:

DECLARE @TEST TABLE
  (
     PAR_COLUMN         


        
  •  囚心锁ツ
    2020-12-16 21:49

    In pure T-SQL, LOG and EXP operate on the float type (8 bytes), which has only 15-17 significant digits. Even that 15th digit can become inaccurate if you sum large enough values. Your data is numeric(22,6), so 15 significant digits are not enough.
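    To see the float limit outside SQL Server, here is a small Python sketch. Python's float is an IEEE 754 double, which is the same 8-byte format as T-SQL float. The sample value is made up for illustration. A value that uses all 22 digits of numeric(22,6) does not survive a round trip through float:

```python
from decimal import Decimal

# Hypothetical value using all 22 digits of numeric(22,6): 16 before the point, 6 after.
exact = Decimal("1234567890123456.123456")

# float() gives an IEEE 754 double, the same 8-byte format as T-SQL float.
as_float = float(exact)

# Decimal(float) converts exactly, exposing the value the double really holds.
round_tripped = Decimal(as_float)

print(exact)          # 1234567890123456.123456
print(round_tripped)  # 1234567890123456 (the 6 fractional digits are lost)
```

    Near 1.2e15, adjacent doubles are 0.25 apart, so the fractional digits cannot be represented at all.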

    POWER can return a numeric type with potentially higher precision, but that is of little use here, because both LOG and LOG10 return only float anyway.
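    A Python sketch of why a higher-precision POWER does not help. It uses Python's decimal module as a stand-in for T-SQL numeric. The exponent comes from a float LOG10, so its rounding error is already baked in before the precise power is applied:

```python
import math
from decimal import Decimal, localcontext

value = 3
# LOG10 in T-SQL returns float; math.log10 likewise returns a 53-bit double.
float_log = math.log10(value)

with localcontext() as ctx:
    ctx.prec = 50  # a power computed with far more digits than float has
    result = Decimal(10) ** Decimal(float_log)

# Prints a number very close to 3 but not equal to it: the float log's
# rounding error survives the high-precision power.
print(result)
```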

    To demonstrate the problem, I'll change the type in your example to numeric(15,0) and use POWER instead of EXP:

    DECLARE @TEST TABLE
      (
         PAR_COLUMN INT,
         PERIOD     INT,
         VALUE      NUMERIC(15, 0)
      );
    
    INSERT INTO @TEST VALUES 
    (1,601,10 ),
    (1,602,20 ),
    (1,603,30 ),
    (1,604,40 ),
    (1,605,50 ),
    (1,606,60 ),
    (2,601,100),
    (2,602,200),
    (2,603,300),
    (2,604,400),
    (2,605,500),
    (2,606,600);
    
    SELECT *,
        POWER(CAST(10 AS numeric(15,0)),
            Sum(LOG10(
                Abs(NULLIF(VALUE, 0))
                ))
            OVER(PARTITION BY PAR_COLUMN ORDER BY PERIOD)) AS Mul
    FROM @TEST;
    

    Result

    +------------+--------+-------+-----------------+
    | PAR_COLUMN | PERIOD | VALUE |       Mul       |
    +------------+--------+-------+-----------------+
    |          1 |    601 |    10 |              10 |
    |          1 |    602 |    20 |             200 |
    |          1 |    603 |    30 |            6000 |
    |          1 |    604 |    40 |          240000 |
    |          1 |    605 |    50 |        12000000 |
    |          1 |    606 |    60 |       720000000 |
    |          2 |    601 |   100 |             100 |
    |          2 |    602 |   200 |           20000 |
    |          2 |    603 |   300 |         6000000 |
    |          2 |    604 |   400 |      2400000000 |
    |          2 |    605 |   500 |   1200000000000 |
    |          2 |    606 |   600 | 720000000000001 |
    +------------+--------+-------+-----------------+
    

    Each step here loses precision: calculating LOG loses precision, SUM loses precision, and EXP/POWER loses precision. With these built-in functions I don't think you can do much about it.
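    The same LOG10 / running SUM / power pipeline can be replayed with Python floats as a rough model of what the T-SQL query does. SQL Server's float functions may round slightly differently, so individual digits need not match the table above. Exact integer products are computed alongside for comparison:

```python
import math
from itertools import accumulate

values = [100, 200, 300, 400, 500, 600]  # partition 2, ordered by PERIOD

# Exact running product using Python's arbitrary-precision integers.
exact_running = list(accumulate(values, lambda acc, v: acc * v))

# Float pipeline: LOG10 per row, running SUM, then 10 ** sum (the POWER step).
log_running = list(accumulate(math.log10(v) for v in values))
float_running = [10.0 ** s for s in log_running]

for exact, approx in zip(exact_running, float_running):
    # Every step rounds to a 53-bit mantissa, so the errors compound.
    print(f"{exact:>16} {approx:>24.3f} diff={approx - exact:+.3f}")
```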


    So the answer is: use CLR with the C# decimal type (not double), which supports higher precision (28-29 significant digits). Your original SQL type numeric(22,6) would fit into it, and you wouldn't need the LOG/EXP trick.
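    Here is a rough sketch of the idea in Python. Python's decimal module with a 28-digit context stands in for C# decimal. The two are not identical: C# decimal throws OverflowException when a result is too large, while Python's decimal rounds to the context precision. The values are hypothetical numeric(22,6) inputs. The point is that plain multiplication needs no LOG/EXP trick:

```python
from decimal import Decimal, localcontext
from itertools import accumulate
from operator import mul

# Hypothetical inputs, as they might come from a numeric(22,6) column.
values = [Decimal("1.5"), Decimal("2.25"), Decimal("0.000004"), Decimal("1000000")]

with localcontext() as ctx:
    ctx.prec = 28  # about the 28-29 significant digits of C# decimal
    # Plain multiplication, row by row: no LOG/EXP round trip through float.
    running = list(accumulate(values, mul))

for r in running:
    print(r)
```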


    Oops. I tried to make a CLR aggregate that calculates a product. It works in my tests, but only as a simple aggregate.

    This works:

    SELECT T.PAR_COLUMN, [dbo].[Product](T.VALUE) AS P
    FROM @TEST AS T
    GROUP BY T.PAR_COLUMN;
    

    And even OVER (PARTITION BY) works:

    SELECT *,
        [dbo].[Product](T.VALUE) 
        OVER (PARTITION BY PAR_COLUMN) AS P
    FROM @TEST AS T;
    

    But a running product using OVER (PARTITION BY ... ORDER BY ...) doesn't work (checked with SQL Server 2014 Express 12.0.2000.8):

    SELECT *,
        [dbo].[Product](T.VALUE) 
        OVER (PARTITION BY T.PAR_COLUMN ORDER BY T.PERIOD 
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUM_MUL
    FROM @TEST AS T;
    

    Incorrect syntax near the keyword 'ORDER'.

    A search found this Connect item, which was closed as "Won't Fix", and this question.


    The C# code:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    using System.IO;
    using System.Collections.Generic;
    using System.Text;
    
    namespace RunningProduct
    {
        [Serializable]
        [SqlUserDefinedAggregate(
            Format.UserDefined,
            MaxByteSize = 17,
            IsInvariantToNulls = true,
            IsInvariantToDuplicates = false,
            IsInvariantToOrder = true,
            IsNullIfEmpty = true)]
        public struct Product : IBinarySerialize
        {
            private bool m_bIsNull; // 1 byte storage
            private decimal m_Product; // 16 bytes storage
    
            public void Init()
            {
                this.m_bIsNull = true;
                this.m_Product = 1;
            }
    
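            public void Accumulate(
                [SqlFacet(Precision = 22, Scale = 6)] SqlDecimal ParamValue)
            {
                // NULL inputs are skipped; the result stays NULL only if every input was NULL.
                if (ParamValue.IsNull) return;
    
                this.m_bIsNull = false;
                // Plain decimal multiplication: no LOG/EXP round trip through float.
                this.m_Product *= ParamValue.Value;
            }
    
            public void Merge(Product other)
            {
                // Fold another partial aggregate into this one. An empty partial
                // returns SqlDecimal.Null, which Accumulate ignores.
                SqlDecimal otherValue = other.Terminate();
                this.Accumulate(otherValue);
            }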
    
            [return: SqlFacet(Precision = 22, Scale = 6)]
            public SqlDecimal Terminate()
            {
                if (m_bIsNull)
                {
                    return SqlDecimal.Null;
                }
                else
                {
                    return m_Product;
                }
            }
    
            public void Read(BinaryReader r)
            {
                this.m_bIsNull = r.ReadBoolean();
                this.m_Product = r.ReadDecimal();
            }
    
            public void Write(BinaryWriter w)
            {
                w.Write(this.m_bIsNull);
                w.Write(this.m_Product);
            }
        }
    }
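    To make the aggregate contract easier to follow, here is a hypothetical Python class that mirrors the struct's Init / Accumulate / Merge / Terminate methods. It is only a model. SQL Server decides when to call each method and serializes the state through Read/Write, which this sketch skips. The names `ProductModel` and `run` are invented for illustration:

```python
from decimal import Decimal
from typing import Iterable, Optional


class ProductModel:
    """Hypothetical Python mirror of the C# Product aggregate (not SQL Server's API)."""

    def init(self) -> None:
        # Init(): no non-NULL input yet, and the multiplicative identity.
        self.is_null = True
        self.product = Decimal(1)

    def accumulate(self, value: Optional[Decimal]) -> None:
        # Accumulate(): NULL inputs are skipped, as in the C# code.
        if value is None:
            return
        self.is_null = False
        self.product *= value

    def merge(self, other: "ProductModel") -> None:
        # Merge(): fold in another partial aggregate via its Terminate() result.
        self.accumulate(other.terminate())

    def terminate(self) -> Optional[Decimal]:
        # Terminate(): NULL if no non-NULL value was ever accumulated.
        return None if self.is_null else self.product


def run(values: Iterable[Optional[Decimal]]) -> ProductModel:
    """Hypothetical driver: Init, then Accumulate every value of one group."""
    agg = ProductModel()
    agg.init()
    for v in values:
        agg.accumulate(v)
    return agg


# Usage: merging two partial aggregates gives the same product as one pass.
left = run([Decimal("10"), None, Decimal("20")])
right = run([Decimal("30")])
left.merge(right)
print(left.terminate())  # 6000
```

    Because the whole state is just (is_null, product), merging partial results gives the same answer as a single pass over the same values, as long as no rounding occurs.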
    

    Install the CLR assembly:

    -- Turn advanced options on
    EXEC sys.sp_configure @configname = 'show advanced options', @configvalue = 1 ;
    GO
    RECONFIGURE WITH OVERRIDE ;
    GO
    -- Enable CLR
    EXEC sys.sp_configure @configname = 'clr enabled', @configvalue = 1 ;
    GO
    RECONFIGURE WITH OVERRIDE ;
    GO
    
    CREATE ASSEMBLY [RunningProduct]
    AUTHORIZATION [dbo]
    FROM 'C:\RunningProduct\RunningProduct.dll'
    WITH PERMISSION_SET = SAFE;
    GO
    
    CREATE AGGREGATE [dbo].[Product](@ParamValue numeric(22,6))
    RETURNS numeric(22,6)
    EXTERNAL NAME [RunningProduct].[RunningProduct.Product];
    GO
    

    This question discusses calculating a running SUM in great detail, and Paul White shows in his answer how to write a CLR function that calculates a running SUM efficiently. It would be a good starting point for writing a function that calculates a running product.

    Note that he uses a different approach. Instead of making a custom aggregate function, Paul makes a function that returns a table. The function reads the original data into memory and performs all the required calculations.

    It may be easier to achieve the desired effect by doing these calculations on the client side, in the programming language of your choice: just read the whole table and calculate the running product on the client. Creating a CLR function makes sense if the running product calculated on the server is an intermediate step in a more complex calculation that aggregates the data further.
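    Here is a minimal client-side sketch in Python; any language with a decimal type would do. It reads all rows, sorts them by PAR_COLUMN and PERIOD, and keeps one running product per partition. This is also the general shape of the in-memory, table-returning approach described above. The rows reproduce the question's sample data, and `running_product` is a hypothetical helper name:

```python
from decimal import Decimal
from itertools import accumulate, groupby
from operator import itemgetter, mul

# The question's sample data: (PAR_COLUMN, PERIOD, VALUE).
rows = (
    [(1, 600 + i, Decimal(10 * i)) for i in range(1, 7)]
    + [(2, 600 + i, Decimal(100 * i)) for i in range(1, 7)]
)


def running_product(rows):
    """Yield (par, period, value, cum_mul), like PARTITION BY par ORDER BY period."""
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _par, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        # Decimal multiplication per partition, no LOG/EXP involved.
        products = accumulate((value for _, _, value in group), mul)
        for (par, period, value), cum_mul in zip(group, products):
            yield par, period, value, cum_mul


for row in running_product(rows):
    print(row)
```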


    One more idea comes to mind.

    Find a third-party .NET math library that offers high-precision Log and Exp functions, and make CLR versions of these scalar functions. Then use the EXP + LOG + SUM() OVER (ORDER BY) approach. Here SUM is the built-in T-SQL function, which supports OVER (ORDER BY), and Exp and Log are custom CLR functions that return a high-precision decimal rather than float.
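    Here is a sketch of this idea using Python's decimal module. Its ln() and exp() methods compute at the context precision, so they stand in for the hypothetical high-precision .NET Log and Exp functions. The 40-digit precision is an assumption. As the Abs(NULLIF(VALUE, 0)) in the query above hints, zeros and negative values would need separate handling, so only positive values are used:

```python
from decimal import Decimal, localcontext
from itertools import accumulate

values = [Decimal(v) for v in (100, 200, 300, 400, 500, 600)]  # positive only

with localcontext() as ctx:
    ctx.prec = 40  # assumed precision, well above the 22 digits of numeric(22,6)
    logs = [v.ln() for v in values]            # high-precision LOG
    running_logs = list(accumulate(logs))      # the SUM() OVER (ORDER BY) step
    running = [s.exp() for s in running_logs]  # high-precision EXP

# Round back to the scale of numeric(22,6).
scale = Decimal("0.000001")
rounded = [r.quantize(scale) for r in running]
print(rounded)
```

    With enough working digits, the tiny residual error from ln/exp disappears when the result is rounded to 6 decimal places, unlike the float version above.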

    Note that high-precision calculations may also be slow, and using CLR scalar functions in the query may slow it down as well.
