Rounding issue in LOG and EXP functions

花落未央 2020-12-16 21:31

I am trying to perform cumulative multiplication, and I have tried two methods to do this.

sample data:

DECLARE @TEST TABLE
  (
     PAR_COLUMN INT,
     PERIOD     INT,
     VALUE      NUMERIC(22, 6)
  );
3 Answers
  • 2020-12-16 21:49

    In pure T-SQL, LOG and EXP operate on the float type (8 bytes), which has only 15-17 significant digits. Even that last, 15th digit can become inaccurate if you sum large enough values. Your data is numeric(22,6), so 15 significant digits are not enough.

    POWER can return a numeric type with potentially higher precision, but it is of little use to us here, because both LOG and LOG10 can return only float anyway.
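    The same precision gap can be sketched outside SQL Server with a short Python example (Python's float is the same 8-byte IEEE double as T-SQL's float, and Python's decimal plays the role that the C# decimal type plays later in this answer; the sample values are the second group from the question):

```python
import math
from decimal import Decimal, getcontext

values = [100, 200, 300, 400, 500, 600]

# EXP(SUM(LOG(v))) computed in 8-byte floats -- the same trick as the T-SQL query
via_logs = math.exp(sum(math.log(v) for v in values))

# the exact product, carried with 28 significant digits
getcontext().prec = 28
exact = Decimal(1)
for v in values:
    exact *= Decimal(v)

print(exact)            # 720000000000000
print(round(via_logs))  # close to the exact value, but not guaranteed equal
```

    The decimal path is exact because every intermediate product fits in 28 significant digits; the float path carries rounding error through each LOG, the SUM, and the final EXP.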

    To demonstrate the problem I'll change the type in your example to numeric(15,0) and use POWER instead of EXP:

    DECLARE @TEST TABLE
      (
         PAR_COLUMN INT,
         PERIOD     INT,
         VALUE      NUMERIC(15, 0)
      );
    
    INSERT INTO @TEST VALUES 
    (1,601,10 ),
    (1,602,20 ),
    (1,603,30 ),
    (1,604,40 ),
    (1,605,50 ),
    (1,606,60 ),
    (2,601,100),
    (2,602,200),
    (2,603,300),
    (2,604,400),
    (2,605,500),
    (2,606,600);
    
    SELECT *,
        POWER(CAST(10 AS numeric(15,0)),
            Sum(LOG10(
                Abs(NULLIF(VALUE, 0))
                ))
            OVER(PARTITION BY PAR_COLUMN ORDER BY PERIOD)) AS Mul
    FROM @TEST;
    

    Result

    +------------+--------+-------+-----------------+
    | PAR_COLUMN | PERIOD | VALUE |       Mul       |
    +------------+--------+-------+-----------------+
    |          1 |    601 |    10 |              10 |
    |          1 |    602 |    20 |             200 |
    |          1 |    603 |    30 |            6000 |
    |          1 |    604 |    40 |          240000 |
    |          1 |    605 |    50 |        12000000 |
    |          1 |    606 |    60 |       720000000 |
    |          2 |    601 |   100 |             100 |
    |          2 |    602 |   200 |           20000 |
    |          2 |    603 |   300 |         6000000 |
    |          2 |    604 |   400 |      2400000000 |
    |          2 |    605 |   500 |   1200000000000 |
    |          2 |    606 |   600 | 720000000000001 |
    +------------+--------+-------+-----------------+
    

    Each step here loses precision. Calculating LOG loses precision, SUM loses precision, and EXP/POWER loses precision. With these built-in functions I don't think you can do much about it.


    So, the answer is: use CLR with the C# decimal type (not double), which supports a higher precision (28-29 significant digits). Your original SQL type numeric(22,6) would fit into it, and you wouldn't need the trick with LOG/EXP.


    Oops. I tried to make a CLR aggregate that calculates Product. It works in my tests, but only as a simple aggregate, i.e.

    This works:

    SELECT T.PAR_COLUMN, [dbo].[Product](T.VALUE) AS P
    FROM @TEST AS T
    GROUP BY T.PAR_COLUMN;
    

    And even OVER (PARTITION BY) works:

    SELECT *,
        [dbo].[Product](T.VALUE) 
        OVER (PARTITION BY PAR_COLUMN) AS P
    FROM @TEST AS T;
    

    But a running product using OVER (PARTITION BY ... ORDER BY ...) doesn't work (checked with SQL Server 2014 Express 12.0.2000.8):

    SELECT *,
        [dbo].[Product](T.VALUE) 
        OVER (PARTITION BY T.PAR_COLUMN ORDER BY T.PERIOD 
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUM_MUL
    FROM @TEST AS T;
    

    Incorrect syntax near the keyword 'ORDER'.

    A search found this connect item, which is closed as "Won't Fix" and this question.


    The C# code:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    using System.IO;
    using System.Collections.Generic;
    using System.Text;
    
    namespace RunningProduct
    {
        [Serializable]
        [SqlUserDefinedAggregate(
            Format.UserDefined,
            MaxByteSize = 17,
            IsInvariantToNulls = true,
            IsInvariantToDuplicates = false,
            IsInvariantToOrder = true,
            IsNullIfEmpty = true)]
        public struct Product : IBinarySerialize
        {
            private bool m_bIsNull; // 1 byte storage
            private decimal m_Product; // 16 bytes storage
    
            public void Init()
            {
                this.m_bIsNull = true;
                this.m_Product = 1;
            }
    
            public void Accumulate(
                [SqlFacet(Precision = 22, Scale = 6)] SqlDecimal ParamValue)
            {
                if (ParamValue.IsNull) return;
    
                this.m_bIsNull = false;
                this.m_Product *= ParamValue.Value;
            }
    
            public void Merge(Product other)
            {
                SqlDecimal otherValue = other.Terminate();
                this.Accumulate(otherValue);
            }
    
            [return: SqlFacet(Precision = 22, Scale = 6)]
            public SqlDecimal Terminate()
            {
                if (m_bIsNull)
                {
                    return SqlDecimal.Null;
                }
                else
                {
                    return m_Product;
                }
            }
    
            public void Read(BinaryReader r)
            {
                this.m_bIsNull = r.ReadBoolean();
                this.m_Product = r.ReadDecimal();
            }
    
            public void Write(BinaryWriter w)
            {
                w.Write(this.m_bIsNull);
                w.Write(this.m_Product);
            }
        }
    }
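    As a rough illustration of the aggregate contract above (Init / Accumulate / Merge / Terminate), here is the same logic sketched in Python; this is only a model of the semantics, not deployable SQLCLR code:

```python
from decimal import Decimal

class ProductAggregate:
    """Sketch of the SQLCLR aggregate contract: Init / Accumulate / Merge / Terminate."""

    def __init__(self):
        # Init(): start as NULL with a neutral product of 1
        self.is_null = True
        self.product = Decimal(1)

    def accumulate(self, value):
        # Accumulate(): NULL inputs are skipped (IsInvariantToNulls)
        if value is None:
            return
        self.is_null = False
        self.product *= Decimal(value)

    def merge(self, other):
        # Merge(): fold another partial aggregate into this one
        self.accumulate(other.terminate())

    def terminate(self):
        # Terminate(): NULL if no non-NULL input was ever seen
        return None if self.is_null else self.product
```

    As in the C# version, an empty or all-NULL group terminates as NULL, and partial aggregates computed in parallel can be combined via merge.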
    

    Install CLR assembly:

    -- Turn advanced options on
    EXEC sys.sp_configure @configname = 'show advanced options', @configvalue = 1 ;
    GO
    RECONFIGURE WITH OVERRIDE ;
    GO
    -- Enable CLR
    EXEC sys.sp_configure @configname = 'clr enabled', @configvalue = 1 ;
    GO
    RECONFIGURE WITH OVERRIDE ;
    GO
    
    CREATE ASSEMBLY [RunningProduct]
    AUTHORIZATION [dbo]
    FROM 'C:\RunningProduct\RunningProduct.dll'
    WITH PERMISSION_SET = SAFE;
    GO
    
    CREATE AGGREGATE [dbo].[Product](@ParamValue numeric(22,6))
    RETURNS numeric(22,6)
    EXTERNAL NAME [RunningProduct].[RunningProduct.Product];
    GO
    

    This question discusses calculating a running SUM in great detail, and Paul White shows in his answer how to write a CLR function that calculates a running SUM efficiently. It would be a good start for writing a function that calculates a running product.

    Note that he uses a different approach. Instead of making a custom aggregate function, Paul makes a function that returns a table: the function reads the original data into memory and performs all the required calculations.

    It may be easier to achieve the desired effect by implementing these calculations on the client side in the programming language of your choice: just read the whole table and calculate the running product on the client. Creating a CLR function makes sense if the running product calculated on the server is an intermediate step in a more complex calculation that aggregates the data further.
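    For example, a client-side running product could look like this in Python (hypothetical rows standing in for the @TEST table, using exact decimal arithmetic):

```python
from decimal import Decimal
from itertools import accumulate, groupby
from operator import itemgetter, mul

# (PAR_COLUMN, PERIOD, VALUE) rows, as read from the table
rows = [
    (1, 601, Decimal("10")), (1, 602, Decimal("20")), (1, 603, Decimal("30")),
    (2, 601, Decimal("100")), (2, 602, Decimal("200")), (2, 603, Decimal("300")),
]

running = []
for par, group in groupby(sorted(rows), key=itemgetter(0)):  # PARTITION BY PAR_COLUMN
    group = list(group)                                      # sorted() gave ORDER BY PERIOD
    products = accumulate((v for _, _, v in group), mul)     # cumulative product
    running.extend((p, period, v, cum)
                   for (p, period, v), cum in zip(group, products))

for row in running:
    print(row)
```

    Because every value stays a Decimal, there is no float rounding at any step, which is exactly what the LOG/EXP approach cannot guarantee.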


    One more idea that comes to mind.

    Find a third-party .NET math library that offers Log and Exp functions with high precision. Make CLR versions of these scalar functions, and then use the EXP + LOG + SUM() OVER (ORDER BY) approach, where SUM is the built-in T-SQL function that supports OVER (ORDER BY), and Exp and Log are custom CLR functions that return not float, but a high-precision decimal.
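    Python's decimal module can stand in for such a high-precision library to check that the idea works once the intermediate precision is high enough (the 50-digit context below is an arbitrary choice):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # far beyond float's ~15-17 significant digits

values = [Decimal(v) for v in (100, 200, 300, 400, 500, 600)]

log_sum = sum(v.ln() for v in values)                  # SUM(LOG(v)), high precision
product = log_sum.exp().quantize(Decimal("0.000001"))  # EXP(...), numeric(22,6)-style scale

print(product)  # 720000000000000.000000
```

    With 50 digits carried through ln/exp, the residual error is many orders of magnitude below the 6-decimal scale, so the final quantize recovers the exact product.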

    Note that high-precision calculations may also be slow, and using CLR scalar functions in the query may slow it down as well.

  • 2020-12-16 22:00

    You can round to a large multiple. For your data:

    -- 720000000000000 must be a multiple of 600
    
    select
       round( 719999999999998 / 600. , 0 ) * 600
    
    -- result: 720000000000000
    

    Test it at SQLFiddle

    create TABLE T 
      (
         PAR_COLUMN INT,
         PERIOD     INT,
         VALUE      NUMERIC(22, 6)
      ) 
    INSERT INTO T VALUES 
    (1,601,10.1 ),    --<--- I put decimals just to test!
    (1,602,20 ),
    (1,603,30 ),
    (1,604,40 ),
    (1,605,50 ),
    (1,606,60 ),
    (2,601,100),
    (2,602,200),
    (2,603,300),
    (2,604,400),
    (2,605,500),
    (2,606,600)
    

    Query 1:

    with T1 as (
    SELECT *,
           Exp(Sum(Log(Abs(NULLIF(VALUE, 0))))
                 OVER(
                   PARTITION BY PAR_COLUMN
                   ORDER BY PERIOD)) AS CUM_MUL,
           VALUE AS CUM_MAX1,
           LAG( VALUE , 1, 1.) 
                 OVER(
                   PARTITION BY PAR_COLUMN
                   ORDER BY PERIOD ) AS CUM_MAX2,
           LAG( VALUE , 2, 1.) 
                 OVER(
                   PARTITION BY PAR_COLUMN
                   ORDER BY PERIOD ) AS CUM_MAX3
    FROM   T )
    select PAR_COLUMN,  PERIOD,  VALUE, 
           ( round( ( CUM_MUL  / ( CUM_MAX1 * CUM_MAX2 * CUM_MAX3) ) ,6) 
             * 
             cast( ( 1000000 * CUM_MAX1 * CUM_MAX2 * CUM_MAX3) as bigint )
           ) / 1000000.
           as CUM_MUL
    FROM T1
    

    Results:

    | PAR_COLUMN | PERIOD | VALUE |         CUM_MUL |
    |------------|--------|-------|-----------------|
    |          1 |    601 |  10.1 |            10.1 | -- ok! because of my decimal test value
    |          1 |    602 |    20 |             202 |
    |          1 |    603 |    30 |            6060 |
    |          1 |    604 |    40 |          242400 |
    |          1 |    605 |    50 |        12120000 |
    |          1 |    606 |    60 |       727200000 |
    |          2 |    601 |   100 |             100 |
    |          2 |    602 |   200 |           20000 |
    |          2 |    603 |   300 |         6000000 |
    |          2 |    604 |   400 |      2400000000 |
    |          2 |    605 |   500 |   1200000000000 |
    |          2 |    606 |   600 | 720000000000000 |
    

    Notice that I multiply by 1,000,000 to work without decimals (matching the six decimal places of numeric(22,6)).
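    A quick Python check of the same snap-to-a-known-multiple idea, using the numbers from this answer:

```python
approx = 719999999999998   # inexact EXP(SUM(LOG(...))) result
multiple = 600             # the exact product must be a multiple of the last VALUE

# snap the inexact result to the nearest multiple of the known factor
corrected = round(approx / multiple) * multiple

print(corrected)  # 720000000000000
```

    This only works when the accumulated float error is smaller than half the chosen multiple, which holds here because the relative error of the LOG/EXP round trip is tiny.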

  • 2020-12-16 22:04

    LOG() and EXP() implicitly convert their arguments to the float data type, which stores only approximate values.
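    Since Python's float is also an 8-byte IEEE double, the effect of that implicit conversion is easy to demonstrate:

```python
from decimal import Decimal

exact = Decimal("100000000000000001")   # a value with 18 significant digits
as_float = float(exact)                 # the implicit conversion LOG()/EXP() perform

print(int(as_float) == int(exact))  # False: the 18th digit does not survive
```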
