SQL - What is the performance impact of having multiple CASE statements in SELECT - Teradata

心在旅途 2021-02-07 02:23

So I have a query that requires a bunch of CASE statements in the SELECT. This was not the original design but part of a compromise.

So the query looks something like this

2 answers
  • 2021-02-07 02:46

    Edit: Actually, you can re-factor both of those sub-queries into a JOIN, which would probably be faster, anyway. It gets rid of a lot of repetition, too!

    This really isn't about the performance of the query (@Gordon has that covered pretty well), but that huge CASE statement just seems like a maintenance nightmare. Maybe a better way to handle that would be to convert it to a table:

    CREATE TABLE ACCT_DISPLAY_NAME (
        FINC_ACCT_ID CHAR(10),
        BAL_TYPE_CD  CHAR(3),
        DISPLAY_NAME VARCHAR(100)
    );
    
    -- Teradata secondary index syntax: the column list comes before ON
    CREATE INDEX ACCT_DISPLAY_INDEX (
        FINC_ACCT_ID,
        BAL_TYPE_CD
    ) ON ACCT_DISPLAY_NAME;
    
    INSERT INTO ACCT_DISPLAY_NAME VALUES
    ('AC99800'  , 'EOP', '  Net Interest Income'               ),
    ('AC12993'  , 'EOP', '  Non Interest Income'               ),
    ('AC667999' , 'EOP', 'Non-Interest Expense'                ),
    ('AC996587' , 'EOP', '  Total Marketing Expense'           ),
    ('AC659986' , 'EOP', '  Total Operating Expense'           ),
    ('AC69678'  , 'EOP', 'Pre-Provision Earnings (before tax)' ),
    ('AC09994'  , 'EOP', '  Net Charge-offs'                   ),
    ('AC20977'  , 'EOP', '  Other'                             ),
    ('AC19979'  , 'EOP', '  Allowance Build (Release)'         ),
    ('AC7094'   , 'EOP', 'Provision Expense'                   ),
    ('AC6997'   , 'EOP', 'Pretax Income'                       ),
    ('AC0994'   , 'EOP', 'Tax Expense'                         ),
    ('AC9999'   , 'EOP', 'NIAT'                                ),
    ('AC7990'   , 'EOP', 'EPS'                                 ),
    ('AC9995'   , 'EOP', 'Ending Loans - HFI'                  ),
    ('AC9995'   , 'avg', 'Average Loans - HFI'                 ),
    ('AC2991'   , 'avg', 'Average Earning Assets'              ),
    ('AC2999'   , 'EOP', 'Ending Deposits'                     ),
    ('AC9999'   , 'avg', 'Average Deposits'                    ),
    ('AC0379'   , 'EOP', 'NIM on Loans'                        ),
    ('AC6999'   , 'EOP', 'Revenue Margin'                      ),
    ('AC579'    , 'EOP', 'Charge off rate'                     ),
    ('AC5899'   , 'EOP', 'Efficiency ratio'                    ),
    ('AC629'    , 'EOP', 'ROA'                                 ),
    ('AC359'    , 'EOP', 'ROE'                                 ),
    ('AC619'    , 'EOP', 'Return on Allocated Capital (ROAC)'  );
    

    And do a LEFT JOIN on it (since you have that ELSE in the CASE), something like:

    SELECT T.FINC_ACCT_NM,
           T.FINC_ACCT_ID,
           T.CURR_END_OF_PERD_ACTL_VAL,
           T.PREV_END_OF_PERD_ACTL_VAL,
           T.VARNC_PLAN_VAL,
           T.OUTLOOK_BDGT_PLAN_VAL,
           T.PERD_END_RPT_DT,
           T.PLAN_VERS_NM,
           T.FRMT_ACTL_CD,
           T.FRMT_PLAN_CD,
           T.RPT_PERD_TYPE_CD,
           COALESCE(N.DISPLAY_NAME, T.FINC_ACCT_NM)
    
    FROM CONT.TABLE T
    JOIN (
        SELECT RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Max(Perd_END_RPT_DT) AS PERD_END_RPT_DT
        FROM CONT.TABLE
        WHERE VERS_NM='Actual'
          AND DATA_VLDTN_IND='Y'
        GROUP BY RPT_PERD_TYPE_CD, DATA_VLDTN_IND
    ) AS MAX_DATES
      ON T.RPT_PERD_TYPE_CD = MAX_DATES.RPT_PERD_TYPE_CD
     AND T.DATA_VLDTN_IND   = MAX_DATES.DATA_VLDTN_IND 
     AND T.PERD_END_RPT_DT  = MAX_DATES.PERD_END_RPT_DT 
    
    LEFT JOIN ACCT_DISPLAY_NAME N
      ON T.FINC_ACCT_ID = N.FINC_ACCT_ID
     AND T.BAL_TYPE_CD  = N.BAL_TYPE_CD
    
    WHERE T.DEPT_ID = 'OR80637'
    
      AND T.RPT_PERD_TYPE_CD IN ('Q', 'M')
    
      AND T.FINC_ACCT_ID IN (
        'AC0006470',
        'AC8000199',
        'AC8002145',
        'AC0006586',
        'AC8000094'
      )
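
    One benefit of the table is that relabelling becomes a data change rather than a query change. A minimal sketch, assuming the ACCT_DISPLAY_NAME table defined above; the account ID 'AC0000001' and its label are made-up placeholders, not real accounts:

    ```sql
    -- Add a label for a new account (hypothetical ID and label, for illustration only).
    INSERT INTO ACCT_DISPLAY_NAME (FINC_ACCT_ID, BAL_TYPE_CD, DISPLAY_NAME)
    VALUES ('AC0000001', 'EOP', 'Example Label');

    -- Change an existing label without touching the reporting query.
    UPDATE ACCT_DISPLAY_NAME
    SET DISPLAY_NAME = 'Net Income After Tax'
    WHERE FINC_ACCT_ID = 'AC9999'
      AND BAL_TYPE_CD  = 'EOP';
    ```

    Any account without a row in the table still falls back to T.FINC_ACCT_NM through the COALESCE, which mirrors the ELSE branch of the CASE.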
    
  • The case statements are going to be much less of a factor than the joins in the WHERE clause.

    The main driver of performance in SQL is I/O -- reading the data from disk. I think of it as two orders of magnitude more important than the processing going on in rows. This is just a heuristic, not based on specific tests on a database.

    You are doing self-joins, which will require either lots of work reading the table or a fair amount of work dealing with indexes.

    The CASE statement, on the other hand, gets turned into very primitive hardware commands -- equals, gotos, and the like. The data resides in memory closest to the processors, so it is going to zip along. You are doing nothing fancy in the CASE statement (such as a LIKE or a subquery). I would imagine that the query would be just as fast if you removed most of the lines in the statement.
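
    To make "nothing fancy" concrete, the sketch below shows the shape of CASE being discussed: a chain of plain equality tests evaluated per row, with no extra table access. It borrows two account IDs from the lookup table in the other answer and assumes the ELSE falls back to FINC_ACCT_NM; the actual expression in the question is longer and is not reproduced here.

    ```sql
    -- Illustrative only: two branches standing in for the full list.
    SELECT T.FINC_ACCT_ID,
           CASE
               WHEN T.FINC_ACCT_ID = 'AC99800' AND T.BAL_TYPE_CD = 'EOP' THEN '  Net Interest Income'
               WHEN T.FINC_ACCT_ID = 'AC12993' AND T.BAL_TYPE_CD = 'EOP' THEN '  Non Interest Income'
               ELSE T.FINC_ACCT_NM  -- assumed fallback
           END AS DISPLAY_NAME
    FROM CONT.TABLE T;
    ```

    Each additional WHEN adds a comparison on values already in memory for the row, which is why the length of the list matters far less than the joins.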

    If you are having issues with performance, put an index on (VERS_NM, RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Perd_END_RPT_DT). This four-part index should allow you to get the max date without invoking I/O requests on the original table.
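
    A sketch of that index in Teradata syntax, followed by an EXPLAIN to see whether the optimizer picks it up. The index name IDX_MAX_PERD_DT is hypothetical, CONT.TABLE is the placeholder table name from the question, and whether the index is actually used depends on the optimizer and the statistics available:

    ```sql
    -- Teradata secondary index: the column list comes before ON.
    CREATE INDEX IDX_MAX_PERD_DT
        (VERS_NM, RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Perd_END_RPT_DT)
    ON CONT.TABLE;

    -- Inspect the plan for the max-date lookup to see if the index covers it.
    EXPLAIN
    SELECT RPT_PERD_TYPE_CD, DATA_VLDTN_IND, MAX(Perd_END_RPT_DT) AS PERD_END_RPT_DT
    FROM CONT.TABLE
    WHERE VERS_NM = 'Actual'
      AND DATA_VLDTN_IND = 'Y'
    GROUP BY RPT_PERD_TYPE_CD, DATA_VLDTN_IND;
    ```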
