So I have a query that requires a bunch of CASE statements in the SELECT. This was not the orginal design but part of a compromise.
So the query looks something like thi
Edit: Actually, you can re-factor both of those sub-queries into a JOIN
, which would probably be faster, anyway. It gets rid of a lot of repetition, too!
This really isn't about the performance of the query ( @Gordon has that covered pretty well), but that huge case statement just seems like a maintenance nightmare. Maybe a better way to handle that would be to convert it to a table
CREATE TABLE ACCT_DISPLAY_NAME (
FINC_ACCT_ID CHAR(10),
BAL_TYPE_CD CHAR(3),
DISPLAY_NAME VARCHAR(100)
);
CREATE INDEX ACCT_DISPLAY_INDEX ON ACCT_DISPLAY_NAME (
FINC_ACCT_ID,
BAL_TYPE_CD
);
INSERT INTO ACCT_DISPLAY_NAME VALUES
('AC99800' , 'EOP', ' Net Interest Income' ),
('AC12993' , 'EOP', ' Non Interest Income' ),
('AC667999' , 'EOP', 'Non-Interest Expense' ),
('AC996587' , 'EOP', ' Total Marketing Expense' ),
('AC659986' , 'EOP', ' Total Operating Expense' ),
('AC69678' , 'EOP', 'Pre-Provision Earnings (before tax)' ),
('AC09994' , 'EOP', ' Net Charge-offs' ),
('AC20977' , 'EOP', ' Other' ),
('AC19979' , 'EOP', ' Allowance Build (Release)' ),
('AC7094' , 'EOP', 'Provision Expense' ),
('AC6997' , 'EOP', 'Pretax Income' ),
('AC0994' , 'EOP', 'Tax Expense' ),
('AC9999' , 'EOP', 'NIAT' ),
('AC7990' , 'EOP', 'EPS' ),
('AC9995' , 'EOP', 'Ending Loans - HFI' ),
('AC9995' , 'avg', 'Average Loans - HFI' ),
('AC2991' , 'avg', 'Average Earning Assets' ),
('AC2999' , 'EOP', 'Ending Deposits' ),
('AC9999' , 'avg', 'Average Deposits' ),
('AC0379' , 'EOP', 'NIM on Loans' ),
('AC6999' , 'EOP', 'Revenue Margin' ),
('AC579' , 'EOP', 'Charge off rate' ),
('AC5899' , 'EOP', 'Efficiency ratio' ),
('AC629' , 'EOP', 'ROA' ),
('AC359' , 'EOP', 'ROE' ),
('AC619' , 'EOP', 'Return on Allocated Capital (ROAC)' );
And do a LEFT JOIN
on it (since you have that ELSE
in the CASE
), something like:
SELECT T.FINC_ACCT_NM,
T.FINC_ACCT_ID,
T.CURR_END_OF_PERD_ACTL_VAL,
T.PREV_END_OF_PERD_ACTL_VAL,
T.VARNC_PLAN_VAL,
T.OUTLOOK_BDGT_PLAN_VAL,
T.PERD_END_RPT_DT,
T.PLAN_VERS_NM,
T.FRMT_ACTL_CD,
T.FRMT_PLAN_CD,
T.RPT_PERD_TYPE_CD,
COALESCE(N.DISPLAY_NAME, T.FINC_ACCT_NM)
FROM CONT.TABLE T
JOIN (
SELECT RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Max(Perd_END_RPT_DT) AS PERD_END_RPT_DT
FROM CONT.TABLE
WHERE VERS_NM='Actual'
AND DATA_VLDTN_IND='Y'
GROUP BY RPT_PERD_TYPE_CD, DATA_VLDTN_IND
) AS MAX_DATES
ON T.RPT_PERD_TYPE_CD = MAX_DATES.RPT_PERD_TYPE_CD
AND T.DATA_VLDTN_IND = MAX_DATES.DATA_VLDTN_IND
AND T.PERD_END_RPT_DT = MAX_DATES.PERD_END_RPT_DT
LEFT JOIN ACCT_DISPLAY_NAME N
ON T.FINC_ACCT_ID = N.FINC_ACCT_ID
AND T.BAL_TYPE_CD = N.BAL_TYPE_CD
WHERE T.DEPT_ID = 'OR80637'
AND T.RPT_PERD_TYPE_CD IN ('Q', 'M')
AND T.FINC_ACCT_ID IN (
'AC0006470',
'AC8000199',
'AC8002145',
'AC0006586',
'AC8000094'
)
The case statements are going to be much less of a factor than the joins in the WHERE clause.
The main driver of performance in SQL is I/O -- reading the data from disk. I think of it as two orders of magnitude more important than the processing going on in rows. This is just a heuristic, not based on specific tests on a database.
You are doing self-joins, which will require either lots of work reading the table or a fair amount of work dealing with indexes.
The case statement, on the other hand, gets turned into very primitive hardware commands -- equals, gotos, and the like. The data resides in memory closest to the processors, so it is going to zip along. You are doing nothing fancy in the case statement (such as a like or a subquery). I would imagine that the query would be just as fast if you removed most of the lines in the statement.
If you are having issues with performance, put an index on (VERS_NM, RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Perd_END_RPT_DT). This four-part index should allow you to get the max date without invoking I/O requests on the original table.