Question
I have a column of data (to reuse a previous example, body temperatures) that is stored as varchar so that no record gets rejected, even though it contains numeric data.
The people sending me the data are using a less than perfect system, so I have some incorrect data. What I need to do is write a SQL query to find valid values above or below a certain value.
For example, all temps over 104, which should indicate either extreme cases or errors.
I tried:
select count(1), result_num from VITALS where test_cd = 'TEMP' and cast(result_num as integer) > 104 group by result_num;
This returned an invalid number error, so I figured some rows had characters that couldn't be converted to integers. I found two records with negative values ("-" before the number) and some that said "NULL", so I amended my query to read:
select count(1), result_num from VITALS where test_cd = 'TEMP' and result_num not like '%-%' and result_num not like '%NULL%' and cast(result_num as integer) > 104 group by result_num;
...and it still returned an invalid number error. I have triple checked the data in my RESULT_NUM field and those are the only character responses.
All other responses, whether legit temps or not, are numeric with no characters other than decimals.
Do I need to link the "not like" statements in parens or something?
This is probably a simple answer, but it is driving me nuts.
Answer 1:
You could filter out the non-numeric values with a function like the one Justin's answer provides, or with a regular expression, which might need some tweaking:
select count(1), result_num
from vitals
where test_cd = 'TEMP'
and regexp_like(result_num, '^[-]?[0-9]*[\.]?[0-9]*$')
and cast(result_num as integer) > 104
group by result_num;
That will exclude most non-numbers (maybe all, but I'm not that confident - regex isn't a strong area), though Justin's function is probably safer.
However, there's still no guarantee that the filter function will be applied before the cast. If this still trips up then you could use a subquery to filter out non-numeric values and then check the actual value of those that remain; but you'd probably need to add a hint to stop Oracle unnesting the subquery and changing the evaluation order on you.
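One way to avoid depending on predicate order at all is to put the check and the conversion into a single CASE expression, since Oracle documents short-circuit evaluation for CASE. The following is only a sketch under a few assumptions: the pattern is a stricter variant of the one above (it requires at least one digit, so strings such as '-' or '.' that the looser pattern lets through are rejected, as are forms like '.5' or '5.'), and the session's NLS_NUMERIC_CHARACTERS is assumed to use '.' as the decimal separator.

```sql
select count(1), result_num
from vitals
where test_cd = 'TEMP'
-- The THEN branch is only evaluated for rows that pass the pattern check,
-- so to_number never sees a non-numeric string; other rows yield NULL.
and case
      when regexp_like(result_num, '^-?[0-9]+(\.[0-9]+)?$')
      then to_number(result_num)
    end > 104
group by result_num;
```

Note that to_number keeps the fractional part, whereas cast(... as integer) rounds, so a value such as 104.4 passes this filter but would not pass the original comparison.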
Another approach is a variation of Justin's function that returns the actual number:
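CREATE OR REPLACE FUNCTION safe_number( p_str IN VARCHAR2 )
  RETURN NUMBER DETERMINISTIC PARALLEL_ENABLE
IS
  l_num NUMBER;
BEGIN
  -- Attempt the conversion; non-numeric input raises value_error.
  l_num := to_number( p_str );
  RETURN l_num;
EXCEPTION
  WHEN value_error THEN
    -- Treat anything that cannot be converted as NULL.
    RETURN null;
END safe_number;
/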
Then your query can use that:
select count(1), result_num
from vitals
where test_cd = 'TEMP'
and safe_number(result_num) > 104
group by result_num;
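If you happen to be on Oracle 12c Release 2 or later (an assumption, since the question does not state a version), the built-in conversion functions accept a DEFAULT ... ON CONVERSION ERROR clause, which gives much the same behavior as safe_number without a custom function. A minimal sketch:

```sql
select count(1), result_num
from vitals
where test_cd = 'TEMP'
-- Values that cannot be converted become NULL instead of raising an error,
-- and NULL > 104 is not true, so those rows drop out.
and to_number(result_num default null on conversion error) > 104
group by result_num;
```

On earlier versions this clause is not available, so the function above remains the way to go.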
Answer 2:
This worked for me. It's essentially the suggestion that @Alex made about the subquery. Hope it works on your source data:
SELECT count(*), result_num
FROM
(
  SELECT
    test_cd,
    CASE WHEN REGEXP_LIKE(result_num,'^-?[0-9]*\.?[0-9]*$')
      THEN result_num - 0
      ELSE NULL
    END result_num
  FROM vitals
)
WHERE test_cd='TEMP'
AND result_num > 104
GROUP BY result_num;
Another idea: if you're on Oracle 11g or later and are able to suggest or make table structure changes, I like the idea of adding a Virtual Column (i.e. a computed column) to the table that computes the sanitized numeric value. The behavior would be similar to a view, but a Virtual Column has two big advantages: (1) there are no additional objects, still just one object, the vitals table, and (2) you can index a Virtual Column (logically the same as a function-based index).
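As a rough sketch of the Virtual Column idea, the column could be defined in terms of the safe_number function from the first answer, which is declared DETERMINISTIC and so can be used in a virtual column expression. The column name result_val and the index name are hypothetical, chosen only for illustration.

```sql
-- Hypothetical column: the numeric value of result_num, or NULL if it
-- cannot be converted. Nothing is stored; it is computed on read.
alter table vitals add (
  result_val number generated always as (safe_number(result_num)) virtual
);

-- Optional index on the virtual column (hypothetical name).
create index vitals_result_val_ix on vitals (result_val);

-- The query can then compare the numeric column directly.
select count(*), result_val
from vitals
where test_cd = 'TEMP'
and result_val > 104
group by result_val;
```

One thing to keep in mind with this design is that the column and its index depend on the function, so changing the function's logic later would likely mean rebuilding the index.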
If you're not on 11g, or if Virtual Columns sounds like too much alchemy, an alternative would be to make a plain old column to hold the sanitized values and have a trigger compute its value on insert/update.
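The trigger alternative might look something like the following. It again leans on the safe_number function from the first answer; the column name result_clean and the trigger name are made up for this sketch.

```sql
-- Hypothetical plain column that stores the converted value or NULL.
alter table vitals add (result_clean number);

create or replace trigger vitals_result_clean_trg
  before insert or update of result_num on vitals
  for each row
begin
  -- Recompute the sanitized value whenever result_num is set or changed.
  :new.result_clean := safe_number(:new.result_num);
end;
/

-- The trigger only fires for new changes, so existing rows need a backfill.
update vitals set result_clean = safe_number(result_num);
```

Queries would then filter on result_clean > 104 rather than converting result_num each time.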
Answer 3:
EDIT: If you are using Oracle 10g or later, you can use a regular expression to match only digits in result_num. Since decimals and negative numbers are to be filtered out, you can simply look for the characters 0-9, as follows:
select result_num , count(1) as cnt_result_num
from VITALS
where test_cd = 'TEMP'
and result_num IS NOT NULL
and regexp_like(result_num, '^[0-9]*$')
and cast(result_num as integer) > 104
group by result_num;
Reference:
REGEXP_LIKE on Oracle Database SQL Reference
Source: https://stackoverflow.com/questions/24290885/invalid-numbers