问题
I have this code (was already there, isn't mine):
SELECT
a.id_original_contrato AS contrato,
( CASE WHEN d.value~'^\\d+$' THEN d.value::integer ELSE 0 END ) AS monto,
EXTRACT(YEAR FROM b.value)::integer AS anoinicio,
EXTRACT(YEAR FROM c.value)::integer AS anofin
... etc (some JOIN's and WHERE's)
Let me explain: d.value
comes from a table where value is character varying (200)
. The code will insert later the d.value
(now called 'monto') in another table as a integer
. Someone coded that regex in order to extract some chars or in other case (ELSE), define it as 0. Those values works when they are integer
only. If I use a d.value like 76.44
it doesn't work due that regex, it always define it as 0.
Well, I have to change that code, because:
- I need to store the d.value in the new table as
numeric
, not asinteger
anymore (In my new table the data type isnumeric
now) - But first, I need to correct that regex, because is messing my numeric numbers like
76.44
, or66,56
(dot or coma).
I'm not sure what that regex is doing. And how could I accomplish what need with a better or new regex?
回答1:
You should declare your version of Postgres and also the version that was used when the code was written (if you know that). The double backslash in \\d
suggests an old version with standard_conforming_strings = off
. The manual:
Beginning in PostgreSQL 9.1, the default is
on
(prior releases defaulted to off).
In modern versions with standard_conforming_strings = on
, this string makes little sense as regular expression: . To detect strings consisting of one or more digits use either '^\\d+$'
E'^\\d+$'
(prefixed with E
) or '^\d+$'
. Details:
- Insert text with single quotes in PostgreSQL
Integer literals also allow an optional leading sign for negative / positive numbers. And leading / dangling white space is also allowed (trimmed automatically) in Postgres.
So, this is the complete regular expression for integer
:
CASE WHEN d.value ~ '^\s*[-+]?\d+\s*$' THEN d.value::int ELSE 0 END
The regular expression explained:
^
.. start of string\s
.. class shorthand for [[:space:]]
(white space)*
.. quantifier for 0 or more times[+-]
.. character class consisting of +
and -
?
.. quantifier for 0 or 1 times\d
.. class shorthand for [[:digit:]]
(digits)+
.. quantifier for 1 or more times\s*
.. same as above$
.. end of string
Now we know the basics. Read more in the manual I linked to. Consider the syntax rules for numeric string literals. And, while the states about legal numeric constants:
There cannot be any spaces or other characters embedded in the constant
That's because a numeric constant is not quoted, hence white space is not possible. Not applicable for casting strings. White space is tolerated: Leading, trailing and right after the exponent char.
So these are all legal strings for the cast to numeric
:
'^\s*[-+]?\d*\.?\d+(?:[eE]\s*[-+]?\d+)?\s*$'
The only new element are parentheses (()) to denote the contained regular expression as atom. Since we are not interested in back references, use "non-capturing": (?:
...)
and append a question mark (?:[eE]\s*[-+]?\d+)?
to mean: the "exponential" part can be added or not, as a whole.
Assuming dot (.
) as decimal separator. You might use comma instead (,
) or [,\.]
to allow either. But only dot is legal for the cast.
Test:
SELECT '|' || txt || '|' As text_with_delim
, txt ~ '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' As test
, txt::numeric AS number
FROM unnest ('{1, 123, 000, " -1 ", +2, 1.2, .34, 5e6, " .5e -6 "}'::text[]) txt;
Result:
text_with_delim | test | number
-----------------+------+-----------
|1| | t | 1
|123| | t | 123
|000| | t | 0
| -1 | | t | -1
|+2| | t | 2
|1.2| | t | 1.2
|.34| | t | 0.34
|5e6| | t | 5000000
| .5e -6 | | t | 0.0000005
Or you might have use for to_number() to convert strings of arbitrary given format.
回答2:
Choose a variant:
with v(value) as (
values
('12,3'),
('12.3'),
('123'),
('123.'),
('.123'),
('1.2.3')
)
select
value,
value ~ '^(\d+[,\.]\d+|\d+)$' as variant_a,
value ~ '^(\d*[,\.]\d*|\d+)$' as variant_b,
value ~ '^\d+[,\.]\d+$' as variant_c
from v;
value | variant_a | variant_b | variant_c
-------+-----------+-----------+-----------
12,3 | t | t | t
12.3 | t | t | t
123 | t | t | f
123. | f | t | f
.123 | f | t | f
1.2.3 | f | f | f
(6 rows)
To convert a string with a dot or a comma to numeric use replace()
:
select replace(value, ',', '.')::numeric;
来源:https://stackoverflow.com/questions/42214084/filter-strings-with-regex-before-casting-to-numeric