How to regexp_replace for Unicode in PostgreSQL
I read this: http://www.regular-expressions.info/unicode.html
select regexp_replace('s4y8sds', '\\\
For ordinary digits, use the digit character class [[:digit:]] or its shorthand \d:
SELECT regexp_replace('s4y8sds', $$\d+$$, '', 'g');
Result:
regexp_replace
----------------
sysds
(1 row)
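The bracket-expression form mentioned above can be used in the same way. A minimal sketch, reusing the sample string from the question; for this ASCII-only input it should behave the same as the \d version:

```sql
-- Same replacement written with the POSIX character class instead of \d.
-- The 'g' flag replaces every match, not just the first one.
SELECT regexp_replace('s4y8sds', '[[:digit:]]+', '', 'g');
```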
For other numeric characters (for example ¼) it is not that simple. As the documentation says, what a character class matches depends on ctype (the locale):
Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.
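To see how your own database treats such characters, you can check its ctype setting and probe a class directly. This is only a sketch: the sample character ¼ is taken from the text above, a UTF8-encoded database is assumed, and the results will vary with the locale, so no expected output is shown.

```sql
-- Which ctype (locale) does the current database use?
SELECT datctype
FROM pg_database
WHERE datname = current_database();

-- Does the locale put '¼' into the digit or alnum class?
-- The ~ operator tests whether the string matches the regular expression.
SELECT '¼' ~ '[[:digit:]]' AS is_digit,
       '¼' ~ '[[:alnum:]]' AS is_alnum;
```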
However, you can use the PL/Perl procedural language that ships with PostgreSQL to write a server-side function that uses the Unicode character classes you want via \p{}:
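CREATE OR REPLACE FUNCTION removeNumbersUnicode(text)
RETURNS text AS $$
    # Copy the first argument into a lexical variable.
    my $s = $_[0];
    # Delete every character in the Unicode Number category (Nd, Nl, No).
    $s =~ s/\p{N}//g;
    return $s;
$$ LANGUAGE plperl;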
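A possible way to try the function, as a sketch. It assumes a UTF8-encoded database and that you have the privileges to install PL/Perl; the input string is a made-up example mixing ASCII digits, a vulgar fraction, and an Arabic-Indic digit.

```sql
-- Assumption: PL/Perl may not be installed in this database yet.
-- It must be installed before the function above can be created.
CREATE EXTENSION IF NOT EXISTS plperl;

-- Hypothetical input: '4' and '8' (ASCII digits), '¼' (fraction), '٣' (Arabic-Indic three).
-- All of them belong to \p{N}, so the function should strip them.
SELECT removeNumbersUnicode('s4y8sds ¼ ٣');
```

Note that the spaces in the input are not numbers, so they stay in the result.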
See Chapter 41 of the documentation for more on how to write such functions.