Question
I tried to find special characters using regex-style patterns with LIKE and NOT LIKE, but I keep getting confusing results. From what I have read, this does not work in Postgres the way it does in SQL Server or elsewhere. I want checks for the following:
- For finding if there is any character
- For finding if there is any number
- For finding if there is any special character
Patterns like '%[^0-9]%' or '%[^a-Z]%' do not work well for detecting non-numeric data or non-alphabetic data, respectively:
SELECT column1 from some_table where column1 like '%[^0-9]%';
SELECT column1 from some_table where column1 like '%[^a-Z]%';
SELECT column1 from some_table where column1 like '%[^a-Z0-9]%';
I have also seen people use NOT LIKE '%[^0-9]%'.
Answer 1:
Postgres LIKE does not support regular expressions.
You need the regular expression operator ~.
Standard SQL also defines SIMILAR TO as an odd mix of the two, but you are better off not using it. See:
- Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
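To make the difference concrete, here is a minimal sketch using string literals instead of a real table. LIKE only treats % and _ as wildcards, so a bracket expression is matched as literal text, while ~ interprets it as a regular expression. Note that ~ is unanchored, so no surrounding % is needed.

```sql
-- LIKE: '[', '^', '-' and ']' are ordinary characters in the pattern
SELECT 'abc'      LIKE '%[^0-9]%' AS like_plain,    -- false: 'abc' does not contain the text "[^0-9]"
       'a[^0-9]b' LIKE '%[^0-9]%' AS like_literal,  -- true: the pattern text occurs verbatim
       'abc'      ~    '[^0-9]'   AS regex_match;   -- true: 'a' is a non-digit
```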
For finding if there is any character
... meaning any character at all:
... WHERE col <> ''; -- any character at all?
So neither NULL nor empty. See:
- Best way to check for "empty or null value"
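A small illustration of why this single comparison covers both cases, assuming a text column (the sample values are made up). For NULL the comparison evaluates to NULL rather than true, so a WHERE clause filters that row out as well.

```sql
-- Sample rows are hypothetical; the VALUES column resolves to type text
SELECT col, col <> '' AS has_any_char
FROM  (VALUES ('abc'), (''), (NULL), (' ')) AS t(col);
-- 'abc' -> true
-- ''    -> false
-- NULL  -> NULL (not true, so WHERE col <> '' drops the row)
-- ' '   -> true (a single space is still a character for type text)
```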
... meaning any alphabetic character (letter):
... WHERE col ~ '[[:alpha:]]'; -- any letters?
[[:alpha:]] is the character class for all alphabetic characters, not just the ASCII letters [A-Za-z]. It also includes letters like [ÄéÒçòý] etc.
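A quick sketch with made-up sample values. The ASCII results are fixed; whether a non-ASCII letter such as 'é' is classified as alphabetic depends on the collation in effect (for example, it is not under the C locale), so no result is claimed for that row.

```sql
-- Sample rows are hypothetical
SELECT col, col ~ '[[:alpha:]]' AS has_letter
FROM  (VALUES ('abc'), ('123'), ('12a'), ('---'), ('é')) AS t(col);
-- 'abc' -> true
-- '123' -> false
-- '12a' -> true
-- '---' -> false
-- 'é'   -> depends on the collation / locale of the database
```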
For finding if there is any number
... meaning any digit:
... WHERE col ~ '\d'; -- any digits?
\d is the class shorthand for [[:digit:]].
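For illustration, a sketch comparing the shorthand and the spelled-out class on made-up values. It assumes standard_conforming_strings is on (the default in current versions), so the backslash in a plain string literal reaches the regex engine unchanged.

```sql
-- Sample rows are hypothetical
SELECT col,
       col ~ '\d'          AS has_digit,        -- shorthand
       col ~ '[[:digit:]]' AS has_digit_posix   -- spelled out
FROM  (VALUES ('abc'), ('abc1'), ('42')) AS t(col);
-- 'abc'  -> false, false
-- 'abc1' -> true,  true
-- '42'   -> true,  true
```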
For finding if there is any special character
... meaning anything except digits and letters:
... WHERE col ~ '\W'; -- anything but digits & letters?
\W is the class shorthand for [^[:alnum:]_] (the underscore counts as a word character, so \W does not match it; the manual is currently confusing there).
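A short sketch of how \W behaves on made-up values, in particular that an underscore does not count as "special" while a space or a hyphen does:

```sql
-- Sample rows are hypothetical
SELECT col, col ~ '\W' AS has_special
FROM  (VALUES ('abc123'), ('foo_bar'), ('foo bar'), ('foo-bar')) AS t(col);
-- 'abc123'  -> false
-- 'foo_bar' -> false (underscore is a word character)
-- 'foo bar' -> true  (space)
-- 'foo-bar' -> true  (hyphen)
```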
... meaning anything except digits, letters and plain space:
... WHERE col ~ '[^[:alnum:]_ ]' -- ... and space
That's the class shorthand \W spelled out, additionally excluding plain space.
... meaning anything except digits, letters and any white space:
... WHERE col ~ '[^[:alnum:]_\s]' -- ... and any white space
... WHERE col ~ '[^[:alnum:]_[:space:]]' -- ... the same spelled out
This time excluding all white space as defined by the POSIX character class [[:space:]]. About "white space" in Unicode:
- Trim trailing spaces with PostgreSQL
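The difference between the plain-space and any-white-space variants only shows up for characters like tab. A minimal sketch on made-up values, using an E'' literal to embed a tab:

```sql
-- Sample rows are hypothetical; E'foo\tbar' contains a tab character
SELECT col,
       col ~ '[^[:alnum:]_ ]'  AS beyond_plain_space,
       col ~ '[^[:alnum:]_\s]' AS beyond_any_space
FROM  (VALUES ('foo bar'), (E'foo\tbar'), ('foo-bar')) AS t(col);
-- 'foo bar'     -> false, false
-- 'foo<tab>bar' -> true,  false (tab is white space, but not a plain space)
-- 'foo-bar'     -> true,  true
```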
... meaning any non-ASCII character:
If your DB cluster runs with UTF8 encoding, there is a simple, very fast hack:
... WHERE octet_length(col) > length(col); -- any non-ASCII character?
octet_length() counts the bytes in the string, while length() (aliases: character_length() or char_length()) counts characters in the string. All basic ASCII characters ([\x00-\x7F]) are encoded with 1 byte in UTF-8, all other characters use 2 - 4 bytes. Any non-ASCII character in the string makes the expression true.
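A sketch of the byte-versus-character comparison on made-up values. The numbers in the comments assume a UTF8 database and that 'é' arrives as the single precomposed character U+00E9; with a different encoding or a decomposed form they would differ.

```sql
-- Sample rows are hypothetical; results assume server encoding UTF8
SELECT col,
       length(col)                     AS chars,
       octet_length(col)               AS bytes,
       octet_length(col) > length(col) AS has_non_ascii
FROM  (VALUES ('abc'), ('café'), ('日本')) AS t(col);
-- 'abc'  -> 3 chars, 3 bytes, false
-- 'café' -> 4 chars, 5 bytes, true  ('é' takes 2 bytes)
-- '日本' -> 2 chars, 6 bytes, true  (3 bytes each)
```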
Further reading:
- Chapter Regular Expression Class-shorthand Escapes in the manual
- PostgreSQL 9.1 using collate in select statements
- ERROR: “sql ” is not a known variable
Answer 2:
The problem is that you are using LIKE incorrectly. These patterns are not recognized by LIKE.
Use ~ for regular expression matching:
select column1 from some_table where column1 ~ '[^a-Z0-9]'
or more aptly, since a-Z is not an ascending range in code-point order (a is 97, Z is 90):
select column1 from some_table where column1 ~ '[^a-zA-Z0-9]'
This returns every row where column1 contains at least one character outside the character class.
Here is a db<>fiddle.
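For the related "only digits" test (the NOT LIKE idiom mentioned in the question), here is a hedged sketch on made-up values. It contrasts the negated match !~ with an anchored positive match; the two differ for the empty string, which contains no non-digit but also no digit.

```sql
-- Sample rows are hypothetical
SELECT col,
       col ~  '[^a-zA-Z0-9]' AS has_special,   -- any character outside letters and digits
       col !~ '[^0-9]'       AS no_non_digit,  -- regex counterpart of NOT LIKE '%[^0-9]%' in SQL Server
       col ~  '^[0-9]+$'     AS only_digits    -- at least one digit and nothing else
FROM  (VALUES ('abc123'), ('12345'), ('12-45'), ('')) AS t(col);
-- 'abc123' -> false, false, false
-- '12345'  -> false, true,  true
-- '12-45'  -> true,  false, false
-- ''       -> false, true,  false
```

Which of the last two forms is appropriate depends on whether empty strings should count as numeric.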
Source: https://stackoverflow.com/questions/56029995/test-column-for-special-characters-or-only-characters-numbers