Test column for special characters or only characters / numbers

Submitted by ♀尐吖头ヾ on 2021-01-24 14:10:31

Question


I tried to find special characters using generic regex patterns with LIKE and NOT LIKE, but I have been getting confusing results. My research suggests that this does not work the way it does in SQL Server or elsewhere. I want to test a column in three ways:

  1. For finding if there is any character
  2. For finding if there is any number
  3. For finding if there is any special character

Patterns such as LIKE '%[^0-9]%' or LIKE '%[^a-Z]%' do not work reliably for detecting non-numeric data and non-alphabetic data, respectively:

SELECT column1 FROM some_table WHERE column1 LIKE '%[^0-9]%';
SELECT column1 FROM some_table WHERE column1 LIKE '%[^a-Z]%';
SELECT column1 FROM some_table WHERE column1 LIKE '%[^a-Z0-9]%';

I have also noticed that people use NOT LIKE '%[^0-9]%'.


Answer 1:


Postgres LIKE does not support regular expressions.
You need the regular expression operator ~.

Standard SQL also defines SIMILAR TO as an odd mix of the above, but it is better not to use it. See:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
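A minimal sketch of the difference, using made-up sample values in a VALUES list instead of a real table: LIKE treats the bracket expression as literal text, while ~ interprets it as a regular expression.

```sql
-- Hypothetical sample values. For LIKE, only % and _ are wildcards.
SELECT val
     , val LIKE '%[^0-9]%' AS like_match   -- true only if val contains the literal text [^0-9]
     , val ~ '[^0-9]'      AS regex_match  -- true if val contains any non-digit
FROM  (VALUES ('12345'), ('12a45'), ('x[^0-9]y')) AS t(val);
```

Only 'x[^0-9]y' should satisfy the LIKE test here, while the regular expression flags both '12a45' and 'x[^0-9]y'.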

For finding if there is any character

... meaning any character at all:

... WHERE col <> '';                        -- any character at all?

So neither NULL nor empty. See:

  • Best way to check for "empty or null value"
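As a small illustration with made-up values, the comparison can be evaluated as a column to see why it covers both cases:

```sql
-- Hypothetical sample values, including a single space, an empty string and NULL.
SELECT val
     , val <> '' AS has_any_char  -- true, false, or NULL
FROM  (VALUES ('abc'), (' '), (''), (NULL)) AS t(val);
```

'abc' and the single space yield true, the empty string yields false, and NULL yields NULL. A WHERE clause only keeps rows where the condition is true, so NULL and empty values are both filtered out.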

... meaning any alphabetic character (letter):

... WHERE col ~ '[[:alpha:]]';              -- any letters?

[[:alpha:]] is the character class for all alphabetic characters. It is not limited to the ASCII letters [A-Za-z] but also includes letters like [ÄéÒçòý].
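A hedged sketch to compare the locale-aware class with a plain ASCII range. The sample values are made up, and how non-ASCII letters are classified is assumed to depend on the collation in effect (with the C locale they may not count as letters), so it is worth running this against your own database:

```sql
-- Hypothetical sample values.
SELECT val
     , val ~ '[[:alpha:]]' AS any_letter        -- locale-aware letter class
     , val ~ '[A-Za-z]'    AS any_ascii_letter  -- ASCII letters only
FROM  (VALUES ('123'), ('abc'), ('Äé'), ('12x')) AS t(val);
```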

For finding if there is any number

... meaning any digit:

... WHERE col ~ '\d';                       -- any digits?

\d is the class shorthand for [[:digit:]].
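The title also asks about values consisting of only numbers. That case is not spelled out above; one possible sketch is to anchor the pattern with ^ and $ so that every character has to be a digit. The sample values are made up, and the plain string literals assume standard_conforming_strings is on (the default in current versions), so the backslash reaches the regex engine unchanged:

```sql
-- Hypothetical sample values.
SELECT val
     , val ~ '\d'       AS any_digit          -- at least one digit somewhere
     , val ~ '^\d+$'    AS only_digits        -- one or more characters, all digits
     , val ~ '^[0-9]+$' AS only_ascii_digits  -- same, restricted to ASCII 0-9
FROM  (VALUES ('12345'), ('12a45'), ('abc'), ('')) AS t(val);
```

Only '12345' passes the anchored tests. '12a45' contains digits but is not made up of digits alone, and the empty string fails because + requires at least one character.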

For finding if there is any special character

... meaning anything except digits and letters:

... WHERE col ~ '\W';                       -- anything but digits, letters & underscore?

\W is the class shorthand for [^[:alnum:]_]. The underscore counts as a word character, so \W does not match it (the manual is currently confusing there).
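A short sketch of the underscore behavior, with made-up sample values. The second column uses [^[:alnum:]], which is an addition for comparison in case the underscore should count as special:

```sql
-- Hypothetical sample values.
SELECT val
     , val ~ '\W'           AS nonword_char    -- underscore is not matched
     , val ~ '[^[:alnum:]]' AS non_alnum_char  -- underscore is matched
FROM  (VALUES ('foo_bar'), ('foo-bar'), ('foobar1')) AS t(val);
```

'foo_bar' is flagged only by the second pattern, 'foo-bar' by both, and 'foobar1' by neither.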

... meaning anything except digits, letters and plain space:

... WHERE col ~ '[^[:alnum:]_ ]'            -- ... and space

That's the class shorthand \W spelled out, additionally excluding plain space.

... meaning anything except digits, letters and any white space:

... WHERE col ~ '[^[:alnum:]_\s]'           -- ... and any white space
... WHERE col ~ '[^[:alnum:]_[:space:]]'    -- ... the same spelled out

This time excluding all white space as defined by the POSIX character class space. About "white space" in Unicode:

  • Trim trailing spaces with PostgreSQL
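The three variants can be compared side by side. This is only a sketch with made-up sample values: a plain space, a tab (written with the E'' escape string syntax) and a hyphen.

```sql
-- Hypothetical sample values.
SELECT label
     , val ~ '\W'              AS nonword            -- space and tab count as special
     , val ~ '[^[:alnum:]_ ]'  AS beyond_space       -- plain space is tolerated
     , val ~ '[^[:alnum:]_\s]' AS beyond_whitespace  -- all white space is tolerated
FROM  (VALUES ('space' , 'a b')
            , ('tab'   , E'a\tb')
            , ('hyphen', 'a-b')) AS t(label, val);
```

The space is flagged only by the first pattern, the tab by the first two, and the hyphen by all three.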

... meaning any non-ASCII character:

If your DB cluster runs with UTF8 encoding, there is a simple, very fast hack:

... WHERE octet_length(col) > length(col);  -- any non-ASCII character?

octet_length() counts the bytes in the string, while length() (aliases: character_length() or char_length()) counts characters in the string. All basic ASCII characters ([\x00-\x7F]) are encoded with 1 byte in UTF-8, all other characters use 2 to 4 bytes. Any non-ASCII character in the string makes the expression true.
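To see the byte and character counts next to each other, here is a minimal sketch with made-up sample values. It assumes the server encoding is UTF8 and that the client sends the literals correctly encoded:

```sql
-- Hypothetical sample values; assumes server encoding UTF8.
SELECT val
     , length(val)                     AS chars
     , octet_length(val)               AS bytes
     , octet_length(val) > length(val) AS has_non_ascii
FROM  (VALUES ('abc'), ('naïve'), ('日本語')) AS t(val);
```

For 'abc' both counts are equal, so has_non_ascii is false. For the other two values the byte count exceeds the character count.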

Further reading:

  • Chapter Regular Expression Class-shorthand Escapes in the manual.

  • PostgreSQL 9.1 using collate in select statements

  • ERROR: “sql ” is not a known variable



Answer 2:


The problem is that you are using LIKE incorrectly. These patterns are not recognized by LIKE.

Use ~ for regular expression matching:

select column1 from some_table where column1 ~ '[^a-Z0-9]' 

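or, more aptly, with the letter ranges spelled out (a range like a-Z runs backwards in code-point order, and PostgreSQL may reject it as an invalid character range):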

select column1 from some_table where column1 ~ '[^a-zA-Z0-9]'

This returns every row in which column1 contains at least one character that is not listed in the character class.

Here is a db<>fiddle.



Source: https://stackoverflow.com/questions/56029995/test-column-for-special-characters-or-only-characters-numbers
