I want to use REGEXP_INSTR() within an Oracle database to check for lower/uppercase characters. I'm aware of [:upper:] and [:lower:]
POSI
Okay, the answer that NLS_SORT causes this behavior is correct, but I don't think it explains it in an understandable way. None of the documentation I found actually does that...
You have to imagine that the character ranges defined by [a-z] are actually derived from a single string of all possible characters, sorted according to NLS_SORT.
Let's assume the whole alphabet consists only of alphanumeric characters. Sorted by BINARY, that is, by character code, this results in a base string like 0123456789ABCDE...XYZabcdefgh...xyz.
Derived from this, [0-6] expands to [0123456], [a-f] to [abcdef], [5-b] to [56789ABC...XYZab], etc.
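The BINARY case can be sketched in a few lines of Python. This is only a model of the idea, not how Oracle implements it: the alphabet is assumed to be just ASCII digits and letters, and `BINARY_BASE` and `expand_range` are hypothetical names introduced for illustration.

```python
import string

# Assumption: the alphabet is only ASCII digits and letters.
# Sorting by code point models a BINARY ordering of that alphabet.
BINARY_BASE = "".join(sorted(string.digits + string.ascii_letters))


def expand_range(base: str, start: str, end: str) -> str:
    """Return the slice of the base string from start to end, inclusive."""
    i, j = base.index(start), base.index(end)
    if i > j:
        raise ValueError(f"range {start}-{end} is out of order for this base string")
    return base[i:j + 1]


print(expand_range(BINARY_BASE, "0", "6"))  # 0123456
print(expand_range(BINARY_BASE, "a", "f"))  # abcdef
print(expand_range(BINARY_BASE, "A", "Z"))  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```

In this model [A-Z] contains only uppercase letters, which is the behavior most people expect from the range.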
Sorted by a linguistic_definition, however, the result is a different base string, like 0123456789aAbBcCdDeEfF...xXyYzZ.
Derived from this, [0-6] still expands to [0123456], but [a-f] now expands to [aAbBcCdDeEf] and [5-b] to [56789aAb], etc.
This is why a did not match [A-Z], but b did. [A-Z] actually expands to [AbBcC...yYzZ], which includes z but not a.
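The same mental model can be checked for the linguistic case with a short Python sketch. It is not Oracle's implementation: the interleaved ordering is an assumption that simply mimics the aAbBcC... base string from the explanation, and `LINGUISTIC_BASE` and `expand_range` are hypothetical names.

```python
import string

# Assumption: a linguistic sort that places each lowercase letter directly
# before its uppercase counterpart, with digits first (0123456789aAbBcC...zZ).
LINGUISTIC_BASE = "".join(
    sorted(string.digits + string.ascii_letters,
           key=lambda c: (c.lower(), c.isupper()))
)


def expand_range(base: str, start: str, end: str) -> str:
    """Return the slice of the base string from start to end, inclusive."""
    i, j = base.index(start), base.index(end)
    if i > j:
        raise ValueError(f"range {start}-{end} is out of order for this base string")
    return base[i:j + 1]


print(expand_range(LINGUISTIC_BASE, "a", "f"))  # aAbBcCdDeEf
print(expand_range(LINGUISTIC_BASE, "5", "b"))  # 56789aAb

upper_range = expand_range(LINGUISTIC_BASE, "A", "Z")
print("a" in upper_range)  # False: 'a' sorts before 'A'
print("b" in upper_range)  # True: 'b' lies between 'A' and 'Z'
print("z" in upper_range)  # True: 'z' sorts directly before 'Z'
```

Under this ordering, the only lowercase letter that falls outside [A-Z] is a, which matches the observation above.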
In reality, the base string also contains accented variants (for example aAàáâÀÁÂ...), so [A-Z] might match even more characters.
The reason for the behavior is the collation rules. See the NLS_SORT documentation:
- If the value is BINARY, then the collating sequence for ORDER BY queries is based on the numeric value of characters (a binary sort that requires less system overhead).
- If the value is a named linguistic sort, sorting is based on the order of the defined linguistic sort. Most (but not all) languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.
Set NLS_SORT to BINARY so that [A-Z] is evaluated in the same order as in the ASCII table:
alter session set nls_sort = 'BINARY';
Then, you will get consistent results.
See the online demo.