Oracle REGEXP_INSTR() and “a-z” character range doesn't match as expected

说谎 2021-01-22 08:04

I want to use REGEXP_INSTR() within an Oracle database to check for lowercase/uppercase characters. I'm aware of [:upper:] and [:lower:] POSI

2 answers
  •  礼貌的吻别
    2021-01-22 08:47

    Okay, the answer that NLS_SORT causes this behavior is correct, but I don't think it explains it in an understandable way. None of the documentation I found actually does that...

    You have to imagine that a character range such as [a-z] is cut out of a single base string that contains all possible characters, sorted according to NLS_SORT.

    Let's assume the whole alphabet consists only of alphanumeric characters. Sorted by BINARY (code point order in an ASCII-based character set), this results in a base string like 0123456789ABCDE...XYZabcdefgh...xyz. Derived from this, [0-6] expands to [0123456], [a-f] to [abcdef], [5-b] to [56789ABC...XYZab], etc.

    Sorting by a linguistic_definition, however, results in a different base string, like 0123456789aAbBcCdDeEfF...xXyYzZ. Derived from this, [0-6] still expands to [0123456], but [a-f] now expands to [aAbBcCdDeEf] and [5-b] to [56789aAb], etc.
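
The expansion described above can be modelled in a few lines of Python. This is only a sketch of the mental model, not of Oracle's implementation: `expand_range`, `BINARY_ORDER`, and `LINGUISTIC_ORDER` are hypothetical names, the alphabet is restricted to ASCII letters and digits, the binary order is taken from Python's code point sort, and the linguistic order is the toy aAbB...zZ interleaving rather than a real Oracle collation table.

```python
import string

ALNUM = string.digits + string.ascii_letters

# Assumption: code point order stands in for NLS_SORT = BINARY.
BINARY_ORDER = "".join(sorted(ALNUM))

# Assumption: a toy linguistic order where each letter sorts as lower, then upper.
LINGUISTIC_ORDER = string.digits + "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def expand_range(start: str, end: str, base: str) -> str:
    """Return the slice of `base` from `start` through `end`, inclusive."""
    lo, hi = base.index(start), base.index(end)
    if lo > hi:
        raise ValueError(f"range {start}-{end} is reversed in this ordering")
    return base[lo:hi + 1]


print(expand_range("0", "6", BINARY_ORDER))      # 0123456
print(expand_range("a", "f", BINARY_ORDER))      # abcdef
print(expand_range("a", "f", LINGUISTIC_ORDER))  # aAbBcCdDeEf
print(expand_range("5", "b", LINGUISTIC_ORDER))  # 56789aAb
```

The same pattern text yields a different character set purely because the base string it is sliced from has changed.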

    This is why 'a' did not match [A-Z], but 'b' did. [A-Z] actually expands to [AbBcC...yYzZ], which includes 'z' but not 'a'.
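
To make the 'a' versus 'b' result concrete, here is a minimal sketch that tests range membership by collation rank instead of by code point. `ORDER`, `RANK`, and `in_range` are hypothetical names, and the lower-then-upper ordering is an assumed toy collation, not an actual Oracle linguistic sort.

```python
import string

# Assumption: toy linguistic collation aAbBcC...zZ (letters only).
ORDER = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)
RANK = {ch: position for position, ch in enumerate(ORDER)}


def in_range(ch: str, start: str, end: str) -> bool:
    """True if `ch` collates between `start` and `end`, inclusive."""
    return RANK[start] <= RANK[ch] <= RANK[end]


for ch in "abzAZ":
    print(ch, in_range(ch, "A", "Z"))
# a False
# b True
# z True
# A True
# Z True
```

Under the same toy ordering, [a-z] would cover 'A' through 'Y' but stop short of 'Z', because 'Z' collates after 'z'.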

    In reality, the base string contains more characters than this, such as accented letters (aAàáâÀÁÂ...), so [A-Z] might match even more characters.
