Oracle REGEXP_INSTR() and “a-z” character range doesn't match as expected

说谎 2021-01-22 08:04

I want to use REGEXP_INSTR() within an Oracle database to check for lower/uppercase characters. I'm aware of [:upper:] and [:lower:] POSI

2 answers

礼貌的吻别 (original poster)

2021-01-22 08:47

Okay, the answer that NLS_SORT causes this behavior is correct, but I don't think it explains it in an understandable way. None of the documentation I found actually does that...

You have to imagine that the character ranges defined by expressions like [a-z] are derived from a single base string containing all possible characters, sorted according to NLS_SORT.

Let's assume the whole alphabet consists only of alphanumeric characters. Sorted by BINARY (in an ASCII-based character set, where uppercase letters have lower code points than lowercase ones), this results in a base string like 0123456789ABCDE...XYZabcdefgh...xyz. Derived from this, [0-6] expands to [0123456], [a-f] to [abcdef], [5-b] to [56789ABC...XYZab] etc.

Sorting by a linguistic definition, however, results in a different base string, like 0123456789aAbBcCdDeEfF...xXyYzZ. Derived from this, [0-6] still expands to [0123456], but [a-f] now expands to [aAbBcCdDeEf] and [5-b] to [56789aAb] etc.
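The expansion model can be tried out outside the database. The Python sketch below is only an illustration of the explanation in this answer, not Oracle's actual implementation. The helper name expand_range and both base strings are assumptions: the binary one is ordered by code point, the linguistic one simply pairs each lowercase letter with its uppercase form.

```python
import string

# Hypothetical base strings modelling two sort orders (illustration only).
# Binary: characters ordered by code point (digits, uppercase, lowercase in ASCII).
BINARY_BASE = "".join(sorted(string.digits + string.ascii_letters))
# Linguistic (simplified): digits, then each letter as a lowercase/uppercase pair.
LINGUISTIC_BASE = string.digits + "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def expand_range(base: str, start: str, end: str) -> str:
    """Return the slice of `base` from `start` to `end`, both inclusive."""
    i, j = base.index(start), base.index(end)
    if i > j:
        raise ValueError(f"range {start}-{end} is out of order for this base string")
    return base[i : j + 1]


print(expand_range(BINARY_BASE, "0", "6"))      # 0123456
print(expand_range(BINARY_BASE, "a", "f"))      # abcdef
print(expand_range(LINGUISTIC_BASE, "0", "6"))  # 0123456
print(expand_range(LINGUISTIC_BASE, "a", "f"))  # aAbBcCdDeEf
print(expand_range(LINGUISTIC_BASE, "5", "b"))  # 56789aAb
```

The same range endpoints yield different character sets purely because the underlying base string is ordered differently.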

This is why 'a' did not match [A-Z], but 'b' did. With a linguistic sort, [A-Z] actually expands to [AbBcC...yYzZ], which includes 'z' but not 'a'.
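A membership check against the same kind of model makes this concrete. Again, this is a minimal sketch under assumptions: the base string is a simplified stand-in for a linguistic sort order, and in_range is a hypothetical helper, not an Oracle function.

```python
import string

# Hypothetical linguistic base string: 0123456789aAbBcC...yYzZ (illustration only).
LINGUISTIC_BASE = string.digits + "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range(base: str, start: str, end: str, ch: str) -> bool:
    """True if `ch` lies between `start` and `end` (inclusive) in `base`."""
    return base.index(start) <= base.index(ch) <= base.index(end)


# In this ordering 'a' sorts just before 'A', so it falls outside [A-Z].
print(in_range(LINGUISTIC_BASE, "A", "Z", "a"))  # False
# 'b' and 'z' sit between 'A' and 'Z', so they fall inside [A-Z].
print(in_range(LINGUISTIC_BASE, "A", "Z", "b"))  # True
print(in_range(LINGUISTIC_BASE, "A", "Z", "z"))  # True
```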

In reality the base string may contain even more characters around each letter, like aAàáâÀÁÂ... etc., so [A-Z] might also match accented letters.
