I am using the following queries to check how CHR(0) behaves in REGEXP_LIKE.
CREATE TABLE t1(a char(10));
INSERT INTO t1 VALUES('0123456789');
SELECT CASE WHEN REGE
CHR(0) is the character used to terminate a string in the C programming language (among others).
When you pass CHR(0) to the function it will, in turn, pass it to a lower-level function that parses the strings you have passed in and builds a regular expression pattern from them. That lower-level code sees CHR(0), treats it as the string terminator, and ignores the rest of the pattern.
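One way to probe this explanation is to put extra characters after CHR(0) in the pattern and see whether they are ignored. The query below is only a sketch: the literals 'abc', 'b' and 'zzz' are arbitrary choices that do not come from the question, and the result of the first column has not been verified here. If the pattern really is cut off at CHR(0), the first column would be expected to be 1 (only 'b' is compiled), while the second column runs the same pattern without CHR(0) for comparison.

```sql
-- Sketch only: the literals are illustrative assumptions, not from the question.
-- with_chr0:    pattern is 'b', then CHR(0), then 'zzz'
-- without_chr0: the same characters with no CHR(0) in between
SELECT CASE WHEN REGEXP_LIKE('abc', 'b' || CHR(0) || 'zzz') THEN 1 ELSE 0 END AS with_chr0,
       CASE WHEN REGEXP_LIKE('abc', 'bzzz') THEN 1 ELSE 0 END AS without_chr0
FROM DUAL;
```

A difference between the two columns would support the idea that everything after CHR(0) is dropped from the pattern.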
The behaviour is easier to see with REGEXP_REPLACE:
SELECT REGEXP_REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' )
FROM DUAL;
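What happens when you run this:
First, CHR(0) is compiled into a regular expression and becomes a string terminator, so the pattern is effectively a zero-length string.
The engine then starts at the first character, a, and finds that a zero-length string can be matched before the a, so it replaces the nothing it has matched before the a with a d, giving the output da.
The same happens at the next character, turning b into db.
This repeats for every remaining character, and at the end of the string one more zero-length match adds a final d.
And you will get the output: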
dadbdcd_ded
(where _ is the CHR(0) character.)
Note: the CHR(0) in the input is not replaced.
If the client program you are using also truncates the string at CHR(0), you may not see the entire output (this is an issue with how your client represents the string, not with Oracle's output), but the full result can be shown using DUMP():
SELECT DUMP( REGEXP_REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' ) )
FROM DUAL;
Outputs:
Typ=1 Len=11: 100,97,100,98,100,99,100,0,100,101,100
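If the aim is to replace the CHR(0) character itself, a non-regex function may be a better fit. The query below is a minimal sketch that assumes plain REPLACE treats CHR(0) as an ordinary character; that assumption is not something demonstrated above. DUMP is used again so that a client which truncates at CHR(0) does not hide the result.

```sql
-- Assumption: REPLACE (not regex-based) matches CHR(0) literally.
-- DUMP shows the raw bytes, so the result does not depend on the client.
SELECT DUMP( REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' ) )
FROM DUAL;
```

If the assumption holds, the dump should list five bytes, with 100 (the d) in the position where the 0 used to be.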
[TL;DR] So what is happening with
REGEXP_LIKE( '1234567890', CHR(0) )
It compiles a zero-length regular expression pattern and looks for a zero-length match before the 1 character, which it finds, so it reports a match.
Aleksej kind of beat me to it, but CHR(0) is the value for the string terminator (kind of like the NULL keyword, but not exactly). Think of it like an internal end-of-string indicator that CHR(0) apparently can see. Note that if you try the query with the keyword NULL, it will return zero, as nothing can be compared to NULL and the comparison thus fails (as you were expecting). Interesting. Perhaps someone more experienced with the internal workings can explain further; I would be interested to hear more.
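The contrast between the two patterns can be put side by side. This is a sketch that uses a string literal in place of the column from the question and wraps each condition in a CASE expression; the column aliases are made up for illustration.

```sql
-- Assumption: a literal stands in for the column used in the question.
-- A NULL pattern makes the condition unknown, so the ELSE branch is taken.
SELECT CASE WHEN REGEXP_LIKE('0123456789', NULL)   THEN 1 ELSE 0 END AS null_pattern,
       CASE WHEN REGEXP_LIKE('0123456789', CHR(0)) THEN 1 ELSE 0 END AS chr0_pattern
FROM DUAL;
```

Based on the behaviour described in this thread, null_pattern would be 0 and chr0_pattern would be 1.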
Not an answer, just some experiments, but too long for a comment.
REGEXP_COUNT seems to be confused by chr(0), counting every character as chr(0); besides, it seems to find one occurrence more than the size of the string.
SQL> select dump('a'), regexp_count('a', chr(0)) from dual;
DUMP('A')        REGEXP_COUNT('A',CHR(0))
---------------- ------------------------
Typ=96 Len=1: 97                        2

SQL> select dump(chr(0)), regexp_count(chr(0), chr(0)) from dual;
DUMP(CHR(0))   REGEXP_COUNT(CHR(0),CHR(0))
-------------- ---------------------------
Typ=1 Len=1: 0                           2

SQL> select dump('0123456789' || chr(0)), regexp_count('0123456789' || chr(0), chr(0)) from dual;
DUMP('0123456789'||CHR(0))                    REGEXP_COUNT('0123456789'||CHR(0),CHR(0))
--------------------------------------------- -----------------------------------------
Typ=1 Len=11: 48,49,50,51,52,53,54,55,56,57,0                                        12
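The "one more than the size of the string" pattern can be checked in a single query by putting LENGTH + 1 next to REGEXP_COUNT. This is a sketch that reuses the three inputs above; the inline view and the column aliases are made up for illustration.

```sql
-- Compare LENGTH(s) + 1 with the number of matches reported for CHR(0).
-- The three inputs are the ones used in the experiments above.
SELECT s_label,
       LENGTH(s) + 1           AS len_plus_one,
       REGEXP_COUNT(s, CHR(0)) AS chr0_count
FROM (
  SELECT 'a' AS s_label, 'a' AS s FROM DUAL
  UNION ALL
  SELECT 'chr(0)', CHR(0) FROM DUAL
  UNION ALL
  SELECT '0123456789||chr(0)', '0123456789' || CHR(0) FROM DUAL
);
```

Going by the outputs above (2, 2 and 12), the two numeric columns should agree on every row, which is what one match per character position plus one at the end of the string would produce.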
LIKE behaves as expected, while its REGEXP counterpart does not:
SQL> select 1 from dual where 'a' like '%' || chr(0) || '%';
no rows selected
SQL> select 1 from dual where regexp_like ('a', chr(0));
1
----------
1
The same goes for INSTR and REGEXP_INSTR:
SQL> select 1 from dual where instr('a', chr(0)) != 0;
no rows selected
SQL> select 1 from dual where regexp_instr('a', chr(0)) != 0;
1
----------
1
Tested on 11g XE Release 11.2.0.2.0 - 64bit
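Given these results, a literal search for CHR(0) seems safer with INSTR than with the REGEXP functions. The query below is a sketch against the t1 table from the question; the has_chr0 alias is made up for illustration.

```sql
-- Sketch: flag rows of t1 whose column a really contains a CHR(0) character.
-- INSTR returned 0 for 'a' in the experiment above, so it appears to treat
-- CHR(0) as an ordinary character rather than as an empty pattern.
SELECT a,
       CASE WHEN INSTR(a, CHR(0)) > 0 THEN 1 ELSE 0 END AS has_chr0
FROM t1;
```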