CHR(0) in REGEXP_LIKE

被撕碎了的回忆 2021-01-15 08:25

I am using the queries to check how chr(0) behaves in regexp_like.

CREATE TABLE t1(a char(10));
INSERT INTO t1 VALUES('0123456789');
SELECT CASE WHEN REGE         


        
3 Answers
  • 2021-01-15 08:57

    CHR(0) is the character used to terminate a string in the C programming language (among others).

    When you pass CHR(0) to the function it will, in turn, pass it to a lower-level function that parses the strings you have passed in and builds a regular expression pattern from them. That pattern builder sees CHR(0), treats it as the string terminator, and ignores the rest of the pattern.
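    The truncation described above can be sketched outside Oracle. The Python snippet below is only an analogy: it uses `ctypes` to show how a C-style reader sees a buffer that starts with a NUL byte. The `pattern` value is a hypothetical example, and nothing here is taken from Oracle's actual implementation.

```python
import ctypes

# Hypothetical pattern bytes: a NUL (CHR(0)) followed by more text.
pattern = b"\x00abc"

# create_string_buffer copies the bytes and appends a trailing NUL.
buf = ctypes.create_string_buffer(pattern)

raw_bytes = buf.raw   # every byte in the buffer, NULs included
c_view = buf.value    # what a NUL-terminated reader sees: bytes before the first NUL

print(raw_bytes)  # b'\x00abc\x00'
print(c_view)     # b''  (the "string" ends at the leading NUL)
```

    Anything after the first NUL is invisible to code that treats the buffer as a C string, which is why the pattern ends up empty.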

    The behaviour is easier to see with REGEXP_REPLACE:

    SELECT REGEXP_REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' )
    FROM   DUAL;
    

    What happens when you run this:

    • CHR(0) is compiled into a regular expression and becomes a string terminator.
    • Now the pattern is just the string terminator, so the pattern is a zero-length string.
    • The regular expression is then matched against the input string. It reads the first character, a, and finds that a zero-length string can be matched before the a, so it replaces the nothing it has matched before the a with a d, giving the output da.
    • It then repeats for the next character, transforming b to db.
    • And so on until it reaches the end of the string, where it matches the zero-length pattern once more and appends a final d.

    And you will get the output:

    dadbdcd_ded
    

    (where _ is the CHR(0) character.)

    Note: the CHR(0) in the input is not replaced.
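    The walk-through above can be reproduced with any regex engine that allows an empty pattern. The sketch below uses Python's `re` module as a stand-in, assuming the CHR(0) pattern has already collapsed to a zero-length pattern. It illustrates the matching behaviour only and is not Oracle's engine.

```python
import re

# Equivalent of 'abc' || CHR(0) || 'e' from the query above.
source = "abc\x00e"

# Assumption: the CHR(0) pattern has been truncated to an empty pattern.
result = re.sub("", "d", source)

# The empty pattern matches before every character and once at the end,
# so a 'd' is inserted at each of those positions.
print(repr(result))                # 'dadbdcd\x00ded'
print([ord(ch) for ch in result])  # code points, with 0 for the untouched CHR(0)
```

    The NUL in the input survives because the empty pattern never consumes any character; it only matches the gaps between them.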

    If the client program you are using also truncates strings at CHR(0), you may not see the entire output. That is an issue with how your client displays the string, not with Oracle's output, and the full result can be shown using DUMP():

    SELECT DUMP( REGEXP_REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' ) )
    FROM DUAL;
    

    Outputs:

    Typ=1 Len=11: 100,97,100,98,100,99,100,0,100,101,100
    

    [TL;DR] So what is happening with

    REGEXP_LIKE( '1234567890', CHR(0) )
    

    It builds a zero-length regular expression pattern and looks for a zero-length match before the 1 character, which it finds, so it reports a match.
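    As a rough illustration of that summary, the Python sketch below emulates the assumed behaviour: the pattern is cut at the first NUL and then searched for. `truncate_at_nul` and `regexp_like_sketch` are hypothetical names made up for this example; they are not Oracle functions, and the truncation step is the assumption from this answer rather than documented behaviour.

```python
import re


def truncate_at_nul(pattern: str) -> str:
    """Hypothetical helper: keep only the text before the first NUL."""
    return pattern.split("\x00", 1)[0]


def regexp_like_sketch(source: str, pattern: str) -> bool:
    """Rough stand-in for REGEXP_LIKE under the truncation assumption."""
    return re.search(truncate_at_nul(pattern), source) is not None


truncated = truncate_at_nul("\x00")
matched = regexp_like_sketch("1234567890", "\x00")
literal_hit = "\x00" in "1234567890"

print(repr(truncated))  # ''
print(matched)          # True: the empty pattern matches before the '1'
print(literal_hit)      # False: a plain substring test finds no NUL
```

    The contrast between the last two lines mirrors the difference between a regular expression search with a truncated pattern and a literal search for the NUL character.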

  • 2021-01-15 09:18

    Aleksej kind of beat me to it, but CHR(0) is the value of the string terminator (similar to the NULL keyword, but not the same thing). Think of it as an internal end-of-string indicator that the regular expression functions apparently act on. Note that if you try the query with the keyword NULL instead, it returns zero, because a comparison with NULL never evaluates to true and so fails (as you were expecting). Interesting. Perhaps someone more familiar with the internals can explain further; I would be interested to hear more.

  • 2021-01-15 09:18

    Not an answer, just some experiments, but too long for a comment.

    REGEXP_COUNT seems to be confused by chr(0), counting every character as chr(0); besides, it seems to find one occurrence more than the size of the string.

    SQL> select dump('a'), regexp_count('a', chr(0)) from dual;
    
    DUMP('A')        REGEXP_COUNT('A',CHR(0))
    ---------------- ------------------------
    Typ=96 Len=1: 97                        2
    
    SQL> select dump(chr(0)), regexp_count(chr(0), chr(0)) from dual;
    
    DUMP(CHR(0))   REGEXP_COUNT(CHR(0),CHR(0))
    -------------- ---------------------------
    Typ=1 Len=1: 0                           2
    
    SQL> select dump('0123456789' || chr(0)), regexp_count('0123456789' || chr(0), chr(0)) from dual;
    
    DUMP('0123456789'||CHR(0))                    REGEXP_COUNT('0123456789'||CHR(0),CHR(0))
    --------------------------------------------- -----------------------------------------
    Typ=1 Len=11: 48,49,50,51,52,53,54,55,56,57,0                                        12
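    These counts are consistent with a zero-length pattern, which matches once before each character and once at the end of the string, giving length + 1 matches. The Python sketch below shows the same arithmetic with the `re` module. It is an analogy under that assumption, not a trace of Oracle's REGEXP_COUNT, and `count_empty_matches` is a hypothetical helper name.

```python
import re


def count_empty_matches(source: str) -> int:
    """Count the matches of a zero-length pattern in source."""
    return len(re.findall("", source))


# Same three inputs as the REGEXP_COUNT experiments above.
samples = ["a", "\x00", "0123456789\x00"]
counts = [count_empty_matches(s) for s in samples]

for s, n in zip(samples, counts):
    print(len(s), n)  # the match count is always len(s) + 1
```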
    

    LIKE seems to behave correctly, while its REGEXP counterpart seems to fail:

    SQL> select 1 from dual where 'a' like '%' || chr(0) || '%';
    
    no rows selected
    
    SQL> select 1 from dual where regexp_like ('a', chr(0));
    
             1
    ----------
             1
    

    The same goes for INSTR and REGEXP_INSTR:

    SQL> select 1 from dual where instr('a', chr(0)) != 0;
    
    no rows selected
    
    SQL> select 1 from dual where regexp_instr('a', chr(0)) != 0;
    
             1
    ----------
             1
    

    Tested on 11g XE Release 11.2.0.2.0 - 64bit
