Equivalent pattern to “[\0-\x7F\xC2-\xF4][\x80-\xBF]*” in Lua 5.1

暖寄归人 2020-12-21 06:06

While answering this question, I wrote this code to iterate over the UTF-8 byte sequences in a string:

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\


        
2 Answers
  • 2020-12-21 06:29

    See the Lua 5.1 manual on patterns.

    A pattern cannot contain embedded zeros. Use %z instead.
    

    Lua 5.2 changed this so that a pattern may contain \0, but that does not apply to 5.1. Simply add %z to the first set and change the first range to \1-\127, which gives [%z\1-\127\194-\244][\128-\191]*.
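
    The change described above could look like the following sketch. The helper name utf8_chars is made up for illustration, and the source file is assumed to be saved as UTF-8 so that the literal contains the two-byte sequence for Č.

```lua
-- Hypothetical helper (not from the original post): split a UTF-8 string
-- into its characters using a pattern that is valid in Lua 5.1.
local function utf8_chars(s)
    local chars = {}
    -- %z matches the zero byte in Lua 5.1; \1-\127 covers the rest of ASCII,
    -- \194-\244 are UTF-8 lead bytes, \128-\191 are continuation bytes.
    for c in s:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
        chars[#chars + 1] = c
    end
    return chars
end

-- Assumes this file is saved as UTF-8, so "Č" is the bytes \196\140.
for _, c in ipairs(utf8_chars("KORYTNAČKA")) do
    print(c)
end
```

    Collecting the matches into a table rather than printing inside the loop is only a convenience, so the result can be counted or indexed afterwards.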

  • 2020-12-21 06:45

    I strongly suspect this happens because of the \0 in the pattern. The string that holds your pattern is treated as null-terminated, so it ends earlier than it should, and what the Lua pattern matcher actually parses is just: [\0. That is clearly a malformed pattern, and it should trigger the error you are currently getting.
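
    One way to check this on a given interpreter is to run a pattern containing \0 under pcall. This is only a sketch, and the function name pattern_accepts_nul is invented here. On Lua 5.1 the call is expected to fail with a malformed-pattern error, while Lua 5.2 and later accept the zero byte.

```lua
-- Hypothetical probe: does string matching accept a zero byte in a pattern?
local function pattern_accepts_nul()
    -- pcall catches the "malformed pattern" error that Lua 5.1 raises when
    -- the pattern is cut off at the embedded zero, leaving an unclosed "[".
    local ok = pcall(string.find, "abc", "[\0-\127]")
    return ok
end

if pattern_accepts_nul() then
    print("patterns may contain \\0 here")
else
    print("use %z instead of \\0 in patterns")
end
```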

    To test this idea, I made a small change to the pattern:

    local str = "KORYTNAČKA"
    for c in str:gmatch("[\x0-\x7F\xC2-\xF4][\x80-\xBF]*") do 
        print(c) 
    end
    

    That compiled and ran as expected on Lua 5.1.4. Demonstration

    Note: I have not actually looked at what the pattern does; I just got rid of \0 by adding an x. Lua 5.1 has no \x escape, so \x0 is read as the plain characters x0, and the output of the modified code might not be what you expect.

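    Edit: As a workaround, you might consider replacing \0 with \\0 (to avoid the early null-termination) in your second code example. Be aware that \\0 puts a literal backslash and the digit 0 into the set, so the first range starts at byte 48 rather than byte 0, and bytes below 48 are no longer matched. The modified code: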

    local str = "KORYTNAČKA"
    for c in str:gmatch("[\\0-\127\194-\244][\128-\191]*") do 
        print(c) 
    end
    

    Demo
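
    In a Lua string literal, \\0 is two bytes, a backslash and the digit 0, and the backslash has no special meaning inside a pattern. The sketch below checks which bytes the first set of the pattern above accepts; the helper name in_first_set is invented for this illustration.

```lua
-- The literal "\\0" is a backslash followed by "0", not a zero byte.
assert(#"\\0" == 2)

-- Hypothetical helper: does a single byte match the first set?
local function in_first_set(byte)
    return string.char(byte):find("^[\\0-\127\194-\244]$") ~= nil
end

print(in_first_set(string.byte("K")))  -- letters are inside the 0-\127 range
print(in_first_set(string.byte("!")))  -- byte 33 lies below "0" (byte 48)
print(in_first_set(196))               -- lead byte of "Č"
```

    For a string like KORYTNAČKA this makes no visible difference, since every byte is at or above 48, but spaces, most punctuation and control characters would be skipped.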
