When answering this question, I wrote this code to iterate over the UTF-8 byte sequence in a string:
local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\
See the Lua 5.1 manual on patterns.
A pattern cannot contain embedded zeros. Use %z instead.
In Lua 5.2, this was changed so that you could use \0 instead, but not so for 5.1. Simply add %z to the first set and change the first range to \1-\127.
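As a minimal sketch of that fix (assuming the rest of the pattern is the one quoted later in this thread, and that the source file is saved as UTF-8), the loop could look like this. The chars table is only there for illustration, to collect the results:

```lua
-- In a Lua 5.1 pattern, %z stands for the byte 0,
-- so the first range can start at \1 instead of \0.
local str = "KORYTNAČKA"
local chars = {}  -- illustrative: collects one entry per UTF-8 sequence

for c in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
  chars[#chars + 1] = c
end

print(#chars)                    -- 10 (Č is two bytes but one sequence)
print(table.concat(chars, " "))
```

Lua 5.2 deprecates %z because patterns there may contain \0 directly, so this form is mainly of interest when the code has to run on 5.1.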
I strongly suspect this happens because of the \0 in the pattern. The string that holds your pattern is treated as NUL-terminated, so it ends earlier than it should, and what the Lua pattern matcher actually parses is [\0, that is, an unclosed [. That is clearly a malformed pattern and should trigger the error you're currently getting.
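One way to check this suspicion is to wrap the call in pcall and look at the result. This is only a sketch: the shortened pattern is an illustrative stand-in for the one in the question, and the outcome depends on the interpreter version, so no particular output is claimed here.

```lua
-- Try a pattern with an embedded \0 and report what the interpreter does with it.
local ok, err = pcall(string.find, "abc", "[\0-\127]")

if ok then
  -- Expected on versions whose patterns may contain \0 (Lua 5.2 and later).
  print(_VERSION .. ": pattern accepted")
else
  -- Expected on Lua 5.1, where the pattern is cut off at the \0.
  print(_VERSION .. ": pattern rejected: " .. tostring(err))
end
```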
To test this idea, I made a small change to the pattern:
local str = "KORYTNAČKA"
for c in str:gmatch("[\x0-\x7F\xC2-\xF4][\x80-\xBF]*") do
  print(c)
end
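That compiled and ran without the error on Lua 5.1.4. Demonstration
Note: I have not checked what the modified pattern actually matches. I only got rid of the \0 by adding an x. Lua 5.1 has no \x hex escape, so \x0 is read as the literal characters x0, and the output of the modified code might not be what you expect.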
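Edit: As a workaround, you might consider replacing \0 with \\0 (to avoid the embedded NUL) in your second code example. Be aware that \\0 is a literal backslash followed by the character 0, so the first set then contains a backslash and the range from the character 0 to \127 rather than starting at byte 0; it works for this string, but bytes below the character 0 (such as a space) are no longer matched: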
local str = "KORYTNAČKA"
for c in str:gmatch("[\\0-\127\194-\244][\128-\191]*") do
  print(c)
end
Demo
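To see how the \\0 workaround differs from the %z form suggested in the other answer, the following sketch counts matches on a string that contains a space. The helper count_matches and the test string "a b" are illustrative and not part of the original code:

```lua
-- Count how many times a pattern matches in s (illustrative helper).
local function count_matches(s, pattern)
  local n = 0
  for _ in s:gmatch(pattern) do
    n = n + 1
  end
  return n
end

-- "\\0" is a backslash plus the character 0, so this set is:
-- backslash, the range "0".."\127", and the range "\194".."\244".
local workaround = "[\\0-\127\194-\244][\128-\191]*"
-- %z is the byte 0, followed by the range "\1".."\127".
local with_z = "[%z\1-\127\194-\244][\128-\191]*"

print(count_matches("a b", workaround))  -- 2: the space (byte 32) is below "0" (byte 48)
print(count_matches("a b", with_z))      -- 3: the space is matched as well
```

For input made up only of letters and digits, as in the example string, the two patterns give the same result.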