Question
Raku regexes are expected to match the longest token.
And in fact, this behaviour can be seen in the following code:
raku -e "'AA' ~~ m/A {say 1}|AA {say 2}/"
# 2
However, when the text is in a variable, it does not seem to work in the same way:
raku -e "my $a = 'A'; my $b = 'AA'; 'AA' ~~ m/$a {say 1}|$b {say 2}/"
# 1
Why do the two cases behave differently? Is there a way to use variables and still match the longest token?
Answer 1:
There are two things at work here.
The first is the meaning of "longest token". When there is an alternation (using | or implied by use of proto regexes), the declarative prefix of each branch is extracted. Declarative means the subset of the Raku regex language that can be matched by a finite state machine. The declarative prefix is determined by taking regex elements until a non-declarative element is encountered. You can read more and find some further references in the docs.
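As a small illustration of the declarative case, here is a sketch that uses only literal alternatives, which are fully declarative. The comparison with || (sequential alternation, which tries branches in order) is added here for contrast and is not part of the original question.

```raku
# Both branches are literals, so each branch is its own declarative prefix.
# With |, the branch with the longest declarative match is tried first.
say ~('AA' ~~ / A | AA /);    # AA

# With ||, branches are tried left to right and the first success wins.
say ~('AA' ~~ / A || AA /);   # A
```

In the question's first example, the code blocks {say 1} and {say 2} end the declarative prefixes, leaving A and AA as the prefixes that get compared.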
To understand why things are this way, a small detour may be helpful. A common approach to building parsers is to write a tokenizer, which breaks the input text up into a sequence of "tokens", and then a parser that identifies larger (and perhaps recursive) structure from those tokens. Tokenizing is typically performed using a finite state machine, since it is able to rapidly cut down the search space. With Raku grammars, we don't write the tokenizer ourselves; instead, it's automatically extracted from the grammar for us (more precisely, a tokenizer is calculated per alternation point).
Secondly, Raku regexes are a nested language within the main Raku language, parsed in a single pass with it and compiled at the same time. (This is a departure from most languages, where regexes are provided as a library that we pass strings to.) The longest token calculation takes place at compile time, whereas variables are interpolated at runtime. A variable interpolation in a regex is therefore non-declarative, so it is not considered as part of the longest token matching. In the second example both branches start with a variable, which leaves each with an empty declarative prefix, and the first branch is tried first.
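A possible way to keep the values in variables is sketched below. It relies on an assumption that is not stated in this answer: that an array interpolated into a regex is matched like a | alternation of its elements, with the longest matching element chosen at match time, as the Raku regex documentation on interpolation describes. The name @alts is made up for the example.

```raku
my $a = 'A';
my $b = 'AA';

# Scalars are interpolated at runtime, so neither branch contributes a
# declarative prefix; the question reports that the first branch wins here.
say ~('AA' ~~ / $a | $b /);

# Assumption: an interpolated array behaves like a | alternation of its
# elements, and the longest matching element is selected when matching.
my @alts = $a, $b;   # hypothetical name, for illustration only
say ~('AA' ~~ / @alts /);
```

If that assumption holds for your Rakudo version, the second match should cover the whole string; it is worth checking on your own installation before relying on it.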
Source: https://stackoverflow.com/questions/64407663/raku-regex-inconsistent-longest-token-matching