Raku regex: Inconsistent longest token matching

若如初见. Submitted on 2021-01-02 05:02:34

Question


Raku regexes are expected to match the longest token.

And in fact, this behaviour is seen in this code:

raku -e "'AA' ~~ m/A {say 1}|AA {say 2}/"
# 2

However, when the text is in a variable, it does not seem to work in the same way:

raku -e "my $a = 'A'; my $b = 'AA'; 'AA' ~~ m/$a {say 1}|$b {say 2}/"
# 1

Why do the two cases behave differently? Is there a way to use variables and still match the longest token?


Answer 1:


There are two things at work here.

The first is the meaning of "longest token". When there is an alternation (using | or implied by use of proto regexes), the declarative prefix of each branch is extracted. Declarative means the subset of the Raku regex language that can be matched by a finite state machine. The declarative prefix is determined by taking regex elements until a non-declarative element is encountered. You can read more and find some further references in the docs.
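To make the declarative prefix rule concrete, here is a minimal sketch, assuming a recent Rakudo release. The empty block `{}` is used only as a convenient non-declarative element that ends the first branch's prefix early; the variable names are illustrative and not from the original answer.

```raku
# Both branches are fully declarative: the prefixes are 'abc' and 'ab',
# so the branch with the longer prefix is tried first.
my $declarative = ~('abc' ~~ / abc | ab /);
say $declarative;    # abc

# An empty code block ends the declarative prefix of the first branch at 'a'.
# The second branch now has the longer prefix ('ab') and is tried first.
my $cut-short = ~('abc' ~~ / a {} bc | ab /);
say $cut-short;      # ab
```

Both regexes could match all of 'abc', yet the second one stops at 'ab': the choice of branch depends only on the declarative prefix, not on how much each branch could eventually match.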

To understand why things are this way, a small detour may be helpful. A common approach to building parsers is to write a tokenizer, which breaks the input text up into a sequence of "tokens", and then a parser that identifies larger (and perhaps recursive) structure from those tokens. Tokenizing is typically performed using a finite state machine, since it is able to rapidly cut down the search space. With Raku grammars, we don't write the tokenizer ourselves; instead, it's automatically extracted from the grammar for us (more precisely, a tokenizer is calculated per alternation point).

Secondly, Raku regexes are a nested language within the main Raku language, parsed in a single pass with it and compiled at the same time. (This is a departure from most languages, where regexes are provided as a library that we pass strings to.) The longest token calculation takes place at compile time. However, variables are interpolated at runtime. A variable interpolation in a regex is therefore non-declarative, so it is not considered as part of the longest token matching.
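The sketch below restates the question's second case as a script, then tries one possible workaround that is not part of the original answer: interpolating an array instead of separate scalars. The Raku documentation on regex interpolation describes an interpolated array as behaving like a `|` alternation of its elements in which the longest match wins, so treat this as a sketch to verify on your own Rakudo version. The names `$branch` and `@alts` are illustrative.

```raku
my $a = 'A';
my $b = 'AA';

# Scalar interpolation happens at runtime, so both branches have an empty
# declarative prefix and the first branch in source order is tried first.
my $branch;
'AA' ~~ m/ $a { $branch = 1 } | $b { $branch = 2 } /;
say $branch;    # 1

# Assumption: array interpolation selects the longest matching element
# at runtime, as described in the regex interpolation docs.
my @alts = $a, $b;
my $m = 'AA' ~~ / @alts /;
say ~$m;
```

Note that the array form has no place for a separate code block per alternative, so it suits cases where only the matched text matters.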



Source: https://stackoverflow.com/questions/64407663/raku-regex-inconsistent-longest-token-matching
