Question
I want to have a grammar that is lax in whether whitespace is present or not... I want to match:
this ' <foo> <bar> <baz> '
and also this '<foo><bar><baz>'
This works:
token TOP { \s* <foo> \s* <bar> \s* <baz> \s* }
But after reading all about :sigspace, <.ws> and rule, I can imagine that there is a way to do this without the repeated \s*. (viz. How do I match a hex array in per6 grammar)
Please can someone tell me if there is a nicer way to do this in a perl6 grammar?
NB. this is not solved by simply changing the token declarator to rule - when I try that approach I end up either matching space or no space (but not both) in the parse string.
Answer 1:
Perhaps your problem is one of these three rule "gotchas":
1. If you want white space / token boundary matching at the start of a rule, before the first atom, you must explicitly provide it (typically with an explicit <.ws>).
2. If you want white space / token boundary matching between each of the matches of a quantified atom (e.g. <foo>*), you must include space between the atom and the quantifier (e.g. <foo> *).
3. The default <ws> is defined as regex ws { <!ww> \s* }. If you want rules in a particular grammar to use a different pattern, then define your own ws in that grammar. (timotimo++)
For further discussion of the above, see my updated answer to How do I match a hex array in per6 grammar.
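To illustrate the third point, here is a minimal sketch of a grammar that overrides ws. The grammar name HorizontalOnly and the choice of \h* (horizontal whitespace only) are assumptions made for illustration, not something from the question:

```raku
grammar HorizontalOnly {
    # Hypothetical override: only spaces and tabs count as <ws>,
    # so a newline between atoms no longer matches.
    token ws { \h* }

    # The explicit <.ws> covers leading whitespace; the spaces after
    # each atom become implicit <.ws> calls because this is a rule.
    rule TOP { <.ws> '<foo>' '<bar>' '<baz>' }
}

say so HorizontalOnly.parse(" <foo>\t<bar> <baz> ");   # spaces and tabs only
say so HorizontalOnly.parse("<foo>\n<bar><baz>");      # newline between atoms
```

With this override the first parse should succeed and the second should fail, since every rule in the grammar now calls the custom ws instead of the default one.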
The following four regexes match both your sample strings:
my \test-strings := ' <foo> <bar> <baz> ', '<foo><bar><baz>';

my \test-regexes := token { \s*   '<foo>' \s* '<bar>' \s* '<baz>' \s* },
                    rule  { \s*   '<foo>' \s* '<bar>' \s* '<baz>' \s* },
                    rule  { \s*   '<foo>'     '<bar>'     '<baz>'     },
                    rule  { <.ws> '<foo>'     '<bar>'     '<baz>'     };

say (test-strings X~~ test-regexes).all ~~ Match # True
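The same idea carries over to the grammar in the question. The following is a sketch only: the grammar name Lax and the bodies of the foo, bar and baz tokens are assumptions, since the question does not show them.

```raku
grammar Lax {
    # rule turns on :sigspace, so the space after each atom becomes an
    # implicit <.ws>; the leading <.ws> must be written out explicitly.
    rule TOP { <.ws> <foo> <bar> <baz> }

    # Hypothetical token bodies, chosen to mirror the sample strings.
    token foo { '<foo>' }
    token bar { '<bar>' }
    token baz { '<baz>' }
}

for ' <foo> <bar> <baz> ', '<foo><bar><baz>' -> $input {
    # .parse is anchored at both ends, so the trailing <.ws> after <baz>
    # is what lets the final space in the first string match.
    say so Lax.parse($input);
}
```

Because the default ws accepts zero or more whitespace characters (as long as it is not in the middle of a word), both the spaced and the unspaced input should parse.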
Source: https://stackoverflow.com/questions/56507066/whats-the-best-way-to-be-lax-on-whitespace-in-a-perl6-grammar