I'm reading The Definitive ANTLR 4 Reference and have a question regarding one of the examples (p. 76):
STRING: '"' (ESC|.)*? '"';
fragment
ESC: \'\\\\\
ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
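The longest-match rule with an order-based tie-break can be sketched in a few lines of Python. This is only an illustration of the selection policy, not of how ANTLR implements it; the rule names `IF` and `ID` and the helper `next_token` are hypothetical and do not come from the book.

```python
import re

# Hypothetical rule list, in the order the rules would appear in a grammar.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
]

def next_token(text, pos=0):
    """Return (rule_name, lexeme) for the longest match at pos.

    Ties on length go to the rule listed first.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strictly-greater comparison keeps the earlier rule on a tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("if"))    # ('IF', 'if')   same length, earlier rule wins
print(next_token("iffy"))  # ('ID', 'iffy') longer match wins regardless of order
```

Rule order only matters in the first call, where both rules match the same two characters.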
The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment forward to the end of the rule, all alternatives within that rule will be treated as ordered, and the path with the lowest alternative wins. This seemingly strange behavior is actually responsible for the non-greedy handling due to the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it use ESC if possible.
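As a rough analogy only, Python's `re` module also treats alternatives as ordered inside a non-greedy loop, so it can show why putting ESC first matters. It is a backtracking regex engine, not ANTLR's ATN simulation, so this is not a model of the lexer itself. The ESC pattern below is an assumption (the two-character sequences `\"` and `\\`), since the ESC rule is cut off in the question.

```python
import re

# Assumed ESC definition: backslash followed by a quote or another backslash.
ESC = r'\\["\\]'

# ESC listed first, mirroring (ESC|.)*? in the STRING rule.
esc_first = re.compile(r'"(?:' + ESC + r'|.)*?"')
# Alternatives swapped, mirroring a hypothetical (.|ESC)*?.
dot_first = re.compile(r'"(?:.|' + ESC + r')*?"')

text = r'"a\"b" tail'

print(esc_first.match(text).group())  # "a\"b"
print(dot_first.match(text).group())  # "a\"
```

With ESC first, the backslash and the following quote are consumed together, so the escaped quote cannot close the string. With `.` first, the backslash is consumed on its own and the next quote ends the match early.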