Antlr greedy-option

Submitted by 早过忘川 on 2019-12-13 17:21:47

Question


(I edited my question based on the first comment of @Bart Kiers - thank you!)

I have the following grammar:

SPACE : (' '|'\t'|'\n'|'\r')+ {$channel = HIDDEN;};
START : 'START:';
STRING_LITERAL  : ('"' .* '"')+;
rule    :  START STRING_LITERAL;

and I want to parse input such as 'START: "abcd" START: "img src="test.jpg""' (string literals can appear inside string literals).
The grammar above does not work when a string literal contains another string literal: for the input 'START: "img src="test.jpg""', the lexer produces the following tokens: START('START:') STRING_LITERAL("img src=") test.jpg.
Is there a way to define a grammar that handles this case?


Answer 1:


There are a couple of things wrong here:

  • You cannot use fragment rules inside parser rules. Your grammar will never create a START token;
  • A . char (DOT-char) inside a parser rule matches any token, while inside a lexer rule it matches any character;
  • If you let .* match greedily (and you had defined a proper lexer rule that matches a string literal), the input START: "abcd" START: "img src="test.jpg"" would contain one large string: "abcd" START: "img src="test.jpg"" (the first and the last quote would be matched).
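The greedy versus non-greedy behavior described in the last point can be reproduced outside ANTLR. The sketch below uses Python's `re` module as a stand-in for the lexer; this is an assumption made for illustration, not how ANTLR itself is implemented, and the function names are hypothetical.

```python
import re

def greedy_match(text):
    # Greedy '.*' runs to the LAST quote in the input,
    # so everything between the first and last quote becomes one string.
    return re.search(r'".*"', text).group(0)

def non_greedy_matches(text):
    # Non-greedy '.*?' stops at the FIRST quote it can close on,
    # so an embedded quote ends the string early.
    return re.findall(r'".*?"', text)

text = 'START: "abcd" START: "img src="test.jpg""'
print(greedy_match(text))
print(non_greedy_matches(text))
```

The greedy pattern swallows the second START: into the string, while the non-greedy one cuts the second literal off at "img src=" and leaves test.jpg outside any string, which is the tokenization reported in the question.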

So, you cannot embed string literals inside string literals using the same quotes. The lexer is not able to determine if a quote is meant to close the string, or if it's the start of a (new) embedded string. You will need to change that in your grammar.
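One common way to make the change is to escape the inner quotes, assuming you control the input format and can write \" inside a string. The sketch below again uses Python's `re` module as a stand-in for the lexer, and the name `string_literals` is hypothetical. In ANTLR, a lexer rule of roughly the shape STRING_LITERAL : '"' ('\\' . | ~('\\' | '"'))* '"'; expresses the same idea.

```python
import re

# A string is: an opening quote, then any number of either
#   - a backslash followed by any character (an escape such as \"), or
#   - any character that is neither a quote nor a backslash,
# then a closing quote.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')

def string_literals(text):
    # Return every string literal found in the input, escapes left intact.
    return STRING_LITERAL.findall(text)

# Inner quotes are written as \" so the closing quote is unambiguous.
text = r'START: "abcd" START: "img src=\"test.jpg\""'
print(string_literals(text))
```

With escapes, an unescaped quote can only mean "end of string", so the ambiguity described above disappears. Using a different delimiter for the inner string (for example single quotes) would be another option.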



Source: https://stackoverflow.com/questions/10002164/antlr-greedy-option
