ANTLR grammar avoiding angle brackets

Submitted by 北城以北 on 2019-12-12 03:34:09

Question


In this question I asked about extracting tags from arbitrary text. The solution provided worked well, but there's one edge case I'd like to handle. To recap, I'm parsing arbitrary user-entered text and want every occurrence of < or > to conform to valid tag syntax. Where an angle bracket isn't part of a valid tag, it should be escaped as &lt; or &gt;. The syntax I'm looking for is <foo#123>, where foo is text from a fixed list of entries and 123 is a number matching [0-9]+. The parser:

parser grammar TagsParser;

options {
    tokenVocab = TagsLexer;
}

parse: (tag | text)* EOF;
tag: LANGLE fixedlist GRIDLET ID RANGLE;
text: NOANGLE;
fixedlist: FOO | BAR | BAZ;

The lexer:

lexer grammar TagsLexer;

LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;

mode tag;

RANGLE: '>' -> popMode;
GRIDLET: '#';
FOO: 'foo';
BAR: 'bar';
BAZ: 'baz';
ID: [0-9]+;
OTHERTEXT: . ;

This works well and successfully parses text such as:

<foo#123>
Hi <bar#987>!
<baz#1><foo#2>anythinghere<baz#3>
if 1 &lt; 2

It also correctly rejects the following when I use the BailErrorStrategy:

<foo123>
<bar#a>
<foo#123H>
<unsupported#123>
if 1 < 2

The last one is rejected because < enters the tag mode and what follows doesn't match a supported tag format. However, I would also like to reject stray instances of > in the text, so the following should fail too:

if 2 > 1

That text should be specified as if 2 &gt; 1 instead of having the raw angle bracket.

How can I modify the grammar so that occurrences of > which aren't part of a valid tag fail to parse?


Answer 1:


As your grammar stands, a > outside of a tag already produces a token recognition error in the lexer, because no default-mode rule matches >. That is a failure of sorts, but by default the lexer only reports the error and skips the character, so the parser (and with it the BailErrorStrategy) never sees it. If you want the failure to happen during the parse, add a rule for the right angle bracket to the lexer's default mode:

lexer grammar TagsLexer;

LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;
BADRANGLE: '>';

mode tag;

RANGLE: '>' -> popMode;
...

Because no parser rule references BADRANGLE, a > outside of a tag now fails during the parse.
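
To make the mechanism concrete, here is a sketch of the parser rules from the question, left unchanged and annotated with how they react to the new token. The token stream in the comments is an assumption traced by hand from the lexer rules above, not output from an actual run:

```antlr
// Parser rules exactly as in the question; BADRANGLE is deliberately
// NOT referenced anywhere, so the parser can never accept it.
parse: (tag | text)* EOF;   // a BADRANGLE ends the loop, then fails to match EOF
tag: LANGLE fixedlist GRIDLET ID RANGLE;
text: NOANGLE;              // only bracket-free runs count as text
fixedlist: FOO | BAR | BAZ;

// Hand-traced token stream for the input "if 2 > 1" (assumed, not measured):
//   NOANGLE("if 2 ")  BADRANGLE(">")  NOANGLE(" 1")  EOF
// The parser consumes the first NOANGLE as text, sees BADRANGLE where it
// expects tag, text, or EOF, and reports a syntax error at that token.
```

With this rule added, every character in the default mode matches some token (<, >, or a run of anything else), so the stray bracket reaches the parser as a token and gets rejected there instead of being dropped by the lexer.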



Source: https://stackoverflow.com/questions/39278948/antlr-grammar-avoiding-angle-brackets
