How to make antlr4 fully tokenize terminal nodes

孤城傲影 2021-01-26 06:44

I'm trying to use ANTLR to make a very simple parser that basically tokenizes a series of .-delimited identifiers.

I've made a simple grammar:

         


        
1 answer
  •  闹比i (original poster)
    2021-01-26 07:18

    Rules starting with a capital letter are Lexer rules.

    With the following input file t.text

    .
    .foobar
    .foobar.baz
    

    your grammar (in file Question.g4) produces the following tokens

    $ grun Question r -tokens -diagnostics t.text
    [@0,0:0='.',<STRUCTURE_SELECTOR>,1:0]
    [@1,2:8='.foobar',<STRUCTURE_SELECTOR>,2:0]
    [@2,10:20='.foobar.baz',<STRUCTURE_SELECTOR>,3:0]
    [@3,22:21='<EOF>',<EOF>,4:0]
    

    The lexer, like the parser, is greedy: it tries to consume as many input characters (tokens, in the parser's case) as a rule allows. The lexer rule STRUCTURE_SELECTOR: '.' (ID STRUCTURE_SELECTOR?)? can read a dot, an ID, and then, because the rule optionally refers to itself, another dot and another ID, and so on until the newline. That's why each line ends up in a single token.
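
The greedy behaviour can be imitated outside ANTLR. The following is only a rough sketch that uses java.util.regex rather than an ANTLR lexer: the class name GreedyLexerSketch and both patterns are made up for illustration, and the character set [_a-z0-9$] for ID is an assumption borrowed from the grammar shown further down. The first pattern approximates the single recursive lexer rule, the second approximates splitting the work into a dot token and an ID token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyLexerSketch {
    // Regex approximation (an assumption, not ANTLR itself) of the recursive rule
    // STRUCTURE_SELECTOR: '.' (ID STRUCTURE_SELECTOR?)? with ID taken as [_a-z0-9$]*.
    public static final Pattern ONE_BIG_RULE = Pattern.compile("(?:\\.[_a-z0-9$]*)+");

    // Approximation of two small token rules: a literal dot, or a non-empty ID.
    public static final Pattern SMALL_RULES = Pattern.compile("\\.|[_a-z0-9$]+");

    // Collects every match of the pattern, left to right. Characters that match
    // nothing (such as whitespace) are silently skipped by Matcher.find().
    public static List<String> tokenize(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = ".foobar.baz";
        // The greedy single rule swallows the whole line as one token.
        System.out.println(tokenize(ONE_BIG_RULE, line));
        // The small rules yield separate dot and identifier tokens.
        System.out.println(tokenize(SMALL_RULES, line));
    }
}
```

With the input .foobar.baz, the first call returns a single element and the second returns four, which mirrors the difference between one token per line and separate '.' and ID tokens.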

    When compiling the grammar, the warning

    warning(146): Question.g4:5:0: non-fragment lexer rule ID can match the empty string
    

    appears because the repetition marker of ID is * (zero or more times) instead of + (one or more times).
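
The difference between the two markers can be checked with an ordinary regular expression. This is a minimal sketch using java.util.regex, not ANTLR; the class name EmptyMatchSketch and the helper matchesEmpty are hypothetical, and the character set [_a-z0-9$] is assumed from the grammar shown further down.

```java
import java.util.regex.Pattern;

public class EmptyMatchSketch {
    // Returns true when the given regex accepts the empty string,
    // which is the situation ANTLR warns about for a non-fragment lexer rule.
    public static boolean matchesEmpty(String regex) {
        return Pattern.matches(regex, "");
    }

    public static void main(String[] args) {
        // '*' allows zero characters, so the empty string matches.
        System.out.println(matchesEmpty("[_a-z0-9$]*")); // prints true
        // '+' requires at least one character, so the empty string does not match.
        System.out.println(matchesEmpty("[_a-z0-9$]+")); // prints false
    }
}
```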

    Now try this grammar:

    grammar Question;
    
    r  
    @init {System.out.println("Question last update 2135");}
        :   ( structure_selector NL )+ EOF
        ;
    
    structure_selector
        :   '.'
        |   '.' ID structure_selector*
        ;
    
    ID  : [_a-z0-9$]+ ;   
    NL  : [\r\n]+ ;          
    WS  : [ \t]+ -> skip ;
    
    $ grun Question r -tokens -diagnostics t.text
    [@0,0:0='.',<'.'>,1:0]
    [@1,1:1='\n',<NL>,1:1]
    [@2,2:2='.',<'.'>,2:0]
    [@3,3:8='foobar',<ID>,2:1]
    [@4,9:9='\n',<NL>,2:7]
    [@5,10:10='.',<'.'>,3:0]
    [@6,11:16='foobar',<ID>,3:1]
    [@7,17:17='.',<'.'>,3:7]
    [@8,18:20='baz',<ID>,3:8]
    [@9,21:21='\n',<NL>,3:11]
    [@10,22:21='<EOF>',<EOF>,4:0]
    Question last update 2135
    line 3:7 reportAttemptingFullContext d=1 (structure_selector), input='.'
    line 3:7 reportContextSensitivity d=1 (structure_selector), input='.'
    

    Running $ grun Question r -gui t.text then displays the hierarchical tree structure you are expecting.
