Can an element contain attributes when parsed by an ANTLR-generated parser? If so, how?

野性不改 2021-01-26 17:42

I am following this tutorial and have successfully replicated its behavior, except that I am using ANTLR 4.7 instead of the 4.5 that the tutorial uses.

I am trying to bui

3 answers
  • 2021-01-26 18:19

    I am guessing I need to change the todo.g4 and then re generate the parser.

    Yes, and regenerate the parser after each change to the grammar. For me that is:

    $ a4 Question.g4
    $ javac Q*.java
    $ grun Question elements -tokens -diagnostics t.text
    

    where

    $ alias
    alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
    alias grun='java org.antlr.v4.gui.TestRig'
    

    The more specifically you describe the content, the more likely you are to run into ambiguity problems. For example, suppose you have two rules:

    payment   : 'pay' [payee] [amount]
    free_text : ... any character ...
    

    Consider the following content :

    * pay Federico Tomassetti 10 € for the tutorial
    

    The prefix * pay Federico Tomassetti 10 is ambiguous: both rules can match it. The line is nevertheless parsed as free text, because the trailing € for the tutorial does not satisfy payment.

    If later you change the payment rule to accept more info after the amount :

    payment   : 'pay' [payee] [amount] payment_info
    

    the above content will be matched by payment (when the input is ambiguous, ANTLR picks the alternative listed first). The good news is that ANTLR 4 is very good at disambiguating: it looks ahead as far as necessary, up to the end of the file.
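
    To make the effect of that rule change concrete, here is a rough sketch in plain Java (no ANTLR involved). The two regular expressions are my own stand-ins for the payment rule before and after adding payment_info, and the class and method names are hypothetical. It only mimics "try payment first, otherwise free_text"; it is not ANTLR's real prediction algorithm.

    ```java
    import java.util.regex.Pattern;

    // Plain-Java stand-in for the two alternatives of "content"; NOT how ANTLR works internally.
    final class OrderedAlternatives {
        // Assumed equivalent of: payment : 'pay' payee amount  (payee = one or two words)
        static final Pattern STRICT_PAYMENT =
            Pattern.compile("pay\\s+[A-Za-z]\\w*(\\s+[A-Za-z]\\w*)?\\s+[0-9][0-9,.]*");

        // Assumed equivalent of: payment : 'pay' payee amount payment_info  (any trailing text)
        static final Pattern PAYMENT_WITH_INFO =
            Pattern.compile(STRICT_PAYMENT.pattern() + "(\\s+.*)?");

        // Try the payment alternative first; fall back to free_text if it does not cover the line.
        static String classify(String line, Pattern payment) {
            return payment.matcher(line).matches() ? "payment" : "free_text";
        }

        public static void main(String[] args) {
            // U+20AC is the euro sign, written as an escape to avoid source-encoding issues.
            String line = "pay Federico Tomassetti 10 \u20AC for the tutorial";
            System.out.println(classify(line, STRICT_PAYMENT));
            System.out.println(classify(line, PAYMENT_WITH_INFO));
        }
    }
    ```

    With the strict pattern the sample line falls through to free_text; with the extended pattern it is classified as payment.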

    For ambiguous tokens and precedence rules, read the posts from the last three weeks; a lot has been said about them.

    Mixing Raven's grammar with yours, here is one possible solution:

    File Question.g4

    grammar Question;
    
    elements
    @init {System.out.println("Question last update 1432");}
        : ( element | emptyLine )* EOF
        ;
    
    element
        : '*' content NL
        ;
    
    content
        : payment   //{System.out.println("Payement found " + $payment.text);}
        | free_text {System.out.println("Free text found " + $free_text.text);}
        ;
    
    payment
        : PAY receiver amount=NUMBER
          {System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
        ;
    
    receiver
        : surname=WORD ( lastname=WORD )?
        ;  
    
    free_text
        : ( WORD | PAY | NUMBER )+
        ;
    
    emptyLine
        : NL
        ;
    
    PAY    : 'pay' ;
    WORD   : LETTER ( LETTER | DIGIT | '_' )* ;
    NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;  
    
    NL  : [\r\n]
        | '\r\n' 
        ;
    //WS  : [ \t]+ -> skip ; // $payment.text => payAcmeCorp123,789.45
    WS  : [ \t]+ -> channel(HIDDEN) ; // spaces are needed to nicely display $payment.text
    
    fragment DIGIT  : [0-9] ;
    fragment LETTER : [a-zA-Z] ;
    

    File t.text

    * play with ANTLR 4
    * write a tutorial
    * pay Acme Corp 123,789.45
    * pay Banana Inc 700
    * pay Federico Tomassetti 10 € for the tutorial
    

    Execution :

    $ grun Question elements -tokens -diagnostics t.text
    line 5:29 token recognition error at: '€'
    [@0,0:0='*',<'*'>,1:0]
    [@1,1:1=' ',<WS>,channel=1,1:1]
    [@2,2:5='play',<WORD>,1:2]
    [@3,6:6=' ',<WS>,channel=1,1:6]
    [@4,7:10='with',<WORD>,1:7]
    [@5,11:11=' ',<WS>,channel=1,1:11]
    [@6,12:16='ANTLR',<WORD>,1:12]
    [@7,17:17=' ',<WS>,channel=1,1:17]
    [@8,18:18='4',<NUMBER>,1:18]
    [@9,19:19='\n',<NL>,1:19]
    [@10,20:20='*',<'*'>,2:0]
    [@11,21:21=' ',<WS>,channel=1,2:1]
    [@12,22:26='write',<WORD>,2:2]
    [@13,27:27=' ',<WS>,channel=1,2:7]
    [@14,28:28='a',<WORD>,2:8]
    [@15,29:29=' ',<WS>,channel=1,2:9]
    [@16,30:37='tutorial',<WORD>,2:10]
    [@17,38:38='\n',<NL>,2:18]
    ...
    [@56,136:135='<EOF>',<EOF>,7:0]
    Question last update 1432
    Free text found play with ANTLR 4
    Free text found write a tutorial
    line 3:26 reportAttemptingFullContext d=2 (content), input='pay Acme Corp 123,789.45
    '
    ...
    Payement found 700 to Banana Inc
    Free text found pay Federico Tomassetti 10  for the tutorial
    

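    As you can see, the € symbol is not recognized: no lexer rule matches it, so it is reported as a token recognition error and is missing from the free text that gets printed. You may need a CONTENT rule similar to FIELDTEXT here, and then you get into trouble ...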
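
    If you want to spot such characters before running the parser, a quick check like the following may help. It is only a sketch in plain Java with hypothetical names, and it uses a simplification: it tests whether a character is mentioned by any lexer rule of Question.g4 at all, not whether the lexer could really build a token at that position.

    ```java
    // Rough pre-check for characters that no lexer rule of Question.g4 mentions.
    final class UnrecognizedChars {
        // Characters used by '*', PAY, WORD, NUMBER, NL and WS in the grammar above.
        static boolean coveredByLexer(char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || "_,.* \t\r\n".indexOf(c) >= 0;
        }

        // Returns the 0-based column of the first uncovered character, or -1 if there is none.
        static int firstUncovered(String line) {
            for (int i = 0; i < line.length(); i++) {
                if (!coveredByLexer(line.charAt(i))) {
                    return i;
                }
            }
            return -1;
        }

        public static void main(String[] args) {
            // U+20AC is the euro sign from line 5 of t.text.
            System.out.println(firstUncovered("* pay Federico Tomassetti 10 \u20AC for the tutorial"));
        }
    }
    ```

    For line 5 of t.text this points at column 29, the same position as in the "line 5:29 token recognition error" message.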

    Federico's Mega tutorial is a good start. For nitty-gritty details, see The Definitive ANTLR 4 Reference or the online doc from www.antlr.org.

  • 2021-01-26 18:21

    Situation as of October 24, 2017, 19:00 UTC+1.

    Your grammar works perfectly. I made a full test in Java.

    File Expense.g4 :

    grammar Expense;
    
    payments
    @init {System.out.println("Expense last update 1853");}
        : (payment NL)*
        ;
    
    payment
        : PAY receiver amount=NUMBER
          {System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
        ;
    
    receiver
        : surname=ID (lastname=ID)?
        ; 
    
    PAY    : 'pay' ;
    NUMBER : ([0-9]+(','[0-9]+)*)('.'[0-9]*)? ;
    ID     : [a-zA-Z0-9_]+ ;
    NL     : '\n' | '\r\n' ;  
    WS     : [\t ]+ -> channel(HIDDEN) ; // keep the spaces (without spaces ==> paydeltaco98)
    

    File ExpenseMyListener.java :

    public class ExpenseMyListener extends ExpenseBaseListener {
        ExpenseParser parser;
        public ExpenseMyListener(ExpenseParser parser) { this.parser = parser; }
    
        public void exitPayments(ExpenseParser.PaymentsContext ctx) {
            System.out.println(">>> in ExpenseMyListener for paymentsss");
            System.out.println(">>> there are " + ctx.payment().size() + " elements in the list of payments");
            for (int i = 0; i < ctx.payment().size(); i++) {
                System.out.println(ctx.payment(i).getText());
            }
        }
    
        public void exitPayment(ExpenseParser.PaymentContext ctx) {
            System.out.println(">>> in ExpenseMyListener for payment");
            System.out.println(parser.getTokenStream().getText(ctx));
        }
    }
    

    File test_expense.java :

    import org.antlr.v4.runtime.ANTLRFileStream;
    import org.antlr.v4.runtime.ANTLRInputStream;
    import org.antlr.v4.runtime.CommonTokenStream;
    import org.antlr.v4.runtime.ParserRuleContext;
    import org.antlr.v4.runtime.tree.*;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.IOException;
    
    public class test_expense {
        public static void main(String[] args) throws IOException {
            ANTLRInputStream input = new ANTLRFileStream(args[0]);
            ExpenseLexer lexer = new ExpenseLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            ExpenseParser parser = new ExpenseParser(tokens);
            ParseTree tree = parser.payments();
            System.out.println("---parsing ended");
            ParseTreeWalker walker = new ParseTreeWalker();
            ExpenseMyListener my_listener = new ExpenseMyListener(parser);
            System.out.println(">>>> about to walk");
            walker.walk(my_listener, tree);
        }
    }
    

    Input file top.text :

    pay Acme Corp 123,456
    pay Banana Inc 456789.00
    pay charlie pte 123,456.89
    pay delta co 98
    

    Execution :

    $ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
    $ alias
    alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
    alias grun='java org.antlr.v4.gui.TestRig'
    $ a4 Expense.g4 
    $ javac Ex*.java
    $ javac test_expense.java 
    $ grun Expense payments -tokens -diagnostics top.text
    [@0,0:2='pay',<'pay'>,1:0]
    [@1,3:3=' ',<WS>,channel=1,1:3]
    [@2,4:7='Acme',<ID>,1:4]
    [@3,8:8=' ',<WS>,channel=1,1:8]
    [@4,9:12='Corp',<ID>,1:9]
    ...
    [@32,90:89='<EOF>',<EOF>,5:0]
    Expense last update 1853
    Payement found 123,456 to Acme Corp
    Payement found 456789.00 to Banana Inc
    Payement found 123,456.89 to charlie pte
    Payement found 98 to delta co
    
    $ java test_expense top.text 
    Expense last update 1853
    Payement found 123,456 to Acme Corp
    Payement found 456789.00 to Banana Inc
    Payement found 123,456.89 to charlie pte
    Payement found 98 to delta co
    ---parsing ended
    >>>> about to walk
    >>> in ExpenseMyListener for payment
    pay Acme Corp 123,456
    >>> in ExpenseMyListener for payment
    pay Banana Inc 456789.00
    >>> in ExpenseMyListener for payment
    pay charlie pte 123,456.89
    >>> in ExpenseMyListener for payment
    pay delta co 98
    >>> in ExpenseMyListener for paymentsss
    >>> there are 4 elements in the list of payments
    payAcmeCorp123,456
    payBananaInc456789.00
    paycharliepte123,456.89
    paydeltaco98
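
    Note the difference between the two listener outputs: ctx.payment(i).getText() concatenates only the tokens that ended up in the parse tree, while parser.getTokenStream().getText(ctx) returns every token in the covered range, including the WS tokens on the hidden channel. The toy model below shows the idea in plain Java; the Tok class and the helper names are hypothetical and are not part of the ANTLR runtime.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Toy model of default-channel vs hidden-channel tokens; not the ANTLR runtime.
    final class HiddenChannelText {
        static final class Tok {
            final String text;
            final boolean hidden; // true for whitespace sent to channel(HIDDEN)
            Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
        }

        static boolean isSpace(char c) { return c == ' ' || c == '\t'; }

        // Split a line into runs of whitespace (hidden) and runs of anything else (visible).
        static List<Tok> lex(String line) {
            List<Tok> toks = new ArrayList<>();
            int i = 0;
            while (i < line.length()) {
                boolean space = isSpace(line.charAt(i));
                int j = i;
                while (j < line.length() && isSpace(line.charAt(j)) == space) j++;
                toks.add(new Tok(line.substring(i, j), space));
                i = j;
            }
            return toks;
        }

        // Mimics ctx.getText(): hidden tokens never became parse-tree leaves.
        static String treeText(List<Tok> toks) {
            StringBuilder sb = new StringBuilder();
            for (Tok t : toks) if (!t.hidden) sb.append(t.text);
            return sb.toString();
        }

        // Mimics getTokenStream().getText(ctx): all tokens in the range, whatever the channel.
        static String streamText(List<Tok> toks) {
            StringBuilder sb = new StringBuilder();
            for (Tok t : toks) sb.append(t.text);
            return sb.toString();
        }

        public static void main(String[] args) {
            List<Tok> toks = lex("pay delta co 98");
            System.out.println(treeText(toks));
            System.out.println(streamText(toks));
        }
    }
    ```

    This is also why WS uses channel(HIDDEN) rather than skip here: skipped tokens are discarded by the lexer, so the original spacing could not be recovered from the token stream.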
    
  • 2021-01-26 18:21

    I'm not entirely sure what exactly you want but for the provided examples this grammar should do the job:

    payments: (payment NL)* ;  
    payment: PAY receiver amount=NUMBER ;  
    receiver: surname=ID (lastname=ID)? ;  
    
    PAY: 'pay' ;
    NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;  
    ID: [a-zA-Z0-9_]+ ;
    NL: '\n' | '\r\n' ;  
    WS: [\t ]+ -> skip ;
    

    If this is what you were asking for I will add some more explanation if needed...
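
    One thing worth checking against your own inputs: the NUMBER rule above requires at least one comma group, so an amount without a comma would not be lexed as NUMBER. The snippet below compares it with a variant where the comma group is optional. It uses java.util.regex as a stand-in for the lexer rule (an approximation, with hypothetical names), and the sample amounts are the ones from top.text in the previous answer.

    ```java
    import java.util.regex.Pattern;

    // Regex approximations of two NUMBER rule variants; not generated by ANTLR.
    final class NumberRuleCheck {
        // Stand-in for: NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
        static final Pattern COMMA_REQUIRED = Pattern.compile("[0-9]+(,[0-9]+)+(\\.[0-9]+)?");

        // Stand-in for a variant with an optional comma group: (',' [0-9]+)*
        static final Pattern COMMA_OPTIONAL = Pattern.compile("[0-9]+(,[0-9]+)*(\\.[0-9]+)?");

        // True when the whole text matches the given rule.
        static boolean isNumber(Pattern rule, String text) {
            return rule.matcher(text).matches();
        }

        public static void main(String[] args) {
            for (String amount : new String[] {"123,456", "456789.00", "123,456.89", "98"}) {
                System.out.println(amount
                    + " comma-required=" + isNumber(COMMA_REQUIRED, amount)
                    + " comma-optional=" + isNumber(COMMA_OPTIONAL, amount));
            }
        }
    }
    ```

    If amounts such as 98 or 456789.00 can occur in your data, the variant with the optional comma group is probably the one you want.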
