Parsing string interpolation in ANTLR

再見小時候 2020-12-29 17:15

I'm working on a simple string manipulation DSL for internal purposes, and I would like the language to support string interpolation as it is used in Ruby.

For exam

1 Answer
  • I'm no ANTLR expert, but here's a possible grammar:

    grammar Str;
    
    parse
        :    ((Space)* statement (Space)* ';')+ (Space)* EOF
        ;
    
    statement
        :    print | assignment
        ;
    
    print
        :    'print' '(' (Identifier | stringLiteral) ')' 
        ;
    
    assignment
        :    Identifier (Space)* '=' (Space)* stringLiteral
        ;
    
    stringLiteral
        :    '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
        ;
    
    Interpolation
        :    '${' Identifier '}'
        ;
    
    Identifier
        :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
        ;
    
    EscapeSequence
        :    '\\' SpecialChar
        ;
    
    SpecialChar
        :     '"' | '\\' | '$'
        ;
    
    Space
        :    (' ' | '\t' | '\r' | '\n')
        ;
    
    NormalChar
        :    ~SpecialChar
        ;
    

    As you can see, there are a couple of (Space)* sub-rules inside the example grammar. This is because stringLiteral is a parser rule instead of a lexer rule. Therefore, when tokenizing the source file, the lexer cannot know whether a white space character is part of a string literal or just a space in the source file that could be ignored. Space tokens cannot simply be skipped, so the parser rules have to allow them explicitly.
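
    To make that point concrete, here is a rough sketch in plain Java (no ANTLR involved). LexerSketch and its regex are my own hypothetical approximation of the lexer rules above, not the generated StrLexer: keywords such as print are not told apart from identifiers, and OTHER lumps together the literal tokens, SpecialChar and NormalChar. It only shows that a space inside the quotes and a space outside them come out as the same kind of token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated lexer: a regex approximation of the
// token rules, only meant to show which kinds of tokens come out.
class LexerSketch {

    // Group names, in the order the alternatives are tried (left to right).
    // OTHER covers the literal tokens ('=', '"', ';', ...) and every other single character.
    static final String[] KINDS = {"INTERPOLATION", "IDENTIFIER", "ESCAPE", "SPACE", "OTHER"};

    static final Pattern TOKEN = Pattern.compile(
          "(?<INTERPOLATION>\\$\\{[A-Za-z_][A-Za-z_0-9]*\\})"
        + "|(?<IDENTIFIER>[A-Za-z_][A-Za-z_0-9]*)"
        + "|(?<ESCAPE>\\\\[\"\\\\$])"
        + "|(?<SPACE>[ \\t\\r\\n])"
        + "|(?<OTHER>.)",
        Pattern.DOTALL);

    // Returns the kind of every token, in source order.
    static List<String> tokenize(String source) {
        List<String> kinds = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    kinds.add(kind);
                    break;
                }
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // The source text is: x = "a b"
        System.out.println(tokenize("x = \"a b\""));
    }
}
```

    For the input x = "a b" this prints [IDENTIFIER, SPACE, OTHER, SPACE, OTHER, IDENTIFIER, SPACE, IDENTIFIER, OTHER]. The third SPACE sits inside the string literal, yet nothing distinguishes it from the first two, which is why the lexer cannot drop white space on its own.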

    I tested the example with a little Java class and all worked as expected:

    /* the same grammar, but now with a bit of Java code in it */
    grammar Str;
    
    @parser::header {
        package antlrdemo;
        import java.util.HashMap;
    }
    
    @lexer::header {
        package antlrdemo;
    }
    
    @parser::members {
        HashMap<String, String> vars = new HashMap<String, String>();
    }
    
    parse
        :    ((Space)* statement (Space)* ';')+ (Space)* EOF
        ;
    
    statement
        :    print | assignment
        ;
    
    print
        :    'print' '(' 
             (    id=Identifier    {System.out.println("> "+vars.get($id.text));} 
             |    st=stringLiteral {System.out.println("> "+$st.value);}
             ) 
             ')' 
        ;
    
    assignment
        :    id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
        ;
    
    stringLiteral returns [String value]
        :    '"'
            {StringBuilder b = new StringBuilder();} 
            (    id=Identifier           {b.append($id.text);}
            |    es=EscapeSequence       {b.append($es.text);}
            |    ch=(NormalChar | Space) {b.append($ch.text);}
            |    in=Interpolation        {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
            )* 
            '"'
            {$value = b.toString();}
        ;
    
    Interpolation
        :    '${' i=Identifier '}'
        ;
    
    Identifier
        :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
        ;
    
    EscapeSequence
        :    '\\' SpecialChar
        ;
    
    SpecialChar
        :     '"' | '\\' | '$'
        ;
    
    Space
        :    (' ' | '\t' | '\r' | '\n')
        ;
    
    NormalChar
        :    ~SpecialChar
        ;
    

    And a class with a main method to test it all:

    package antlrdemo;
    
    import org.antlr.runtime.*;
    
    public class ANTLRDemo {
        public static void main(String[] args) throws RecognitionException {
            String source = "name = \"Bob\";        \n"+
                    "msg = \"Hello ${name}\";       \n"+
                    "print(msg);                    \n"+
                    "print(\"Bye \\${for} now!\");    ";
            ANTLRStringStream in = new ANTLRStringStream(source);
            StrLexer lexer = new StrLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            StrParser parser = new StrParser(tokens);
            parser.parse();
        }
    }
    

    which produces the following output:

    > Hello Bob
    > Bye \${for} now!
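
    If it helps to see the substitution logic without the grammar around it, below is a minimal sketch in plain Java of what the stringLiteral action does with the text between the quotes. InterpolationSketch and interpolate are hypothetical names of mine and not part of the generated parser. It is also looser than the grammar: it does not check that the name inside ${...} is a valid Identifier, and it copies a lone $ through instead of rejecting it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not generated by ANTLR: mirrors the embedded action
// of the stringLiteral rule on the text between the double quotes.
class InterpolationSketch {

    static String interpolate(String body, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            boolean hasNext = i + 1 < body.length();
            if (c == '\\' && hasNext && "\"\\$".indexOf(body.charAt(i + 1)) >= 0) {
                // EscapeSequence: copied verbatim, backslash included (like $es.text).
                out.append(c).append(body.charAt(i + 1));
                i += 2;
            } else if (c == '$' && hasNext && body.charAt(i + 1) == '{') {
                int close = body.indexOf('}', i + 2);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated ${ at index " + i);
                }
                // Interpolation: look the name up; an unknown name appends "null",
                // which is also what b.append(vars.get(...)) does in the grammar.
                out.append(vars.get(body.substring(i + 2, close)));
                i = close + 1;
            } else {
                // Identifier, NormalChar or Space: copied as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("name", "Bob");
        vars.put("msg", interpolate("Hello ${name}", vars));
        System.out.println("> " + vars.get("msg"));
        System.out.println("> " + interpolate("Bye \\${for} now!", vars));
    }
}
```

    Running main prints the same two lines as the ANTLR demo. Note that the escaped \${for} keeps its backslash, because the escape sequence is appended unchanged rather than unescaped.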
    

    Again, I am no expert, but this (at least) gives you a way to solve it.

    HTH.
