Parsing latex-like language in Java

暖寄归人 2021-01-03 11:38

I'm trying to write a parser in Java for a simple language similar to LaTeX, i.e. it contains lots of unstructured text with a couple of \commands[with]{some}{parameters}

1 answer
  • 2021-01-03 11:48

    You can define a grammar that accepts the LaTeX input, using individual characters as tokens in the worst case. JavaCC should be just fine for this purpose.

    The good thing about a grammar and a parser generator is that together they can parse things that finite-state automata (FSAs) have trouble with, especially nested structures.
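
    To see why nesting is the sticking point: an FSA has a fixed number of states, so it cannot keep track of an arbitrarily deep stack of open braces. The minimal sketch below (the class and method names are hypothetical, not from any library) adds the one thing an FSA lacks, an unbounded counter, and uses it to check that braces are balanced.

```java
// Hypothetical helper for illustration only.
class BraceDepth {
    /** Returns the maximum '{' nesting depth, or -1 if the braces are unbalanced. */
    static int maxDepth(String input) {
        int depth = 0;
        int max = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;
                max = Math.max(max, depth);
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return -1; // closing brace with no matching opener
                }
            }
        }
        return depth == 0 ? max : -1; // leftover openers are also an error
    }

    public static void main(String[] args) {
        System.out.println(maxDepth("\\a{x \\b{y}}")); // two levels of nesting
        System.out.println(maxDepth("\\a{x"));         // unbalanced input
    }
}
```

    A counter is enough to validate nesting, but building a tree of commands and their arguments needs a real stack, which is what a grammar-driven parser gives you.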

    A first cut at your grammar could be (I'm not sure this is valid JavaCC, but it is reasonable EBNF):

     Latex = item* ;
     item = command | rawtext ;
     command = commandname arguments ;
     commandname = '\' letter ( letter | digit )* ;  -- might pick this up as lexeme
     letter = 'a' | 'b' | ... | 'z' ;
     digit = '0' | ...  | '9' ;
     arguments = ( '{' item* '}' )* ;  -- zero or more brace-delimited arguments
     rawtext = ( letter | digit | whitespace | punctuationminusbackslash )+ ; -- might pick this up as lexeme
     whitespace = ' ' | '\t' | '\n' | '\r' ;
     punctuationminusbackslash = '!' | ... | '^' ;
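
    If a parser generator feels heavy for a first experiment, a grammar of this shape can also be hand-coded as a small recursive-descent parser, one method per rule. The following is only a sketch under several assumptions: all class and method names are made up for illustration, a command may take zero or more `{...}` arguments, square brackets are treated as ordinary text, and `Character.isLetter` is used, which accepts more than 'a'..'z'. It is not JavaCC output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a hand-written sketch of the grammar above.
class LatexLikeParser {
    interface Node {}

    static final class Text implements Node {
        final String value;
        Text(String value) { this.value = value; }
        @Override public String toString() { return "Text(" + value + ")"; }
    }

    static final class Command implements Node {
        final String name;
        final List<List<Node>> arguments; // one item list per {...} group
        Command(String name, List<List<Node>> arguments) {
            this.name = name;
            this.arguments = arguments;
        }
        @Override public String toString() { return "Command(" + name + ", " + arguments + ")"; }
    }

    private final String src;
    private int pos = 0;

    private LatexLikeParser(String src) { this.src = src; }

    /** Latex = item* ; the whole input must be consumed. */
    static List<Node> parse(String src) {
        LatexLikeParser p = new LatexLikeParser(src);
        List<Node> items = p.items();
        if (p.pos < src.length()) {
            throw new IllegalArgumentException("Unexpected '" + src.charAt(p.pos) + "' at " + p.pos);
        }
        return items;
    }

    /** item* ; stops at end of input or at the '}' closing the enclosing argument. */
    private List<Node> items() {
        List<Node> out = new ArrayList<>();
        while (pos < src.length() && src.charAt(pos) != '}') {
            char c = src.charAt(pos);
            if (c == '\\') {
                out.add(command());
            } else if (c == '{') {
                throw new IllegalArgumentException("Unexpected '{' at " + pos);
            } else {
                out.add(rawText());
            }
        }
        return out;
    }

    /** Command name followed by its brace-delimited arguments. */
    private Node command() {
        int start = ++pos; // skip the backslash
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) {
            pos++;
        }
        if (pos == start || !Character.isLetter(src.charAt(start))) {
            throw new IllegalArgumentException("Expected command name at " + start);
        }
        String name = src.substring(start, pos);
        List<List<Node>> args = new ArrayList<>();
        while (pos < src.length() && src.charAt(pos) == '{') {
            pos++; // consume '{'
            args.add(items()); // recursion handles nested commands
            if (pos >= src.length()) {
                throw new IllegalArgumentException("Missing '}' at end of input");
            }
            pos++; // consume '}'
        }
        return new Command(name, args);
    }

    /** rawtext: everything up to the next backslash or brace. */
    private Node rawText() {
        int start = pos;
        while (pos < src.length() && "\\{}".indexOf(src.charAt(pos)) < 0) {
            pos++;
        }
        return new Text(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(parse("Hello \\emph{big \\bf{world}}!"));
    }
}
```

    The recursive call from `command()` back into `items()` is what handles nesting. Moving to JavaCC later would mostly mean turning each method into a production.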
    