Question
I have this EBNF grammar for the Jass scripting language.
What needs to be done to convert it to work with ANTLR 3.5?
Furthermore, are there any sort of tools available to aid me in doing so?
//----------------------------------------------------------------------
// Global Declarations
//----------------------------------------------------------------------
program ::= file+
file ::= newline? ( declr newline )* func*
declr ::= typedef
| globals
| native_func
typedef ::= 'type' id 'extends' ( 'handle' | id )
globals ::= 'globals' newline global_var_list 'endglobals'
global_var_list
::= ( 'constant' type id '=' expr newline | var_declr newline )*
native_func
::= 'constant'? 'native' func_declr
func_declr
::= id 'takes' ( 'nothing' | param_list ) 'returns' ( type | 'nothing' )
param_list
::= type id ( ',' type id )*
func ::= 'constant'? 'function' func_declr newline local_var_list statement_list 'endfunction' newline
//----------------------------------------------------------------------
// Local Declarations
//----------------------------------------------------------------------
local_var_list
::= ( 'local' var_declr newline )*
var_declr
::= type id ( '=' expr )?
| type 'array' id
statement_list
::= ( statement newline )*
statement
::= set
| call
| ifthenelse
| loop
| exitwhen
| return
| debug
set ::= 'set' id '=' expr
| 'set' id '[' expr ']' '=' expr
call ::= 'call' id '(' args? ')'
args ::= expr ( ',' expr )*
ifthenelse
::= 'if' expr 'then' newline statement_list else_clause? 'endif'
else_clause
::= 'else' newline statement_list
| 'elseif' expr 'then' newline statement_list else_clause?
loop ::= 'loop' newline statement_list 'endloop'
exitwhen ::= 'exitwhen' expr
return ::= 'return' expr?
debug ::= 'debug' ( set | call | ifthenelse | loop )
//----------------------------------------------------------------------
// Expressions
//----------------------------------------------------------------------
expr ::= binary_op
| unary_op
| func_call
| array_ref
| func_ref
| id
| const
| parens
binary_op
::= expr ( [+-*/><] | '==' | '!=' | '>=' | '<=' | 'and' | 'or' ) expr
unary_op ::= ( '+' | '-' | 'not' ) expr
func_call
::= id '(' args? ')'
array_ref
::= id '[' expr ']'
func_ref ::= 'function' id
const ::= int_const
| real_const
| bool_const
| string_const
| 'null'
int_const
::= decimal
| octal
| hex
| fourcc
decimal ::= [1-9] [0-9]*
octal ::= '0' [0-7]*
hex ::= '$' [0-9a-fA-F]+
| '0' [xX] [0-9a-fA-F]+
fourcc ::= '' ' .{4} ' ''
real_const
::= [0-9]+ '.' [0-9]*
| '.' [0-9]+
bool_const
::= 'true'
| 'false'
string_const
::= '"' .* '"'
parens ::= '(' expr ')'
//----------------------------------------------------------------------
// Base RegEx
//----------------------------------------------------------------------
type ::= id
| 'code'
| 'handle'
| 'integer'
| 'real'
| 'boolean'
| 'string'
id ::= [a-zA-Z] ( [a-zA-Z0-9_]* [a-zA-Z0-9] )?
newline ::= '\n'+
Thanks in advance for any advice offered!
Answer 1:
Disclaimer: I don't actually use ANTLR, so someone that does might come by with more detailed information.
ANTLR generates recursive descent parsers, so your grammar will have to be refactored to eliminate left recursion, which you have, for example, in expr:
expr ::= binary_op
...
binary_op
::= expr ( [+-*/><] | '==' | '!=' | '>=' | '<=' | 'and' | 'or' ) expr
While parsing expr, the parser would try binary_op as an option, encounter another expr, then try to parse that recursively without having consumed any input, and you would now have infinite recursion.
This is usually dealt with by reformulating the grammar along the lines of
expr ::= binary_op
...
binary_op
::= term ( [+-] term )*
term ::= factor ( [*/] factor )*
factor ::= id
| const
| parens
...
and so on.
Not a trivial process, but not impossible to do either.
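To make the refactoring concrete, here is a minimal sketch of a hand-written recursive descent parser for the layered rules above, written in Python rather than generated by ANTLR. It only covers `+ - * /`, numbers, identifiers and parentheses, and the names `tokenize`, `Parser` and `parse` are made up for illustration. Each method consumes at least one token before it recurses, which is why the infinite recursion goes away.

```python
import re

# Numbers, identifiers, the four arithmetic operators and parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")

def tokenize(text):
    """Split the input into a flat list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr ::= term ( [+-] term )*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # the loop keeps it left-associative
        return node

    def term(self):
        # term ::= factor ( [*/] factor )*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor ::= id | const | '(' expr ')'
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in {"+", "-", "*", "/", ")"}:
            raise SyntaxError(f"unexpected token {token!r}")
        return token

def parse(text):
    """Parse an arithmetic expression into nested (op, left, right) tuples."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

The comparison and boolean operators from the original grammar would need further layers of the same shape, one per precedence level. The precedence order to use is not given in the grammar, so that part is left out here.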
Answer 2:
You asked for any advice, yet your question is strangely specific to ANTLR 3.5. Do you have a requirement to use ANTLR 3.5? It would help to know what you will be using the grammar for: simple syntax validation or a full-on interpreter?
If you can consider using ANTLR 4, you should. It handles left-recursive rules better than ANTLR 3 and, since it appears that you are just learning ANTLR, ANTLR 4 IMO will be easier to pick up. If you really need an AST, then go with ANTLR 3.
Unfortunately, an automated conversion tool would, at best, give you a bad starting point for developing your grammar.
As to where/how to start, the best advice would be to get a copy of the Java grammar (java.g for ANTLR 3.5 or java.g4 for ANTLR 4) to use as a working example. Jass appears to be sufficiently similar that the Java grammar should give you a clear idea of how to proceed.
Answer 3:
Grammar description languages are really small. The grammars for them have only a dozen or so rules.
What you could do (something I have done) is use ANTLR to write a grammar for the EBNF notation, and use that to translate what you have into an ANTLR grammar.
It should be about a day of work, or two at most.
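As a rough illustration of how mechanical most of that translation is, here is a small sketch that does it with regular expressions in Python instead of an ANTLR grammar for EBNF, so it is a swapped-in shortcut and not the approach described above. It assumes the notation used in the question (`name ::= body`, `//` comments, `[...]` character classes), and the function names are hypothetical. It rewrites `::=` rules into `name : body ;` form and turns character classes into ANTLR-style ranges such as `'0'..'9'`.

```python
import re

# Either a quoted literal (left untouched) or a [...] character class.
LITERAL_OR_CLASS = re.compile(r"'[^']*'|\[([^\]]+)\]")

def split_rules(ebnf):
    """Return (name, body) pairs; the identifier before each '::=' is the name."""
    lines = [ln.strip() for ln in ebnf.splitlines()
             if not ln.strip().startswith("//")]
    text = " ".join(lines)
    # re.split with a capture group yields [preamble, name1, body1, name2, ...].
    parts = re.split(r"([A-Za-z_]\w*)\s*::=", text)
    return [(parts[i], parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]

def convert_char_class(chars):
    """Turn the inside of [...] into ('a'..'z' | '_') style alternatives."""
    items, i = [], 0
    while i < len(chars):
        # Assumption: 'x-y' is a range only when both ends are alphanumeric,
        # so that [+-*/><] is read as six separate characters.
        is_range = (i + 2 < len(chars) and chars[i + 1] == "-"
                    and chars[i].isalnum() and chars[i + 2].isalnum())
        if is_range:
            items.append(f"'{chars[i]}'..'{chars[i + 2]}'")
            i += 3
        else:
            items.append(f"'{chars[i]}'")
            i += 1
    return "(" + " | ".join(items) + ")"

def convert_body(body):
    """Rewrite character classes, leaving quoted literals such as '[' alone."""
    def replace(match):
        if match.group(1) is None:  # a quoted literal
            return match.group(0)
        return convert_char_class(match.group(1))
    return LITERAL_OR_CLASS.sub(replace, body)

def ebnf_to_antlr(ebnf):
    """Emit one 'name : body ;' block per EBNF rule."""
    return "\n\n".join(f"{name}\n    : {convert_body(body)}\n    ;"
                       for name, body in split_rules(ebnf))

if __name__ == "__main__":
    print(ebnf_to_antlr("decimal ::= [1-9] [0-9]*\ncall ::= 'call' id '(' args? ')'"))
```

The output is only a starting point. Left recursion still has to be removed by hand, token-level rules such as decimal have to be renamed to start with an uppercase letter so ANTLR treats them as lexer rules, and regex-style pieces like `.{4}` are passed through unchanged and need rewriting.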
Source: https://stackoverflow.com/questions/15148369/ebnf-grammar-to-antlr3