I have a grammar, consisting of comment and control statements of a particular language, which looks like this:
Grammar:
grammar DD;
1) The white space rule generates one token per white space character:
$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.6-complete.jar
$ alias grun
alias grun='java org.antlr.v4.gui.TestRig'
$ grun DD ddlist -tokens jcl.txt
[@6,11:12='DD',<'DD'>,1:11]
[@7,13:13=' ',<WS>,channel=1,1:13]
[@8,14:14=' ',<WS>,channel=1,1:14]
[@9,15:15='*',<'*'>,1:15]
[@10,16:16=' ',<WS>,channel=1,1:16]
[@11,17:17=' ',<WS>,channel=1,1:17]
[@12,18:18=' ',<WS>,channel=1,1:18]
[@13,19:19=' ',<WS>,channel=1,1:19]
[@14,20:20=' ',<WS>,channel=1,1:20]
[@15,21:21=' ',<WS>,channel=1,1:21]
[@16,22:22=' ',<WS>,channel=1,1:22]
You can consume all consecutive white space characters in a single token with the + modifier (one or more):
WS : [ \t\r\n]+ -> channel(HIDDEN);
[@3,11:12='DD',<'DD'>,1:11]
[@4,13:14=' ',<WS>,channel=1,1:13]
[@5,15:15='*',<'*'>,1:15]
[@6,16:54=' \n',<WS>,channel=1,1:16]
[@7,55:58='SORT',,2:0]
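To see what the + changes, here is a plain-Java sketch (a regex model, not ANTLR itself) that counts how many white space tokens each variant of the rule would produce on one card. The sample line is an assumption reconstructed from the token positions of the //SYSIN DD * card (DD at column 11, * at column 15, newline at column 21), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy regex model (not ANTLR): how many WS tokens would a rule produce on this line? */
public class WsTokenCount {
    // Each non-overlapping match of the pattern corresponds to one WS token,
    // because non-blank characters separate the matches.
    static int countWsTokens(String input, String wsRegex) {
        Matcher m = Pattern.compile(wsRegex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 4 blanks before DD, 2 before *, 5 trailing blanks, then the newline (assumed layout).
        String card = "//SYSIN    DD  *     \n";
        System.out.println(countWsTokens(card, "[ \\t\\r\\n]"));  // one token per character
        System.out.println(countWsTokens(card, "[ \\t\\r\\n]+")); // one token per run of blanks
    }
}
```

The single-character class yields 12 matches on this card, while the + version yields 3, one per run of blanks (the last run swallows the trailing blanks and the newline, just like token @6 above).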
.
2) You don't mention an important warning that ANTLR reports when compiling the grammar:
warning(125): DD.g4:12:12: implicit definition of token INLINEDATA in parser
Using a token in the parser that no lexer rule defines makes ANTLR create an implicit token type INLINEDATA, but no lexer rule ever produces a token of that type. Thus the parser rule
dd4: JCLBEGIN ddname DDWORD '*' inlinerec INLINESTMTEND?;
means: I expect the input stream to be
//{a name} DD * followed by an INLINEDATA token
but the input is :
//SYSIN DD * SORT
hence the message
line 2:0 mismatched input 'SORT' expecting INLINEDATA
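The mechanism can be reproduced with a toy model in plain Java. This is not how ANTLR is implemented, and the token names JCLBEGIN, DDWORD, STAR and NAME are hypothetical stand-ins for the question's lexer; it also assumes that inlinerec starts with INLINEDATA. The point is only that a lexer can emit nothing but the types it has rules for, so a parser rule that expects INLINEDATA can never be satisfied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model (not ANTLR): there is deliberately no lexer rule for INLINEDATA. */
public class ImplicitTokenDemo {
    // Tried in insertion order; the first pattern matching at the current offset wins.
    static final Map<String, Pattern> LEXER_RULES = new LinkedHashMap<>();
    static {
        LEXER_RULES.put("JCLBEGIN", Pattern.compile("//"));
        LEXER_RULES.put("DDWORD", Pattern.compile("DD\\b"));
        LEXER_RULES.put("STAR", Pattern.compile("\\*"));
        LEXER_RULES.put("NAME", Pattern.compile("[A-Z#@$][A-Z0-9#@$]*"));
        LEXER_RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
    }

    /** Splits the input into {type, text} pairs, dropping WS. */
    static List<String[]> lex(String input) {
        List<String[]> tokens = new ArrayList<>();
        int pos = 0;
        scan:
        while (pos < input.length()) {
            for (Map.Entry<String, Pattern> rule : LEXER_RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt()) {
                    if (!rule.getKey().equals("WS")) {
                        tokens.add(new String[] {rule.getKey(), m.group()});
                    }
                    pos = m.end();
                    continue scan;
                }
            }
            throw new IllegalStateException("no lexer rule matches at offset " + pos);
        }
        return tokens;
    }

    /** Compares the token types with the sequence a parser rule expects. */
    static String parse(List<String[]> tokens, List<String> expected) {
        for (int i = 0; i < expected.size(); i++) {
            String want = expected.get(i);
            if (i >= tokens.size()) {
                return "missing " + want + " at end of input";
            }
            if (!tokens.get(i)[0].equals(want)) {
                return "mismatched input '" + tokens.get(i)[1] + "' expecting " + want;
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Simplified view of dd4, assuming inlinerec begins with INLINEDATA.
        List<String> dd4 = List.of("JCLBEGIN", "NAME", "DDWORD", "STAR", "INLINEDATA");
        System.out.println(parse(lex("//SYSIN DD * SORT"), dd4));
    }
}
```

Running main prints mismatched input 'SORT' expecting INLINEDATA: SORT is lexed as a NAME, and no input whatsoever could make this toy lexer produce an INLINEDATA token.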
.
3) Here is my grammar for this kind of job control statement:
grammar JCL;
/* Parsing JCL, ignoring inline sysin. */
jcl
: jcl_card+ // good old punched cards :-)
;
jcl_card
: dd_statement
| COMMENT
;
dd_statement
: '//' NAME 'DD' file_type ( NL | EOF )
;
file_type
: 'DUMMY'
| 'DYNAM'
| NAME '=' ( '*' | NAME )
| '*' NL inline_sysin
;
inline_sysin
: NON_JCL_CARD* END_OF_FILE
;
NAME : [A-Z#] ( LETTER | DIGIT | SPECIAL_CHARS )* ;
COMMENT : '//*' .*? ( NL | EOF ) ;
END_OF_FILE : '/' {getCharPositionInLine() == 1}? '*' ;
NON_JCL_CARD : ~'/' {getCharPositionInLine() == 1}? .*? ( NL | EOF ) ;
STRING : '\'' .*? '\'' | '"' .*? '"' ;
NL : [\r\n] ;
WS : [ \t]+ -> skip ; // or -> channel(HIDDEN) to keep white space tokens
fragment DIGIT : [0-9] ;
fragment LETTER : [A-Z] ;
fragment SPECIAL_CHARS : '#' | '@' | '$' ;
With the input
//SYSIN DD *
SORT FIELDS=COPY
INCLUDE COND
any other program input @ $ ! & %
/*
//SYSPRINT DD SYSOUT=*
//* Comment line #1
//* Comment line #2
//SYSOUT DD SYSOUT=*
//SYSOUT DD DUMMY
//SYSIN DD *
 /* not end of input
/*
it gives
$ grun JCL jcl -tokens jcl.txt
[@0,0:1='//',<'//'>,1:0]
[@1,2:6='SYSIN',<NAME>,1:2]
[@2,11:12='DD',<'DD'>,1:11]
[@3,15:15='*',<'*'>,1:15]
[@4,21:21='\n',<NL>,1:21]
[@5,22:38='SORT FIELDS=COPY\n',<NON_JCL_CARD>,2:0]
[@6,39:51='INCLUDE COND\n',<NON_JCL_CARD>,3:0]
[@7,52:85='any other program input @ $ ! & %\n',<NON_JCL_CARD>,4:0]
[@8,86:87='/*',<END_OF_FILE>,5:0]
[@9,106:106='\n',<NL>,5:20]
...
[@17,131:161='//* Comment line #1 \n',<COMMENT>,7:0]
...
[@31,232:233='//',<'//'>,11:0]
[@32,234:238='SYSIN',<NAME>,11:2]
[@33,243:244='DD',<'DD'>,11:11]
[@34,247:247='*',<'*'>,11:15]
[@35,253:253='\n',<NL>,11:21]
[@36,254:278=' /* not end of input \n',<NON_JCL_CARD>,12:0]
[@37,279:280='/*',<END_OF_FILE>,13:0]
[@38,281:280='<EOF>',<EOF>,13:2]
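The column tests done by the semantic predicates (getCharPositionInLine() == 1 right after the first character has been consumed, meaning that character sat in column 0) can be imitated card by card in plain Java. This is a simplified sketch, not the lexer ANTLR generates: it classifies whole lines, and the enum values (including the hypothetical UNMATCHED) merely echo the grammar's token names. It shows why @36 is a NON_JCL_CARD (its first character is a space) while @37 is END_OF_FILE.

```java
import java.util.List;

/** Simplified, line-based imitation of the JCL grammar's column-0 decisions (not generated by ANTLR). */
public class JclCardClassifier {
    enum Kind { COMMENT, STATEMENT, END_OF_FILE, NON_JCL_CARD, UNMATCHED }

    static Kind classify(String card) {
        if (card.startsWith("//*")) return Kind.COMMENT;     // COMMENT: '//*' is longer than '//'
        if (card.startsWith("//")) return Kind.STATEMENT;    // dd_statement: '//' NAME 'DD' ...
        if (card.startsWith("/*")) return Kind.END_OF_FILE;  // END_OF_FILE: '/' in column 0, then '*'
        if (!card.startsWith("/")) return Kind.NON_JCL_CARD; // NON_JCL_CARD: anything but '/' in column 0
        return Kind.UNMATCHED;                               // '/' followed by anything else: no rule fits
    }

    public static void main(String[] args) {
        List<String> deck = List.of(
                "//SYSIN    DD  *",
                "SORT FIELDS=COPY",
                "/*",
                "//* Comment line #1",
                "//SYSOUT DD DUMMY",
                " /* not end of input",
                "/*");
        for (String card : deck) {
            System.out.println(classify(card) + "\t" + card);
        }
    }
}
```

Because the decision depends only on column 0, an indented /* stays inline data and only a /* starting in column 0 closes it.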
.
Try the -gui option to display the parse tree:
$ grun JCL jcl -gui jcl.txt