I'm using PLY. Here is one of my states from parser.out:
state 3
(5) course_data -> course .
(6) course_data -> course . course_
Your basic problem is that you need two tokens of lookahead to do what you want: when the input seen so far is a course and the lookahead is an OR_CONJ, you can't tell whether to reduce the course to a course_data or to shift without also seeing the token after the OR_CONJ. There are a number of ways you can deal with this:
- Use an LR(2), LR(k), or GLR parser generator; any of these can deal with this.
- Use a lexer hack to do the lookahead: basically, have the lexer return two different OR_CONJ tokens depending on whether the following token is a COURSE_NUMBER or not.
- Factor the grammar to get rid of the conflict. This may result in a grammar that parses something slightly different from what you want (so you need some extra post-parse checks to reject invalid constructs), and it will generally make the grammar much harder to understand.
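The lexer hack can be sketched as a small filter that sits between the lexer and the parser. This is only an illustration in plain Python: the Token type, the split_or_conj function, and the OR_CONJ_NUM token name are all made up for the example, and in PLY you would still have to wrap the filter in an object with a token() method and add the new token name to your tokens list and grammar.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for a lexer token (type name plus matched text)."""
    type: str
    value: str


def split_or_conj(tokens: Iterable[Token]) -> Iterator[Token]:
    """Retype OR_CONJ as OR_CONJ_NUM when the next token is a COURSE_NUMBER.

    OR_CONJ_NUM is an assumed token name; every other token passes through
    unchanged.
    """
    it = iter(tokens)
    current = next(it, None)
    while current is not None:
        following = next(it, None)  # peek one token past the current one
        if (current.type == "OR_CONJ"
                and following is not None
                and following.type == "COURSE_NUMBER"):
            current = Token("OR_CONJ_NUM", current.value)
        yield current
        current = following


stream = [
    Token("DEPT_CODE", "CS"), Token("COURSE_NUMBER", "101"),
    Token("OR_CONJ", "or"), Token("COURSE_NUMBER", "102"),
    Token("OR_CONJ", "or"), Token("DEPT_CODE", "MATH"),
    Token("COURSE_NUMBER", "201"),
]
retyped = [tok.type for tok in split_or_conj(stream)]
```

With the two kinds of "or" distinguished before the parser sees them, the grammar can use OR_CONJ_NUM inside a course and plain OR_CONJ between courses, so one token of lookahead is enough.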
Note that your grammar as given is also ambiguous about how three or more courses connected in a single statement associate. This is easily fixed by rewriting the grammar into a clearer left-recursive form:
Rule 1 statement -> course
Rule 2 statement -> statement OR_CONJ course
Rule 3 course -> DEPT_CODE course_list
Rule 4 course -> DEPT_CODE course_list OR_CONJ COURSE_NUMBER
Rule 5 course_list -> COURSE_NUMBER
Rule 6 course_list -> course_list , COURSE_NUMBER
This could also be rewritten as right-recursive for an LL parser generator, but either way the grammar still has the 2-token lookahead problem. One way of refactoring it to make that go away would be to make COURSE_NUMBER by itself a valid course and recombine it with the previous course in a post-pass (or give an error if it's the first course in a statement). Then rule 4 becomes:
Rule 4 course -> COURSE_NUMBER
and you have no conflicts.
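The post-pass might look something like the following. This is a rough sketch, not tied to PLY: the Course class, its or_numbers field, and the recombine function are invented for the example, and it assumes your parser actions build one Course per course, with dept set to None when the course came from the new rule 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Course:
    """Hypothetical parse result for one course."""
    dept: Optional[str]        # None for a bare COURSE_NUMBER (new rule 4)
    numbers: List[str]         # numbers from course_list
    or_numbers: List[str] = field(default_factory=list)  # "or NUMBER" alternatives


def recombine(statement: List[Course]) -> List[Course]:
    """Fold each bare course number into the course that precedes it."""
    merged: List[Course] = []
    for course in statement:
        if course.dept is not None:
            # Copy so the caller's parse results are left untouched.
            merged.append(Course(course.dept, list(course.numbers),
                                 list(course.or_numbers)))
        elif not merged:
            # A bare number with nothing before it has no department to join.
            raise ValueError("statement starts with a course number "
                             "that has no department")
        else:
            merged[-1].or_numbers.extend(course.numbers)
    return merged


parsed = [Course("CS", ["101", "102"]), Course(None, ["103"]),
          Course("MATH", ["201"])]
combined = recombine(parsed)
```

Note that this accepts several bare numbers in a row after one course, which the original rule 4 did not allow. If you need to match the original language exactly, reject a second bare number in the same place you raise the first-course error.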