Question
I'm designing a music programming language and implementing its syntax as a PEG grammar. The parsing process has ended up being fairly complicated, so what seemed like the simplest approach was to define several, separate grammars, and apply them in sequence. So far I have three grammars:
1. Take the entire contents of the source file and strip out the comments.
2. Take the source file (comments removed) and separate it by instrument. This results in pairs of instrument name/definition and the "music code" to be "played" by said instrument.
3. Actually parse the music code and return a parse tree of music "events."
Of the three parsers, #3 is by far the most complicated. #1 and #2 are simple by comparison, and take up only about 10 lines each. #3, on the other hand, grows more and more complex the more syntax I implement, and is currently at 33 lines and counting.
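For readers who want something concrete, the three passes could be sketched roughly as below. This is only an illustration, not the asker's actual grammars: it uses plain regular expressions instead of a PEG tool, and the syntax ('#' comments, "name:" instrument headers, notes such as "c4") and all function names are assumptions invented for the example.

```python
import re


def strip_comments(source):
    """Pass 1: drop everything from '#' to the end of each line (assumed comment syntax)."""
    return re.sub(r"#[^\n]*", "", source)


def split_by_instrument(source):
    """Pass 2: return (instrument name, music code) pairs, assuming 'name:' headers."""
    parts = re.split(r"(\w+):", source)
    # parts[0] is whatever precedes the first header; names and code alternate after it.
    return [(name, code.strip()) for name, code in zip(parts[1::2], parts[2::2])]


def parse_music(code):
    """Pass 3: turn music code into a list of note events (assumed note syntax)."""
    events = []
    for token in code.split():
        match = re.fullmatch(r"([a-g])(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        pitch, length = match.groups()
        events.append({"pitch": pitch, "length": int(length) if length else None})
    return events


def parse_score(source):
    """Run the three passes in sequence; each pass only sees the previous pass's output."""
    cleaned = strip_comments(source)
    return {name: parse_music(code) for name, code in split_by_instrument(cleaned)}
```

Because each pass consumes only the output of the one before it, the third (most complex) stage never has to know that comments or instrument headers exist.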
The thought occurred to me that maybe I could condense the 3 grammars into one? This might eliminate a little bit of repetition in the grammars, and may even cut down on the number of lines of code in the program itself, but I'm not sure if it would overcomplicate things too much. I made a cursory stab at combining them, but quickly found it difficult, as I seemingly would have to address the possibility of comments occurring within every single rule (correct me if I'm wrong!). As it is, I already have a rule for optional whitespace which I have included in the definitions for most musical "events" in order to allow for some flexibility with whitespace in the syntax. I can't decide if it makes more sense to stick to parsing in multiple passes and have several, separate parsers, one for each task, or if it would be worthwhile to try and combine them into one super-grammar.
My question is this: for those of you with experience building PEG grammars, do you often find yourself breaking what would be a large grammar into smaller sub-grammars and making multiple passes over your input? Are there any advantages (performance or otherwise) of keeping everything in one grammar?
Answer 1:
Your approach is sound. Some parser generation tools have provisions for easily ignoring comments and whitespace. If that's not the case with the tool you're using, doing a comment-removal pass is reasonable, as it considerably simplifies the grammars for the other passes.
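To illustrate what "ignoring comments and whitespace" can look like when the tool has no built-in support, here is a minimal hand-written sketch in Python. It is not tied to any particular PEG library, and the '#' comment syntax, the note syntax, and the names `skip` and `parse_notes` are assumptions made up for the example. The idea is that a single "skip" rule matches both whitespace and comments, so the other rules only need to call it wherever they already allow optional whitespace.

```python
import re

# Assumed lexical rules, for illustration only.
SKIP = re.compile(r"(?:\s|#[^\n]*)*")  # any run of whitespace and '#' comments
NOTE = re.compile(r"[a-g]\d*")         # a note such as "c4" or "d"


def skip(text, pos):
    """The one 'ignorable input' rule; it always succeeds, possibly matching nothing."""
    return SKIP.match(text, pos).end()


def parse_notes(text):
    """Roughly: notes <- skip (note skip)* end-of-input"""
    pos = skip(text, 0)
    notes = []
    while pos < len(text):
        match = NOTE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        notes.append(match.group(0))
        pos = skip(text, match.end())
    return notes
```

With this arrangement, comments are handled in one place rather than in every rule, which is the same simplification a separate comment-removal pass buys you.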
The only reason I can think of for trying to unify the grammars is that you have a performance requirement, which doesn't seem to be the case.
"Practicality beats purity" (from Python's import this, the Zen of Python).
Source: https://stackoverflow.com/questions/24739098/is-parsing-in-multiple-passes-common-for-peg-grammars