问题
Im trying to match this syntax:
pgm ::= exprs
exprs ::= expr [; exprs]
expr ::= ID | expr . [0-9]+
My scala packrat parser combinator looks like this:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._
object Dotter extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".",";")
def pgm = repsep(expr,";")
def expr :Parser[Any]= ident | expr~"."~num
def num = numericLit
def parse(input: String) =
phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
case Success(result, _) => println("Success!"); Some(result)
case n @ _ => println(n);println("bla"); None
}
def main(args: Array[String]) {
val prg = "x.1.2.3;" +
"y.4.1.1;" +
"z;" +
"n.1.10.30"
parse(prg);
}
}
But this doesnt work. Either it "matches greedy" and tells me:
[1.2] failure: end of input expected
x.1.2.3;y.4.1.1;z;n.1.10.30
or if I change the |
to a |||
I get a stackoverflow:
Exception in thread "main" java.lang.StackOverflowError
at java.lang.Character.isLetter(Unknown Source)
at java.lang.Character.isLetter(Unknown Source)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32)
...
I kindoff understand why I get the errors; what can I do to parse a syntax like the above? It doesnt seem that esoteric to me
EDIT: Based on the paper referenced in http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html I found out that my program didnt actually use the new packrat parser.
Ie. change Parser[Any]
to PackratParser[Any]
and use lazy val
instead of def
I rewrote the above to this:
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical._
object Dotter extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".",";")
lazy val pgm : PackratParser[Any] = repsep(expr,";")
lazy val expr :PackratParser[Any]= expr~"."~num | ident
lazy val num = numericLit
def parse(input: String) =
phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match {
case Success(result, _) => println("Success!"); Some(result)
case n @ _ => println(n);println("bla"); None
}
def main(args: Array[String]) {
val prg = "x.1.2.3 ;" +
"y.4.1.1;" +
"z;" +
"n.1.10.30"
parse(prg);
}
}
回答1:
The problem is (at least partially) that you're not actually using Packrat parsers. See the documentation for Scala's PackratParsers trait, which says
Using PackratParsers is very similar to using Parsers:
- any class/trait that extends Parsers (directly or through a subclass) can mix in PackratParsers. Example: object MyGrammar extends StandardTokenParsers with PackratParsers
- each grammar production previously declared as a def without formal parameters becomes a lazy val, and its type is changed from Parser[Elem] to PackratParser[Elem]. So, for example, def production: Parser[Int] = {...} becomes lazy val production: PackratParser[Int] = {...}
- Important: using PackratParsers is not an all or nothing decision. They can be free mixed with regular Parsers in a single grammar.
I don't know enough about Scala 2.8's parser combinators to fix this entirely, but with the following modifications, I was able to get it to parse as far as the semicolon, which is an improvement over what you've accomplished.
object Dotter extends StandardTokenParsers with PackratParsers {
lexical.delimiters ++= List(".",";")
lazy val pgm:PackratParser[Any] = repsep(expr,";")
lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit)
def parse(input: String) = phrase(expr)(lex(input)) match {
case Success(result, _) => println("Success!"); Some(result)
case n @ _ => println(n);println("bla"); None
}
def lex(input:String) = new PackratReader(new lexical.Scanner(input))
}
回答2:
The production
expr ::= ID | expr . [0-9]+
is left recursive. It expands to
expr ::= ID
expr ::= expr . [0-9]+
where the left recursion occurs on the 2nd line. This is what causes the parser to overflow the stack.
You should rewrite your grammar avoiding left recursive productions.
expr ::= ID {. [0-9]+}
来源:https://stackoverflow.com/questions/3343697/scala-parser-combinators-tricks-for-recursive-bnf