问题
I have been trying to build a SQL parser with the scala-parser-combinator library, which I've simplified greatly into the code below.
class Expression
case class FalseExpr() extends Expression
case class TrueExpr() extends Expression
case class AndExpression(expr1: Expression, expr2: Expression) extends Expression
object SimpleSqlParser {
def parse(sql: String): Try[Expression] = new SimpleSqlParser().parse(sql)
}
class SimpleSqlParser extends RegexParsers {
def parse(sql: String): Try[_ <: Expression] = parseAll(expression, sql) match {
case Success(matched,_) => scala.util.Success(matched)
case Failure(msg,remaining) => scala.util.Failure(new Exception("Parser failed: "+msg + "remaining: "+ remaining.source.toString.drop(remaining.offset)))
case Error(msg,_) => scala.util.Failure(new Exception(msg))
}
private def expression: Parser[_ <: Expression] =
andExpr | falseExpr | trueExpr
private def falseExpr: Parser[FalseExpr] =
"false" ^^ (_ => FalseExpr())
private def trueExpr: Parser[TrueExpr] = "true" ^^ (_ => TrueExpr())
private def andExpr: Parser[Expression] =
expression ~ "and" ~ expression ^^ { case e1 ~ and ~ e2 => AndExpression(e1,e2)}
}
Without the 'and' parsing, it works fine. But I want to be able to parse things like 'true AND (false OR true)', for example. When I add the 'and' part to the definition of an expression, I get a StackOverflowError, the stack is alternating between the definitions of 'and' and 'expression'.
I understand why this is happening - the definition of expression begins with and, and vice-versa. But this seems like the most natural way to model this problem. In reality an expression could also be LIKE, EQUALS etc. Is there another way to model this kind of thing in general in order to get around the problem of recursive definitions.
回答1:
scala.util.parsing.combinator.RegexParsers
cannot handle left-recursive grammars. Your grammar can be summarized by the following production rules:
expression -> andExpr | falseExpr | trueExpr
...
andExpr -> expression "and" expression
expression
is indirectly left-recursive via andExpr
.
To avoid the infinite recursion, you need to reformulate the grammar so that it is not left-recursive anymore. One frequently-used way is to use repetition combinators, such as chainl1:
private def expression: Parser[_ <: Expression] =
chainl1(falseExpr | trueExpr, "and" ^^^ { AndExpression(_, _) })
Live on Scastie
The new expression
matches one or more falseExpr
/trueExpr
, separated by "and"
, and combines the matched elements with AndExpression
in a left-associative way. Conceptually, it corresponds to the following production rule:
expression -> (falseExpr | trueExpr) ("and" (falseExpr | trueExpr))*
If your grammar contains many tangled left-recursive production rules, you might want to consider other parser combinator libraries, such as GLL combinators, that directly support left recursion.
来源:https://stackoverflow.com/questions/48207601/recursive-definitions-with-scala-parser-combinators