Question
I have the following code for a simple parser of logical expressions:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.combinator.PackratParsers
object Parsers extends RegexParsers with PackratParsers
// Entities definition
sealed trait LogicalUnit
case class Variable(name: String) extends LogicalUnit
case class Not(arg: LogicalUnit) extends LogicalUnit
case class And(arg1: LogicalUnit, arg2: LogicalUnit) extends LogicalUnit
import Parsers._
// In order of descending priority
lazy val pattern: PackratParser[LogicalUnit] =
((variable) | (not) | (and))
lazy val variable: PackratParser[Variable] =
"[a-zA-Z]".r ^^ { n => Variable(n) }
lazy val not: PackratParser[Not] =
("!" ~> pattern) ^^ { x => Not(x) }
lazy val and: PackratParser[And] =
((pattern <~ "&") ~ pattern) ^^ { case a ~ b => And(a, b) }
// Execution
println(Parsers.parseAll(pattern, "!a & !b"))
I am trying to parse the string !a & !b, and it fails with
[1.4] failure: string matching regex `\z' expected but `&' found
!a & !b
   ^
It seems that the root parser tries to parse the whole string as pattern -> not -> variable and doesn't backtrack when it discovers that !a is not the end of the input, so pattern -> and isn't even tried. I thought that using PackratParsers would solve that, but it didn't.
What am I doing wrong?
Answer 1:
I don't think there is any way to make one of these parsers backtrack once it has successfully accepted something. If an alternative succeeds, no other alternatives are tried. This behaviour is intrinsic to the packrat parsing method for Parsing Expression Grammars that these combinators implement (as opposed to Context-Free Grammars, where the order of alternatives is not relevant and backtracking behaviour depends on the parsing method). That is why the alternatives that may match longer input should be given first.
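A minimal sketch of this ordered-choice behaviour, using plain RegexParsers. The object OrderedChoiceDemo and its members shortFirst, longFirst and accepts are hypothetical names introduced only for illustration; they are not part of the question's code.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object, not part of the question's code.
object OrderedChoiceDemo extends RegexParsers {
  // "a" succeeds first, so the alternative "ab" is never tried.
  def shortFirst: Parser[String] = "a" | "ab"

  // The alternative that can match longer input gets the first chance.
  def longFirst: Parser[String] = "ab" | "a"

  // True only if the parser consumes the whole input.
  def accepts(p: Parser[String], input: String): Boolean =
    parseAll(p, input).successful
}
```

With these definitions, OrderedChoiceDemo.accepts(OrderedChoiceDemo.shortFirst, "ab") is false, because "a" is accepted and the leftover "b" is only noticed afterwards, while the same call with longFirst is true.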
Regarding the precedence of not versus and, the standard approach is to encode the precedence and associativity of operators in the grammar rules as you would for Context-Free Grammars. Most books on parsing will describe how to do this. You can see one version in the following notes starting at slide 24: http://www.sci.usq.edu.au/courses/CSC3403/lect/syntax-1up.pdf.
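One way to encode the precedence in the grammar is to give each precedence level its own rule, as sketched below. This is only an illustration under assumptions, not the version from the linked notes: the names LayeredParsers, conjunction, negation, atom and parseExpr are made up here, and parenthesised sub-expressions are an addition to the question's grammar.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical object; one grammar rule per precedence level.
object LayeredParsers extends RegexParsers {
  // AST from the question, nested here only to keep the sketch self-contained.
  sealed trait LogicalUnit
  case class Variable(name: String) extends LogicalUnit
  case class Not(arg: LogicalUnit) extends LogicalUnit
  case class And(arg1: LogicalUnit, arg2: LogicalUnit) extends LogicalUnit

  // Lowest precedence: a chain of '&', folded left-associatively.
  def conjunction: Parser[LogicalUnit] =
    negation ~ rep("&" ~> negation) ^^ {
      case first ~ rest => rest.foldLeft[LogicalUnit](first)((l, r) => And(l, r))
    }

  // Higher precedence: zero or more '!' applied to an atom.
  def negation: Parser[LogicalUnit] =
    ("!" ~> negation) ^^ { x => Not(x) } | atom

  // Highest precedence: a single-letter variable or a parenthesised expression.
  def atom: Parser[LogicalUnit] =
    "[a-zA-Z]".r ^^ { n => Variable(n) } | "(" ~> conjunction <~ ")"

  def parseExpr(input: String): ParseResult[LogicalUnit] =
    parseAll(conjunction, input)
}
```

Because '!' can only reach an atom, LayeredParsers.parseExpr("!a & !b") yields And(Not(Variable(a)),Not(Variable(b))). The rules are not left-recursive, so this sketch does not need PackratParsers at all.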
Answer 2:
I don't know the specific reason, but whenever I have run into this kind of problem with Parsers, I have ordered the alternatives from the most complicated to the simplest.
In your case it would be
lazy val pattern: PackratParser[LogicalUnit] = ((and) | (not) | (variable))
This makes your example parse. The result, however, is Not(And(Variable(a),Not(Variable(b)))), which might not be what you want. The reason is that a & !b is a valid pattern, so !a & !b can be parsed starting from not.
To change that, you can introduce parentheses. This is one simple possibility:
lazy val not: PackratParser[Not] =
("!" ~> term) ^^ { x => Not(x) }
lazy val term: PackratParser[LogicalUnit] =
variable | "(" ~> and <~ ")"
Now the result is And(Not(Variable(a)),Not(Variable(b))).
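For reference, the fragments from this answer can be assembled into one object roughly as follows. This is a sketch, not the answerer's exact file: the rules are renamed to notExpr and andExpr (hypothetical names, chosen so that they cannot clash with the not combinator inherited from Parsers once the rules live inside the parser object), and FixedParsers and parseExpr are likewise made up for illustration.

```scala
import scala.util.parsing.combinator.{PackratParsers, RegexParsers}

// Hypothetical object assembling the rules from this answer.
object FixedParsers extends RegexParsers with PackratParsers {
  // AST from the question, nested here only to keep the sketch self-contained.
  sealed trait LogicalUnit
  case class Variable(name: String) extends LogicalUnit
  case class Not(arg: LogicalUnit) extends LogicalUnit
  case class And(arg1: LogicalUnit, arg2: LogicalUnit) extends LogicalUnit

  // Most complicated alternative first.
  lazy val pattern: PackratParser[LogicalUnit] = andExpr | notExpr | variable

  lazy val variable: PackratParser[Variable] =
    "[a-zA-Z]".r ^^ { n => Variable(n) }

  // '!' now applies to a term only, not to a whole pattern.
  lazy val notExpr: PackratParser[Not] =
    ("!" ~> term) ^^ { x => Not(x) }

  lazy val term: PackratParser[LogicalUnit] =
    variable | "(" ~> andExpr <~ ")"

  // Left-recursive rule; it relies on the left-recursion support of PackratParsers.
  lazy val andExpr: PackratParser[And] =
    ((pattern <~ "&") ~ pattern) ^^ { case a ~ b => And(a, b) }

  def parseExpr(input: String): ParseResult[LogicalUnit] =
    parseAll(pattern, input)
}
```

Calling FixedParsers.parseExpr("!a & !b") should then give the result reported above.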
Source: https://stackoverflow.com/questions/26453870/scala-packratparsers-does-not-backtrack-as-it-should