Using `err` in a Child Parser

Question

In the following Parser:

import scala.util.parsing.combinator.JavaTokenParsers

object Foo extends JavaTokenParsers {

  def word(x: String) = s"\\b$x\\b".r

  lazy val expr  = aSentence | something

  lazy val aSentence = noun ~ verb ~ obj

  lazy val noun   = word("noun")
  lazy val verb   = word("verb") | err("not a verb!")
  lazy val obj    = word("object")

  lazy val something = word("FOO")
}

It parses noun verb object:

scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)

But given a valid noun followed by an invalid verb, why doesn't err("not a verb!") return an Error with that particular error message?

scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found

noun vedsfasdf
     ^

credit: Thanks to Travis Brown for explaining the need for the word function here.

This question seems similar, but I'm not sure how to handle err with the ~ function.

Answer 1:

Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it's then going to try something.

The culprit should be obvious when you think about it: what in that source code is taking two Failure results and choosing one? | (aka append).

This method on Parser will feed the input to both parsers, and then call append on ParseResult. That method is abstract at that level, and defined on Success, Failure and Error in different ways.

On both Success and Error, it always takes this (that is, the result of the parser on the left). On Failure, though, it does something else:

case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
  /** The toString method of a Failure yields an error message. */
  override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString

  def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
    case Success(_, _) => alt
    case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
  }}
}

Or, in other words, if both sides have failed, it takes the side that read more of the input (which is why it won't complain about a missing FOO), but if both have read the same amount, it gives precedence to the second failure.
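If it helps, here is a stripped-down model of that selection rule. It is only a sketch: Ok, Err and Fail are made-up stand-ins for Success, Error and Failure, positions are plain Int offsets, and no parsed value is carried around. It is not the library's code.

```scala
// Simplified stand-in for the ParseResult hierarchy (all names are illustrative).
sealed trait Result {
  def pos: Int
  // `alt` is the result of the right-hand parser, evaluated lazily.
  def append(alt: => Result): Result
}

final case class Ok(pos: Int) extends Result {
  def append(alt: => Result): Result = this // a left success always wins
}

final case class Err(msg: String, pos: Int) extends Result {
  def append(alt: => Result): Result = this // a left error always wins
}

final case class Fail(msg: String, pos: Int) extends Result {
  def append(alt: => Result): Result = {
    val a = alt
    a match {
      case Ok(_) => a
      // Keep the left failure only if it read strictly further; ties go right.
      case _     => if (a.pos < pos) this else a
    }
  }
}

object AppendDemo {
  // Left failure at offset 5, right error at offset 4: the left side is kept.
  val leftWins: Result = Fail("verb expected", 5).append(Err("not a verb!", 4))
  // Same offset on both sides: the right-hand result is chosen.
  val rightWins: Result = Fail("verb expected", 5).append(Err("not a verb!", 5))
}
```

leftWins keeps the left Fail because it read further, while rightWins picks the right-hand Err because the offsets tie.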

I do wonder if it shouldn't check whether the right side is an Error, and, if so, return that. After all, if the left side is an Error, it always returns that. This looks suspicious to me, but maybe it's supposed to be that way. But I digress.

Back to the problem, it would seem that it should have gone with err, as they both consumed the same amount of input, right? Well... Here's the thing: regex parsers skip whiteSpace first, but that only applies to regex literals and literal strings. It does not apply to any of the other methods, including err.

That means that err's input is at the whitespace, while the word's input is at the word, and, therefore, further along in the input. Try this:

lazy val verb   = word("verb") | " *".r ~ err("not a verb!")
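To make the offsets concrete, here is a small model of what happens on "noun vedsfasdf". The skipWhitespace helper and the OffsetDemo object are hypothetical names, and the helper only approximates the library's whitespace handling (it assumes the default of skipping whitespace characters). It is a sketch, not RegexParsers itself.

```scala
object OffsetDemo {
  // Hypothetical stand-in for the whitespace skipping done by regex parsers:
  // returns the first offset at or after `offset` that is not whitespace.
  def skipWhitespace(in: String, offset: Int): Int = {
    var i = offset
    while (i < in.length && in.charAt(i).isWhitespace) i += 1
    i
  }

  val input = "noun vedsfasdf"
  val afterNoun = 4 // offset just past "noun"

  // A regex parser skips whitespace before matching, so its failure is
  // reported at the start of the next word.
  val wordFailureAt: Int = skipWhitespace(input, afterNoun)

  // Plain err skips nothing: it reports at the offset it was handed.
  val errAt: Int = afterNoun

  // With " *".r in front, the whitespace is skipped before err runs.
  val errAfterRegexAt: Int = skipWhitespace(input, afterNoun)
}
```

Offset 5 is column 6, which matches the [1.6] in the failure message above. With the extra regex in front, err lands on the same offset as the word failure, so the tie goes to err.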

Arguably, err ought to be overridden by RegexParsers to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a Pull Request implementing the change. It will have the impact of changing error messages for some parsers (well, that's the whole purpose of changing it :).

Source: https://stackoverflow.com/questions/25147833/using-err-in-a-child-parser

Tags

scala

parser-combinators