Question
In the following Parser:
import scala.util.parsing.combinator.JavaTokenParsers

object Foo extends JavaTokenParsers {
  // \b anchors each keyword at word boundaries
  def word(x: String) = s"\\b$x\\b".r
  lazy val expr = aSentence | something
  lazy val aSentence = noun ~ verb ~ obj
  lazy val noun = word("noun")
  lazy val verb = word("verb") | err("not a verb!")
  lazy val obj = word("object")
  lazy val something = word("FOO")
}
It parses "noun verb object":
scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But when I enter a valid noun followed by an invalid verb, why doesn't err("not a verb!") return an Error with that particular error message?
scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found
noun vedsfasdf
^
Credit: Thanks to Travis Brown for explaining the need for the word function here.
This question seems similar, but I'm not sure how to handle err with the ~ function.
Answer 1:
Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it then tries something.
The culprit should be obvious once you think about it: what in that source code takes two Failure results and picks one? The | combinator (aka append).
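This method on Parser runs the left parser on the input and then calls append on its ParseResult, passing the right parser's result by name, so the right parser only runs when that append actually needs it. ParseResult.append is abstract at that level, and Success, Failure and Error each define it differently.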
On both Success and Error, it always takes this (that is, the result of the parser on the left). On Failure, though, it does something else:
case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
  /** The toString method of a Failure yields an error message. */
  override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString

  def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
    case Success(_, _) => alt
    case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
  }}
}
In other words, if both sides have failed, it takes the side that read the most input (which is why it doesn't complain about a missing FOO), but if both read the same amount, it gives precedence to the second (right-hand) result.
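To make the selection rule concrete, here is a simplified, dependency-free model of ParseResult.append. The Result, Ok, Fail and Err types and the AppendModel object are hypothetical stand-ins for Success, Failure and Error, reduced to the input offset each result stopped at; it is a sketch of the rule described above, not the library's code. The offsets come from the question's input "noun vedsfasdf", where the space is at offset 4 and the 'v' of the bad verb at offset 5.

```scala
// Hypothetical model: each result keeps only its kind and the offset
// at which parsing stopped. These are not the library's classes.
sealed trait Result { def offset: Int }
final case class Ok(offset: Int) extends Result
final case class Fail(msg: String, offset: Int) extends Result
final case class Err(msg: String, offset: Int) extends Result

object AppendModel {
  // Mirrors the append rules described above for p | q.
  def append(left: Result, right: => Result): Result = left match {
    case ok: Ok => ok // a success on the left wins; right is never evaluated
    case e: Err => e  // so does an error on the left
    case f: Fail =>
      val alt = right // only now is the right-hand side evaluated
      alt match {
        case ok: Ok => ok
        // between two unsuccessful results the one that got further wins;
        // on a tie the right-hand result is kept
        case other => if (other.offset < f.offset) f else other
      }
  }
}

val verbFailure = Fail("verb expected", 5)

println(AppendModel.append(verbFailure, Err("not a verb!", 4)))   // Fail(verb expected,5)
println(AppendModel.append(verbFailure, Err("not a verb!", 5)))   // Err(not a verb!,5)
println(AppendModel.append(verbFailure, Fail("FOO expected", 0))) // Fail(verb expected,5)
```

The first line corresponds to the question's situation (err left at the space, offset 4), and the last one to why the failure of something at offset 0 never shows up.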
I do wonder whether it shouldn't check if the right side is an Error and, if so, return that. After all, if the left side is an Error, it always returns that. This looks suspicious to me, but maybe it's meant to be that way. But I digress.
Back to the problem: it would seem that it should have gone with err, since both consumed the same amount of input, right? Well... here's the thing: RegexParsers skip whiteSpace first, but only in the parsers for regex literals and literal strings. That skipping does not apply to any other method, including err.
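The whitespace difference can be observed directly. The sketch below assumes the scala-parser-combinators module is on the classpath; the WsDemo object is a hypothetical name. It runs a \bverb\b regex parser and err(...) on the question's input positioned right after "noun", and prints the offset at which each result stopped.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader

// Hypothetical demo object; RegexParsers and CharSequenceReader are library types.
object WsDemo extends RegexParsers {
  // The question's input, advanced past "noun": offset 4 is the space, offset 5 the 'v'.
  val afterNoun = new CharSequenceReader("noun vedsfasdf").drop(4)

  // Regex parsers skip leading whitespace first, so this Failure is reported at offset 5.
  val verbRegex: Parser[String] = """\bverb\b""".r
  val regexResult = parse(verbRegex, afterNoun)

  // err does no skipping: its Error is reported exactly where it received the input.
  val errResult = parse(err("not a verb!"), afterNoun)
}

println(WsDemo.regexResult.next.offset) // 5
println(WsDemo.errResult.next.offset)   // 4
```

Offset 5 is what the REPL shows as [1.6], so the Failure is "further along" than the Error, and Failure.append keeps the Failure.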
That means err's input is positioned at the whitespace, while the word parser's input is positioned at the word, and therefore further along in the input. Try this:
lazy val verb = word("verb") | " *".r ~ err("not a verb!")
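Put together, a complete version of the question's parser with this workaround might look like the sketch below, assuming scala-parser-combinators is on the classpath. FooWorkaround is a hypothetical name, and it uses ~> instead of ~ (a variation that is not part of the original answer) so that the empty match is discarded and verb keeps the type Parser[String].

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical name; same grammar as the question's Foo except for verb.
object FooWorkaround extends JavaTokenParsers {
  // \b anchors each keyword at word boundaries
  def word(x: String) = s"\\b$x\\b".r

  lazy val expr = aSentence | something
  lazy val aSentence = noun ~ verb ~ obj
  lazy val noun = word("noun")
  // " *".r is a regex parser, so it first skips whitespace (after which " *"
  // matches the empty string). err therefore starts at the same offset where
  // word("verb") failed, and Failure.append keeps the right-hand Error on the tie.
  // ~> drops the matched string, so verb stays a Parser[String].
  lazy val verb: Parser[String] = word("verb") | " *".r ~> err("not a verb!")
  lazy val obj = word("object")
  lazy val something = word("FOO")
}

println(FooWorkaround.parseAll(FooWorkaround.expr, "noun vedsfasdf"))
```

With this version, parsing "noun vedsfasdf" should report an Error carrying "not a verb!" at [1.6], while "noun verb object" still parses.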
Arguably, RegexParsers ought to override err to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a pull request implementing the change. It would change the error messages of some parsers (well, that's the whole point of changing it :).
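A minimal sketch of what such an override might look like, assuming scala-parser-combinators is on the classpath. The WhitespaceAwareErr trait and FooPatched object are hypothetical names, not part of the library; the trait relies on RegexParsers' protected handleWhiteSpace method to find where the next token starts, which is the same position the regex and literal parsers use.

```scala
import scala.util.parsing.combinator.{JavaTokenParsers, RegexParsers}

// Hypothetical trait: makes err skip whitespace the way RegexParsers'
// regex and literal parsers do before reporting the Error.
trait WhitespaceAwareErr extends RegexParsers {
  override def err(msg: String): Parser[Nothing] = Parser { in =>
    // offset of the first non-whitespace character at or after in.offset
    val start = handleWhiteSpace(in.source, in.offset)
    Error(msg, in.drop(start - in.offset))
  }
}

// The question's parser, unchanged except for mixing in the trait.
object FooPatched extends JavaTokenParsers with WhitespaceAwareErr {
  def word(x: String) = s"\\b$x\\b".r
  lazy val expr = aSentence | something
  lazy val aSentence = noun ~ verb ~ obj
  lazy val noun = word("noun")
  lazy val verb = word("verb") | err("not a verb!")
  lazy val obj = word("object")
  lazy val something = word("FOO")
}

println(FooPatched.parseAll(FooPatched.expr, "noun vedsfasdf"))
```

Because every err in a grammar that mixes in this trait now reports its position after any whitespace, existing error positions can shift, which is the behavior change mentioned above.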
Source: https://stackoverflow.com/questions/25147833/using-err-in-a-child-parser