Scala Regular Expressions (string delimited by double quotes)

Submitted by 一曲冷凌霜 on 2020-01-02 03:16:31

Question


I am new to Scala. I am trying to match a string delimited by double quotes, and I am a bit puzzled by the following behavior.

If I do the following:

val stringRegex = """"([^"]*)"(.*$)"""
val regex = stringRegex.r
val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
tidyTokens.foreach {
    token => if (token.matches (stringRegex)) println (token + " matches!")
}

I get

"test" matches!

Otherwise, if I do the following:

tidyTokens.foreach {
    token => token match {
        case regex(token) => println (token + " matches!")
        case _ => println ("No match for token " + token)
    }
}

I get

No match for token 1
No match for token "test"
No match for token 'c'
No match for token -23.3

Why doesn't "test" match in the second case?


Answer 1:


Take your regular expression:

 "([^"]*)"(.*$)

When compiled with .r, this string yields a regex object which, if it matches its input string, must yield 2 captured strings: one for the ([^"]*) and the other for the (.*$). Your code

  case regex(token) => ...

ought to reflect this, so maybe you want

  case regex(token, otherStuff) => ...

or just

  case regex(token, _) => ...
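
As a minimal sketch of that fix, the question's loop could be rewritten as below. The object name QuotedTokenDemo and the helper describe are hypothetical names introduced only for illustration; the pattern and the tokens are the ones from the question.

```scala
object QuotedTokenDemo {
  // Same pattern as in the question: it has two capturing groups.
  val regex = """"([^"]*)"(.*$)""".r

  // Hypothetical helper (not in the original post): binds the first
  // group and ignores the second, so the extractor's arity is respected.
  def describe(token: String): String = token match {
    case regex(inner, _) => inner + " matches!"
    case other           => "No match for token " + other
  }

  def main(args: Array[String]): Unit = {
    val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
    tidyTokens.foreach(t => println(describe(t)))
  }
}
```

Note that the bound variable holds the captured group, not the whole input, so the quoted token is reported without its surrounding quotes.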

Why? The case regex(matchedCaptures...) syntax works because regex is an object with an unapplySeq method. case regex(token) => ... translates (roughly) to:

 case List(token) => ...

where List(token) is matched against what regex.unapplySeq( inputString ) returns:

 regex.unapplySeq("\"test\"") // Returns Some(List("test", ""))

Your regex does match the string "test", but in the case statement the regex extractor's unapplySeq method returns a list of 2 strings, because that is how many groups the regex captures. A pattern with a single variable cannot match a two-element list, so the case falls through to the default. That's unfortunate, but the compiler can't help you here because regular expressions are compiled from strings at runtime.
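
The arity mismatch can be observed directly. The following is only a sketch: UnapplySeqDemo and its three helpers are hypothetical names, and the `_*` sequence wildcard is offered as one possible way to ignore any trailing groups, not as something the original answer proposed.

```scala
object UnapplySeqDemo {
  // The question's pattern, with two capturing groups.
  val regex = """"([^"]*)"(.*$)""".r

  // What the extractor hands to the pattern match.
  def captures(s: String): Option[List[String]] = regex.unapplySeq(s)

  // A one-variable pattern needs a one-element list, so this is false
  // even when the regex itself matches the input.
  def bindsOne(s: String): Boolean = s match {
    case regex(_) => true
    case _        => false
  }

  // `_*` accepts however many groups remain after the first one.
  def firstGroup(s: String): Option[String] = s match {
    case regex(first, _*) => Some(first)
    case _                => None
  }

  def main(args: Array[String]): Unit = {
    println(captures("\"test\""))
    println(bindsOne("\"test\""))
    println(firstGroup("\"test\""))
  }
}
```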

One alternative would be to use a non-capturing group:

 val stringRegex = """"([^"]*)"(?:.*$)"""
 //                             ^^

Then your code would work, because regex will now be an extractor object whose unapplySeq method returns only a single captured group:

 tidyTokens foreach { 
    case regex(token) => println (token + " matches!")
    case t => println ("No match for token " + t)
 }
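
Since the compiler cannot check how many groups a pattern captures, one option is to check it at runtime. This is a sketch that assumes access to the underlying java.util.regex.Pattern through the Regex's pattern field; GroupCountDemo and groupCount are hypothetical names used for illustration.

```scala
import scala.util.matching.Regex

object GroupCountDemo {
  val twoGroups: Regex = """"([^"]*)"(.*$)""".r   // original pattern
  val oneGroup: Regex  = """"([^"]*)"(?:.*$)""".r // non-capturing variant

  // Hypothetical helper: the number of capturing groups is a property
  // of the compiled pattern, so any input (here "") can be used.
  def groupCount(r: Regex): Int = r.pattern.matcher("").groupCount()

  def main(args: Array[String]): Unit = {
    println(groupCount(twoGroups))
    println(groupCount(oneGroup))
  }
}
```

The count reported for a pattern is the number of variables a case regex(...) pattern has to bind (or ignore) in order to match.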

Have a look at the tutorial on Extractor Objects for a better understanding of how apply / unapply / unapplySeq work.



Source: https://stackoverflow.com/questions/15119238/scala-regular-expressions-string-delimited-by-double-quotes
