Accessing Scala Parser regular expression match data

孤街浪徒 2021-02-08 07:55

I'm wondering whether it's possible to get the MatchData generated by the matching regular expression in the grammar below.

object DateParser extends JavaTokenParsers


        
4 Answers
  • 2021-02-08 08:16

    I ran into a similar issue using Scala 2.8.1 while trying to parse input of the form "name:value" with the RegexParsers class:

    package scalucene.query
    
    import scala.util.matching.Regex
    import scala.util.parsing.combinator._
    
    object QueryParser extends RegexParsers {
      override def skipWhitespace = false
    
      private def quoted = regex(new Regex("\"[^\"]+"))
      private def colon = regex(new Regex(":"))
      private def word = regex(new Regex("\\w+"))
      private def fielded = (regex(new Regex("[^:]+")) <~ colon) ~ word
      private def term = (fielded | word | quoted)
    
      def parseItem(str: String) = parse(term, str)
    }
    

    It seems that you can grab the matched parts after parsing like this:

    QueryParser.parseItem("nameExample:valueExample") match {
      // `~` is the case class produced by the ~ combinator in `fielded`
      case QueryParser.Success(QueryParser.~(name, value), _) =>
        println("Name: " + name + " value: " + value)
      // a bare word, a quoted string, or a failed parse
      case other =>
        println("No name:value pair: " + other)
    }
    
  • 2021-02-08 08:18

    When a Regex is used in a RegexParsers instance, the implicit def regex(Regex): Parser[String] in RegexParsers is used to apply that Regex to the input. The Match instance yielded upon successful application of the RE at the current input is used to construct a Success in the regex() method, but only its "end" value is used, so any captured sub-matches are discarded by the time that method returns.

    As it stands (in the 2.7 source I looked at), you're out of luck, I believe.
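
    To see what is discarded, here is a minimal sketch that uses only scala.util.matching.Regex from the standard library, outside of any parser. The names PrefixMatchDemo, datePattern, keptByRegexParser and alsoInTheMatch are made up for illustration; findPrefixMatchOf, end and subgroups are the actual library calls.

```scala
import scala.util.matching.Regex

// Hypothetical demo object; not part of the question's grammar.
object PrefixMatchDemo {
  val datePattern: Regex = """(\d\d)/(\d\d)/(\d{4})""".r

  // Mirrors what regex() keeps: only the text up to Match.end.
  def keptByRegexParser(input: String): Option[String] =
    datePattern.findPrefixMatchOf(input).map(m => input.substring(0, m.end))

  // The same Match also carries the captured groups, which regex() drops.
  def alsoInTheMatch(input: String): Option[List[String]] =
    datePattern.findPrefixMatchOf(input).map(m => m.subgroups)
}
```

    For the input "23/03/1971 rest", the first method yields the matched string "23/03/1971", while the second yields the three captured groups that never make it out of regex().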

  • 2021-02-08 08:19

    No, you can't do this. If you look at the definition of the Parser used when you convert a regex to a Parser, it throws away all context and just returns the full matched string:

    http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/util/parsing/combinator/RegexParsers.scala?view=markup#L55

    You have a couple of other options, though:

    • break up your parser into several smaller parsers (for the tokens you actually want to extract)
    • define a custom parser that extracts the values you want and returns a domain object instead of a string

    The first would look like:

    val separator = "-" | "/"
    val year = """\d{4}""".r <~ separator
    val month = """\d\d""".r <~ separator
    val day = """\d\d""".r

    val date = (year.? ~ month.? ~ day) map {
      case year ~ month ~ day =>
        (year.getOrElse("2009"), month.getOrElse("11"), day)
    }
    

    The <~ means "require these two tokens together, but only give me the result of the first one".

    The ~ means "require these two tokens together and tie them together in a pattern-matchable ~ object".

    The ? means that the parser is optional and will return an Option.

    The .getOrElse bit provides a default value for when the parser didn't define a value.
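
    The Option-plus-default step can be tried without the combinator library. The sketch below is not the parser above: it swaps in a single plain Regex with optional groups, and the names DateDefaults and parse are hypothetical. It only shows how a missing year or month falls back to the same defaults ("2009" and "11") via getOrElse.

```scala
import scala.util.matching.Regex

// Hypothetical helper; a plain-Regex stand-in, not the combinator-based parser.
object DateDefaults {
  // Year and month (each followed by "-" or "/") are optional; the day is required.
  private val Pattern: Regex = """(?:(\d{4})[-/])?(?:(\d\d)[-/])?(\d\d)""".r

  def parse(input: String): Option[(String, String, String)] = input match {
    // An optional group that took no part in the match is bound to null,
    // so it is wrapped in Option before the default is applied.
    case Pattern(year, month, day) =>
      Some((Option(year).getOrElse("2009"), Option(month).getOrElse("11"), day))
    case _ => None
  }
}
```

    Here Option(year) plays the role that the ? combinator plays in the parser: it turns "may be absent" into an Option so that getOrElse can supply the default.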

  • 2021-02-08 08:24

    Here is the implicit definition that converts your Regex into a Parser:

      /** A parser that matches a regex string */
      implicit def regex(r: Regex): Parser[String] = new Parser[String] {
        def apply(in: Input) = {
          val source = in.source
          val offset = in.offset
          val start = handleWhiteSpace(source, offset)
          (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
            case Some(matched) =>
              Success(source.subSequence(start, start + matched.end).toString, 
                      in.drop(start + matched.end - offset))
            case None =>
              Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
          }
        }
      }
    

    Just adapt it:

    import scala.util.matching.Regex
    import scala.util.parsing.combinator.RegexParsers

    object X extends RegexParsers {
      /** A parser that matches a regex string and returns the Match */
      def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
        def apply(in: Input) = {
          val source = in.source
          val offset = in.offset
          val start = handleWhiteSpace(source, offset)
          (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
            case Some(matched) =>
              Success(matched,
                      in.drop(start + matched.end - offset))
            case None =>
              Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
          }
        }
      }
      val t = regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ { case m => (m.group(1), m.group(2), m.group(3)) }
    }
    

    Example:

    scala> X.parseAll(X.t, "23/03/1971")
    res8: X.ParseResult[(String, String, String)] = [1.11] parsed: (23,03,1971)
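
    Once the Match is available, the groups can also be turned into a typed value instead of a tuple of strings. This is only a sketch: SimpleDate and fromMatch are hypothetical names, and the code assumes the Match comes from a pattern with three numeric groups in day, month, year order, as in the regex above.

```scala
import scala.util.matching.Regex

// Hypothetical domain type for illustration.
final case class SimpleDate(day: Int, month: Int, year: Int)

object SimpleDate {
  // Assumes groups 1 to 3 are all-digit captures: day, month, year.
  def fromMatch(m: Regex.Match): SimpleDate =
    SimpleDate(m.group(1).toInt, m.group(2).toInt, m.group(3).toInt)
}
```

    Inside X, it should be possible to pass this function to ^^ in place of the tuple-building one, for example regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ SimpleDate.fromMatch.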
    