non-greedy matching in Scala RegexParsers

花落未央 2021-01-11 10:02

Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:

class Arith extends RegexParsers {
    def selectstatement: Parser[Any] = selec         


        
1 Answer
  • 2021-01-11 10:48

    Not easily, because a successful match is not retried. Consider, for example:

    object X extends RegexParsers {
      def p = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
    }
    
    scala> X.parseAll(X.p, "aaaab")
    res1: X.ParseResult[X.~[String,String]] = 
    [1.2] failure: `ab' expected but `a' found
    
    aaaab
     ^
    

    The first alternative inside the parentheses, "a", matched, so the parser moved on to "ab". That failed, so p failed: the remaining alternatives ("aa", "aaa", "aaaa") were never tried, because | does not revisit a choice that has already succeeded. If p itself were one of several alternatives, the next alternative would be tried, so the trick is to build a parser that takes advantage of that.
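
    To illustrate the point about alternatives, here is a minimal sketch in which each prefix carries its own "ab" continuation, so a failure after the prefix lets | fall through to the next alternative. The object name Backtrack is hypothetical, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical name, for illustration only.
object Backtrack extends RegexParsers {
  // Every alternative is a complete "prefix ~ ab" sequence. When the
  // sequence fails after its prefix has matched, the whole alternative
  // fails, and `|` moves on to the next one.
  def p: Parser[String ~ String] =
    ("a" ~ "ab") | ("aa" ~ "ab") | ("aaa" ~ "ab") | ("aaaa" ~ "ab")
}
```

    With this shape, Backtrack.parseAll(Backtrack.p, "aaaab") succeeds through the third alternative, with "aaa" as the prefix. Listing every prefix by hand does not scale, which is what the nonGreedy combinator is meant to solve.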

    Let's say we have this:

    def nonGreedy[T](rep: => Parser[T], terminal: => Parser[T]) = Parser { in =>
      def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ T] =
        terminal(in) match {
          case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
          case _ => 
            rep(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }
    
      recurse(in, Nil)
    }  
    

    You can then use it like this:

    def p = nonGreedy("a", "ab")
    

    By the way, I have always found that looking at how other combinators are defined helps when trying to come up with something like nonGreedy above. In particular, look at how rep1 is defined, and how it was changed to avoid re-evaluating its repetition parameter; the same change would probably be useful in nonGreedy.
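
    A sketch of what that change might look like, borrowing the lazy val trick that rep1 uses for its by-name argument. The object name CachedNonGreedy and the local names r and t are hypothetical, and the terminal is given its own type parameter U as an assumption; this is one possible adaptation, not the library's code.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical name, for illustration only.
object CachedNonGreedy extends RegexParsers {
  def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]): Parser[List[T] ~ U] =
    Parser { in =>
      // Evaluate each by-name argument at most once per invocation,
      // instead of once per recursion step.
      lazy val r = rep
      lazy val t = terminal

      def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ U] =
        t(in) match {
          // Terminal matched: stop and return everything collected so far.
          case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
          case _ =>
            r(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }

      recurse(in, Nil)
    }

  // The literals are wrapped explicitly so that no implicit conversion
  // is needed to infer T and U.
  def p: Parser[List[String] ~ String] = nonGreedy(literal("a"), literal("ab"))
}
```

    Here CachedNonGreedy.parseAll(CachedNonGreedy.p, "aaaab") collects three "a" elements and then matches "ab" as the terminal.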

    Here's a full solution, with a little change to avoid consuming the "terminal".

    trait NonGreedy extends Parsers {
        def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]) = Parser { in =>
          def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
            terminal(in) match {
              case _: Success[_] => Success(elems.reverse, in)
              case _ => 
                rep(in) match {
                  case Success(x, rest) => recurse(rest, x :: elems)
                  case ns: NoSuccess    => ns
                }
            }
    
          recurse(in, Nil)
        }  
    }
    
    class Arith extends RegexParsers with NonGreedy {
        // Just to avoid recompiling the pattern each time
        val select: Parser[String] = "(?i)SELECT".r
        val from: Parser[String] = "(?i)FROM".r
        val token: Parser[String] = "(\\s*)\\w+(\\s*)".r
        val eof: Parser[String] = """\z""".r
    
        def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
        def selectclause(terminal: Parser[Any]): Parser[Any] = 
          select ~ tokens(terminal)
        def fromclause(terminal: Parser[Any]): Parser[Any] = 
          from ~ tokens(terminal)
        def tokens(terminal: Parser[Any]): Parser[Any] = 
          nonGreedy(token, terminal)
    }
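
    For a quick check of the approach, the following self-contained sketch restates the same idea with a typed result instead of Parser[Any]. The object name SqlSketch, the (columns, tables) tuple, and the simplified token pattern \w+ are assumptions made for illustration (RegexParsers already skips whitespace before each regex); everything else mirrors the solution above.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical name, for illustration only.
object SqlSketch extends RegexParsers {
  // Repeat `rep` until `terminal` would match; the terminal is only
  // tested, not consumed, so the caller can parse it afterwards.
  def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]): Parser[List[T]] =
    Parser { in =>
      def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          case _: Success[_] => Success(elems.reverse, in)
          case _ =>
            rep(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }
      recurse(in, Nil)
    }

  val select: Parser[String] = "(?i)SELECT".r
  val from: Parser[String]   = "(?i)FROM".r
  val token: Parser[String]  = """\w+""".r // assumption: simplified token pattern
  val eof: Parser[String]    = """\z""".r

  // Assumed result shape: (tokens after SELECT, tokens after FROM).
  def selectstatement: Parser[(List[String], List[String])] =
    (select ~> nonGreedy(token, from)) ~ (from ~> nonGreedy(token, eof)) ^^ {
      case cols ~ tables => (cols, tables)
    }
}
```

    Calling SqlSketch.parseAll(SqlSketch.selectstatement, "SELECT a b FROM c d") yields (List(a, b), List(c, d)), while "SELECT a b" fails because the from terminal is never found. Note that the from pattern matches any token that merely starts with "from", so a real grammar would likely need a word boundary there.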
    