Using Parsec to parse regular expressions

情深已故 2021-02-08 23:09

I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:

EXP  : EXP *
     | LIT EXP
     | LIT


        
2 Answers
  • 2021-02-08 23:32

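    Your grammar is left-recursive, which Parsec cannot handle directly. To parse EXP, the first alternative has to parse EXP again before consuming any input, so the parser recurses forever. Wrapping the alternatives in try does not help, because backtracking only retries the same recursion. There are a few ways around this. Probably the simplest is to make the * optional in a separate rule for literals: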

    lit :: Parser (Char, Maybe Char)
    lit = do
      c <- noneOf "*"
      s <- optionMaybe $ char '*'
      return (c, s)
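    To see this rule in action, here is a minimal runnable sketch, assuming parsec 3's Text.Parsec modules (the top-level name exprs is hypothetical, not from the question). Following this answer's approach, * attaches only to the single preceding literal, and EXP becomes one or more such literals. Every repetition consumes at least one character, so nothing is left-recursive.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same rule as above: a literal, optionally followed by '*'.
lit :: Parser (Char, Maybe Char)
lit = do
  c <- noneOf "*"
  s <- optionMaybe $ char '*'
  return (c, s)

-- Hypothetical top-level rule: one or more (possibly starred) literals,
-- then end of input. Each lit consumes a character before recursing,
-- so many1 always makes progress and cannot loop.
exprs :: Parser [(Char, Maybe Char)]
exprs = many1 lit <* eof

main :: IO ()
main = parseTest exprs "ab*c"
```

    Running main prints [('a',Nothing),('b',Just '*'),('c',Nothing)]. An input such as "a**" is rejected, because the second * has no literal in front of it.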
    

    Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:

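    import Control.Applicative ((<$>))
    import Data.Maybe (isNothing)
    import Text.Parsec
    import Text.Parsec.String (Parser)
    
    data Term = Literal Char
              | Sequence [Term]
              | Star Term
      deriving (Show)
    
    expr :: Parser Term
    expr = Sequence <$> many term
    
    term :: Parser Term
    term = do
      -- Parse the character directly: the lit rule above already consumes
      -- the optional '*' and returns a pair, so it cannot be used here.
      c <- noneOf "*"
      s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
      return $ if isNothing s
        then Literal c
        else Star $ Literal c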
    

    Maybe a more experienced Haskeller will come along with a better solution.
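    As the comment in term suggests, the same shape extends to other postfix operators. A sketch, assuming parsec 3's Text.Parsec modules; the Plus and Opt constructors are hypothetical additions to the Term type above, not part of the original answer:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Plus and Opt are hypothetical constructors added for illustration.
data Term = Literal Char
          | Sequence [Term]
          | Star Term
          | Plus Term
          | Opt Term
  deriving (Show, Eq)

expr :: Parser Term
expr = Sequence <$> many term

term :: Parser Term
term = do
  c <- noneOf "*+?"
  -- Accept at most one postfix operator and wrap the literal accordingly.
  op <- optionMaybe (oneOf "*+?")
  return $ case op of
    Just '*' -> Star (Literal c)
    Just '+' -> Plus (Literal c)
    Just '?' -> Opt (Literal c)
    _        -> Literal c

main :: IO ()
main = parseTest (expr <* eof) "ab+c?d*"
```

    Running main prints Sequence [Literal 'a',Plus (Literal 'b'),Opt (Literal 'c'),Star (Literal 'd')].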

  • 2021-02-08 23:42

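    You should use buildExpressionParser from Text.ParserCombinators.Parsec.Expr (Text.Parsec.Expr in newer code); it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you! Because it parses an atom first and only then looks for operators after it, it never recurses on the left, so a postfix * poses no problem.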

    You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.

    Here's my attempt (I threw in |, +, and ? for good measure):

    import Control.Applicative
    import Control.Monad
    import Text.ParserCombinators.Parsec
    import Text.ParserCombinators.Parsec.Expr
    
    data Term = Literal Char
              | Sequence [Term]
              | Repeat (Int, Maybe Int) Term
              | Choice [Term]
      deriving ( Show )
    
    term :: Parser Term
    term = buildExpressionParser ops atom where
    
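      -- Rows run from tightest to loosest binding: the postfix repetition
      -- operators bind tighter than concatenation, which binds tighter than |.
      ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
              , Postfix (Repeat (1, Nothing) <$ char '+')
              , Postfix (Repeat (0, Just 1)  <$ char '?')
              ]
              -- Concatenation has no operator symbol, so its "operator"
              -- parser consumes nothing and just returns the combiner.
            , [ Infix (return sequence) AssocRight
              ]
            , [ Infix (choice <$ char '|') AssocRight
              ]
            ]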
    
      atom = msum [ Literal <$> lit
                  , parens term
                  ]
    
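      -- Any character that is not an operator or a parenthesis is a literal.
      lit = noneOf "*+?|()"
      -- Combining functions for the infix rows. These local names shadow
      -- Prelude's sequence and Parsec's choice inside this where block.
      sequence a b = Sequence $ seqTerms a ++ seqTerms b
      choice a b = Choice $ choiceTerms a ++ choiceTerms b
      parens = between (char '(') (char ')')
    
      -- Flatten nested results so "abc" yields one Sequence of three
      -- literals rather than Sequence [a, Sequence [b, c]].
      seqTerms (Sequence ts) = ts
      seqTerms t = [t]
    
      choiceTerms (Choice ts) = ts
      choiceTerms t = [t]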
    
    main = parseTest term "he(llo)*|wor+ld?"
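    Cut down to the question's original grammar (a postfix * plus implicit concatenation, with no parentheses or |), the same technique needs only two rows. A sketch assuming parsec 3's Text.Parsec and Text.Parsec.Expr modules; regex, cat and items are hypothetical names, and this Term type is a simplified stand-in for the one above:

```haskell
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- Literals, implicit concatenation (LIT EXP) and the postfix star (EXP *).
data Term = Literal Char
          | Sequence [Term]
          | Star Term
  deriving (Show, Eq)

regex :: Parser Term
regex = buildExpressionParser table (Literal <$> noneOf "*")
  where
    -- '*' binds tighter than concatenation, so "ab*" parses as a(b*).
    table = [ [ Postfix (Star <$ char '*') ]
            , [ Infix (return cat) AssocRight ]  -- juxtaposition, no symbol
            ]
    -- Flatten nested sequences so "abc" becomes a single Sequence.
    cat a b = Sequence (items a ++ items b)
    items (Sequence ts) = ts
    items t = [t]

main :: IO ()
main = parseTest (regex <* eof) "ab*c"
```

    Running main prints Sequence [Literal 'a',Star (Literal 'b'),Literal 'c'], showing that the star applies only to b.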
    