I\'m working on a small parser using Megaparsec and trying to parse arithmetic.
-- Arithmetic expressions
data Aexp = N Num
| V Var
| M
Parsec tries the left alternative of <|>
before it tries the right one. If the left alternative succeeds then it won't bother with the right one. So in this instance, when fed the input 5*5
, Parsec's process looks like this:
V <$> strParser
. strParser
begins with tok "\""
, but the input string doesn't begin with '"'
so strParser
fails.N <$> numParser
. numParser
successfully parses the number 5, so Parsec just returns N 5
.So we can attempt to patch this parser up by moving the Mult
option up to the top, wrapped in a try
so that it can backtrack and try numParser
or strParser
if the input turns out not to be a multiplication.
arithParser :: Parser Aexp
arithParser = try (Mult <$> arithParser <* tok "*" <*> arithParser)
<|> N <$> numParser
<|> V <$> strParser
This parser has another, more subtle problem. Let's walk through the steps, as above.
try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins with arithParser
, so recursively call arithParser
.try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins with arithParser
, so recursively call arithParser
.try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins with arithParser
, so recursively call arithParser
.It's an infinite loop. Parsec can't handle left-recursive grammars. You have to design your parser so that it consumes at least one token before a recursive call. One common way of doing this is to "flatten out" your grammar:
expr, term :: Parser AExp
expr = do
n <- term
rest <- optional $ tok "*" *> expr
return $ maybe n (Mult n) rest
term = N <$> numParser
<|> V <$> strParser
<|> parenthesised expr
parenthesised = between (char '(') (char ')')
Here I've split up the parser into one which parses an arbitrary expr
- a term
optionally followed by a multiplication symbol and a multiplicand expr
- and one which parses single term
s - numbers, strings, and parenthesised expressions. The recursive calls to expr
are OK now - the one inside expr
happens only after you've parsed a term
(which always consumes input) and the one inside term
happens only after you've parsed an opening parenthesis.
Note that expr
has a list-like structure: it parses a single thing possibly followed by many things. In general you should think of parsers consuming a linear input stream of input tokens, so it's not surprising that list-shaped parsers tend to be more effective than tree-shaped ones.
The Text.Megaparsec.Expr module contains functions which package up this pattern and parse expressions with arbitrary precedence and fixity rules.
expr = makeExprParser term [[InfixR $ tok "*" $> Mult]]