All of the parsers in Text.Parsec.Token politely use lexeme to eat whitespace after a token. Unfortunately for me, that whitespace includes newlines, which I needed to keep significant. And no, this behaviour is not configurable. Here is the relevant code.
From Text.Parsec.Token:

```haskell
lexeme p
    = do{ x <- p; whiteSpace; return x }

whiteSpace
    | noLine && noMulti  = skipMany (simpleSpace <?> "")
    | noLine             = skipMany (simpleSpace <|> multiLineComment <?> "")
    | noMulti            = skipMany (simpleSpace <|> oneLineComment <?> "")
    | otherwise          = skipMany (simpleSpace <|> oneLineComment <|> multiLineComment <?> "")
    where
      noLine  = null (commentLine languageDef)
      noMulti = null (commentStart languageDef)
```
One will notice in the where clause of whiteSpace that the only options considered deal with comments; how plain whitespace (and hence newlines) is skipped is fixed. The lexeme function uses whiteSpace, and lexeme is used liberally throughout the rest of Text.Parsec.Token.
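To make the problem concrete, here is a minimal, self-contained sketch of the alternative behaviour I wanted: a lexeme that eats only horizontal whitespace, so newlines stay visible to the grammar. This uses a toy parser type, not real Parsec, and every name in it is made up for illustration.

```haskell
import Data.Char (isDigit)

-- A toy parser type just to illustrate the idea; this is not real Parsec.
newtype P a = P { runP :: String -> Maybe (a, String) }

-- Skip horizontal whitespace only, so '\n' stays visible to the grammar.
skipHSpace :: String -> String
skipHSpace = dropWhile (`elem` " \t")

-- A lexeme that eats spaces and tabs after the token but stops at a
-- newline, unlike Text.Parsec.Token's lexeme, which eats newlines too.
lexemeH :: P a -> P a
lexemeH (P p) = P $ \s -> case p s of
  Just (x, rest) -> Just (x, skipHSpace rest)
  Nothing        -> Nothing

-- A simple token: one or more digits.
number :: P Int
number = P $ \s -> case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

main :: IO ()
main = print (runP (lexemeH number) "42  \nrest")
-- the trailing spaces are eaten, but the newline is left for the caller:
-- Just (42,"\nrest")
```

The point of the sketch is only that "skip whitespace" and "skip whitespace except newlines" are different policies, and Text.Parsec.Token hard-codes the former.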
The ultimate solution for me was to use a proper lexical analyser (alex). Parsec does a very good job as a parsing library, and it is a credit to the design that it can be mangled into doing lexical analysis, but for all but small and simple projects it quickly becomes unwieldy. I now use alex to create a linear stream of tokens, and Parsec then turns them into an AST.
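The shape of that two-phase pipeline can be illustrated without either library. In the sketch below, a hand-rolled tokenizer stands in for the alex-generated lexer, and a plain function over the token list stands in for the Parsec pass (which in real code would consume a `[Token]` stream via Parsec's token primitives). All the names here are invented for the example; the key idea is that newlines become explicit tokens instead of disappearing as whitespace.

```haskell
import Data.Char (isDigit)

-- Hypothetical token type; in the real setup, alex generates the lexer.
data Token = TNum Int | TPlus | TNewline
  deriving (Show, Eq)

-- Phase 1: turn the input into a flat token stream.
-- Newlines become explicit tokens rather than being eaten as whitespace.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
  | c == '\n'      = TNewline : tokenize cs
  | c `elem` " \t" = tokenize cs
  | c == '+'       = TPlus : tokenize cs
  | isDigit c      = let (ds, rest) = span isDigit (c:cs)
                     in TNum (read ds) : tokenize rest
  | otherwise      = error ("unexpected character: " ++ [c])

-- Phase 2: build a result from the tokens (here a sum; in real code,
-- a Parsec parser over the token stream would build an AST).
parseSum :: [Token] -> (Int, [Token])
parseSum (TNum n : TPlus : rest) =
  let (m, rest') = parseSum rest in (n + m, rest')
parseSum (TNum n : rest) = (n, rest)
parseSum _ = error "expected a number"

main :: IO ()
main = do
  print (tokenize "1 + 2\n3")   -- the newline survives as TNewline
  print (parseSum (tokenize "1 + 2"))
```

Separating the phases like this means the whitespace policy lives entirely in the lexer, where it is trivial to control, and the parser never has to fight lexeme over it.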