All of the parsers in Text.Parsec.Token politely use lexeme to eat whitespace after a token. Unfortunately for me, whitespace includes new lines, whic
Well, not all parsers in Text.Parsec.Token use lexeme, although all of them should. Worse, it is not documented which of them consume whitespace and which do not. Some of the parsers in Text.Parsec.Token consume trailing whitespace via lexeme, some don't, and some consume leading whitespace as well. If you want full control over the situation, read the existing issues on the GitHub issue tracker.
In particular:
decimal, hexadecimal, and octal parsers do not consume trailing whitespace, see the source, and this issue;
integer consumes leading whitespace as well, see this issue;
the rest of them probably consume trailing whitespace and thus newlines. This is difficult to tell for sure, however, because Parsec's code is particularly hairy (IMHO) and the project has no test suite (except for 3 tests which check that already-fixed bugs do not show up again). That is not enough to prevent regressions, and any change in the source may break your code in the next release of Parsec.
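To see the first point in practice, here is a minimal sketch. It assumes a token parser built from emptyDef; the names lexer, bareDecimal, and lexemeDecimal are made up for illustration. The bare decimal leaves trailing whitespace in the input, so a following eof fails, while the same parser wrapped in lexeme skips it (newlines included).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Token parser built from the empty language definition
-- (an assumption for this sketch; your own LanguageDef may differ).
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- decimal used as-is: it does not skip whitespace after the digits,
-- so eof only succeeds if the input ends right after the number.
bareDecimal :: Parser Integer
bareDecimal = Tok.decimal lexer <* eof

-- The same parser wrapped in lexeme: trailing whitespace,
-- including newlines, is consumed before eof is checked.
lexemeDecimal :: Parser Integer
lexemeDecimal = Tok.lexeme lexer (Tok.decimal lexer) <* eof
```

With these definitions, parse bareDecimal "" "42 " should fail on the trailing space, whereas parse lexemeDecimal "" "42 \n" should succeed with 42.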
There are various proposals for making it configurable (what should be considered whitespace), but none of them has been merged or even commented on, for some reason.
But the real problem lies in the design of Text.Parsec.Token, which locks the user into the solutions built by makeTokenParser. This design is particularly inflexible. In many cases the only solution is to copy the entire module and edit it as needed.
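For small grammars, a lighter alternative to copying the whole module is to skip Text.Parsec.Token and write your own lexeme combinator. The following is only a sketch under that assumption: hspace, lexemeH, number, row, and rows are hypothetical names, not part of Parsec. Here whitespace means spaces and tabs only, so newlines stay significant.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: skip spaces and tabs, but never newlines.
hspace :: Parser ()
hspace = skipMany (oneOf " \t")

-- Hypothetical lexeme: run a parser, then skip horizontal whitespace only.
lexemeH :: Parser a -> Parser a
lexemeH p = p <* hspace

-- A non-negative integer followed by optional spaces or tabs.
number :: Parser Integer
number = lexemeH (read <$> many1 digit)

-- One row of numbers, terminated by a newline that is still in the input.
row :: Parser [Integer]
row = many number <* newline

-- A whole input: zero or more rows, then end of input.
rows :: Parser [[Integer]]
rows = many row <* eof
```

For example, parse rows "" "1 2\n3\n" yields Right [[1,2],[3]], because the newline is left for row to match instead of being eaten as whitespace.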
But if you want a modern and consistent Parsec, there is the option of switching to Megaparsec, where this problem (and many others) does not exist.
Disclosure: I'm one of the authors of Megaparsec.