In Parsec, is there a way to prevent lexeme from consuming newlines?

生来不讨喜 2021-02-13 12:36

All of the parsers in Text.Parsec.Token politely use lexeme to eat whitespace after a token. Unfortunately for me, whitespace includes newlines, which …
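Here is a minimal sketch of the behaviour being described. It assumes a token parser built from `emptyDef`, chosen only because that definition has no comment syntax. `lexer` and `identifiers` are illustrative names and are not part of Parsec:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- emptyDef has no comment syntax, so whiteSpace skips exactly the
-- characters satisfying isSpace, and '\n' is one of them.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- identifier is wrapped in lexeme: after each name, all following
-- whitespace (spaces and newlines alike) is skipped.
identifiers :: Parser [String]
identifiers = Tok.whiteSpace lexer *> many (Tok.identifier lexer) <* eof
```

Both `"foo bar\nbaz\n"` and `"foo\nbar baz"` parse to `Right ["foo","bar","baz"]`, so nothing downstream can tell where one line ended and the next began.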

4 Answers
  •  小蘑菇 (OP)
    2021-02-13 12:57

    Well, not all parsers in Text.Parsec.Token use lexeme, although all of them should. Worst of all, it's not documented which of them consume whitespace and which do not. Some of the parsers in Text.Parsec.Token do consume whitespace after the lexeme, some don't, and some consume leading whitespace as well. If you want full control over the situation, you should read the existing issues on the GitHub issue tracker.

    In particular:

    • the decimal, hexadecimal, and octal parsers do not consume trailing whitespace (see the source and this issue);

    • integer consumes leading whitespace as well (see this issue);

    • the rest of them probably consume trailing whitespace, and thus newlines. This is, however, difficult to tell for sure, because Parsec's code is particularly hairy (IMHO) and the project has no test suite (except for 3 tests which check that already-fixed bugs do not reappear; that is not enough to prevent regressions, and any change in the source may break your code in the next release of Parsec).
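The differences listed above can be checked directly. This sketch assumes a token parser built from `emptyDef`. `leftover` and `value` are hypothetical helpers that report, respectively, the input a number parser leaves unconsumed and the value it returns:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Token parser with no comment syntax; its whiteSpace skips newlines too.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical helper: run a number parser, return the unconsumed input.
leftover :: Parser Integer -> String -> Either ParseError String
leftover p = parse (p *> getInput) "<demo>"

-- Hypothetical helper: run a number parser, return only its result.
value :: Parser Integer -> String -> Either ParseError Integer
value p = parse p "<demo>"

-- ghci> leftover (Tok.natural lexer) "42\n  rest"
-- Right "rest"
-- ghci> leftover (Tok.decimal lexer) "42\n  rest"
-- Right "\n  rest"
-- ghci> value (Tok.integer lexer) "  7"
-- Right 7
```

`natural` skips the newline after `42`, while `decimal` leaves it in place. `integer` accepts the leading spaces because, in Parsec's source, its sign parser is itself wrapped in `lexeme` and still runs `whiteSpace` when no sign is present. `natural` rejects the same input.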

    There are various proposals for making it configurable (i.e. what should be considered whitespace), but for some reason none of them has been merged or even commented on.

    But the real problem lies in the design of Text.Parsec.Token, which locks the user into the solutions built by makeTokenParser. This design is particularly inflexible; in many cases the only solution is to copy the entire module and edit it as needed.
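A lighter alternative, offered here as an assumption rather than something the answer itself prescribes, is to skip `makeTokenParser` entirely when all you need is newline-sensitive tokens, and write the few token helpers yourself on top of `Text.Parsec`. In this sketch the names `hspace`, `lexeme`, `symbol`, `number`, `line` and `file` are hypothetical. The grammar (comma-separated integers, one list per line) is also invented for illustration. `lexeme` skips only spaces and tabs, so `newline` remains available as a real separator:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical: skip spaces and tabs only; newlines are left to the grammar.
hspace :: Parser ()
hspace = skipMany (oneOf " \t")

-- Hypothetical replacement for Token's lexeme: token, then horizontal space.
lexeme :: Parser a -> Parser a
lexeme p = p <* hspace

symbol :: String -> Parser String
symbol = lexeme . string

number :: Parser Integer
number = lexeme (read <$> many1 digit)

-- Toy grammar: one or more comma-separated integers on a line.
line :: Parser [Integer]
line = number `sepBy1` symbol ","

-- Lines separated (and optionally terminated) by a newline.
file :: Parser [[Integer]]
file = hspace *> (line `sepEndBy` (newline <* hspace)) <* eof
```

With this, `parse file "" "1, 2\n3,4\n"` gives `Right [[1,2],[3,4]]`, while `"1,\n2"` is rejected because the newline after the comma is no longer swallowed.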

    But if you want a modern and consistent Parsec, there is the option to switch to Megaparsec, where this problem (and many others) does not exist.
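For comparison, here is roughly how a line-oriented grammar might look in Megaparsec, whose `Text.Megaparsec.Char.Lexer` module makes you pass the space consumer explicitly instead of fixing it inside a token-parser record. This is a sketch assuming a Megaparsec version that provides `hspace1`, which older releases lack. The helper names and the toy grammar (comma-separated integers, one list per line) are hypothetical:

```haskell
import Control.Applicative (empty)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (hspace1, newline)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Space consumer: skips spaces and tabs only. The two `empty` arguments
-- mean "no line comments" and "no block comments"; newlines are not skipped.
sc :: Parser ()
sc = L.space hspace1 empty empty

-- Hypothetical helpers built on Megaparsec's lexer combinators.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: String -> Parser String
symbol = L.symbol sc

number :: Parser Integer
number = lexeme L.decimal

-- Toy grammar: comma-separated integers, one list per line.
line :: Parser [Integer]
line = number `sepBy1` symbol ","

file :: Parser [[Integer]]
file = sc *> (line `sepEndBy` (newline *> sc)) <* eof
```

Because the space consumer is an ordinary argument, deciding whether newlines count as whitespace is a local choice rather than something baked into the library.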


    Disclosure: I'm one of the authors of Megaparsec.
