In Parsec, is there a way to prevent lexeme from consuming newlines?

生来不讨喜 2021-02-13 12:36

All of the parsers in Text.Parsec.Token politely use lexeme to eat whitespace after a token. Unfortunately for me, whitespace includes new lines, whic

4 Answers
  •  再見小時候
    2021-02-13 12:44

    No, it is not. Here is the relevant code.

    From Text.Parsec.Token:

    lexeme p
        = do{ x <- p; whiteSpace; return x  }
    
    
    --whiteSpace
    whiteSpace
        | noLine && noMulti  = skipMany (simpleSpace <?> "")
        | noLine             = skipMany (simpleSpace <|> multiLineComment <?> "")
        | noMulti            = skipMany (simpleSpace <|> oneLineComment <?> "")
        | otherwise          = skipMany (simpleSpace <|> oneLineComment <|> multiLineComment <?> "")
        where
          noLine  = null (commentLine languageDef)
          noMulti = null (commentStart languageDef)
    

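    One will notice in the where clause of whiteSpace that the only options it looks at deal with comments. simpleSpace is included in every branch, and it is defined as skipMany1 (satisfy isSpace); since isSpace accepts '\n', newlines are always skipped. The lexeme function calls whiteSpace, and lexeme is used liberally throughout the rest of Text.Parsec.Token.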
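    A minimal sketch of one workaround, assuming it is acceptable to bypass Text.Parsec.Token's lexeme wherever newlines matter: define your own lexeme that skips only spaces and tabs. The names horizontalSpace, lexemeNoNL, identNoNL, lineAfter, statement and program are hypothetical, identifiers are simplified to runs of letters, and emptyDef from Text.Parsec.Language stands in for a real language definition. The first two lines of main contrast the stock identifier, which should stop on line 2 because whiteSpace ate the newline, with the custom one, which should stop on line 1.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Stock token parser: its lexeme runs whiteSpace, which also eats '\n'.
tokParser :: Tok.TokenParser ()
tokParser = Tok.makeTokenParser emptyDef

-- Hypothetical replacement: skip only spaces and tabs, never '\n'.
horizontalSpace :: Parser ()
horizontalSpace = skipMany (oneOf " \t")

-- Hypothetical newline-preserving counterpart of Token's lexeme.
lexemeNoNL :: Parser a -> Parser a
lexemeNoNL p = p <* horizontalSpace

-- Simplified identifier: letters only (an assumption for this sketch).
identNoNL :: Parser String
identNoNL = lexemeNoNL (many1 letter)

-- Run a parser, then report the line number it stopped on.
lineAfter :: Parser a -> String -> Either ParseError Int
lineAfter p = parse (p *> (sourceLine <$> getPosition)) ""

-- Identifiers separated by spaces; each statement ends with an explicit newline.
statement :: Parser [String]
statement = many1 identNoNL <* newline

program :: Parser [[String]]
program = horizontalSpace *> many statement <* eof

main :: IO ()
main = do
  print (lineAfter (Tok.identifier tokParser) "foo\nbar") -- stock lexeme
  print (lineAfter identNoNL "foo\nbar")                  -- custom lexeme
  print (parse program "" "foo bar\nbaz\n")
```

    Because the newline is now left in the input, the grammar has to consume it explicitly (here with newline at the end of statement), which lets it act as a significant token. The catch is that the other Text.Parsec.Token helpers (reserved, operator, stringLiteral and so on) are all built on the stock lexeme, so each one you need would have to be rewritten the same way.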


    Update Sept. 28, 2015

    The ultimate solution for me was to use a proper lexical analyser (alex). Parsec does a very good job as a parsing library and it is a credit to the design that it can be mangled into doing lexical analysis, but for all but small and simple projects it will quickly become unwieldy. I now use alex to create a linear set of tokens and then Parsec turns them into an AST.
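    The token-stream approach can be sketched without Alex by letting a small hand-written function play the lexer's role. The Tok type, lexer, satisfyTok and the parser names below are hypothetical; in the setup described above, an Alex specification would replace lexer. Parsec then runs over [Tok] instead of String, using tokenPrim to match individual tokens, so a newline is just another token and whitespace never reaches the parser at all.

```haskell
import Data.Char (isAlpha, isSpace)
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- Hypothetical token type; newlines are kept as real tokens.
data Tok = TIdent String | TNewline
  deriving (Show, Eq)

-- Hand-written stand-in for an Alex-generated lexer.
-- '\n' is matched before the generic whitespace case, so it is not dropped.
lexer :: String -> [Tok]
lexer [] = []
lexer ('\n' : cs) = TNewline : lexer cs
lexer s@(c : cs)
  | isSpace c = lexer cs
  | isAlpha c = let (w, rest) = span isAlpha s in TIdent w : lexer rest
  | otherwise = error ("lexer: unexpected character " ++ show c)

type TokParser a = Parsec [Tok] () a

-- Accept one token when the selector returns Just.
-- Positions are approximated: each token advances the column by 1.
satisfyTok :: (Tok -> Maybe a) -> TokParser a
satisfyTok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

ident :: TokParser String
ident = satisfyTok $ \t -> case t of
  TIdent s -> Just s
  _        -> Nothing

newlineTok :: TokParser ()
newlineTok = satisfyTok $ \t -> if t == TNewline then Just () else Nothing

-- One statement per line: identifiers terminated by a newline token.
statement :: TokParser [String]
statement = many1 ident <* newlineTok

program :: TokParser [[String]]
program = many statement <* eof

main :: IO ()
main = do
  let toks = lexer "foo bar\nbaz\n"
  print toks
  print (parse program "" toks)
```

    The position function passed to tokenPrim only bumps the column by one per token, so error messages will not point at real source locations. A fuller setup would store positions inside each token (for example via Alex's posn wrapper) and report those instead.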
