Why does Parsec's sepBy stop and not parse all elements?

Question

I am trying to parse some comma separated string which may or may not contain a string with image dimensions. For example "hello world, 300x300, good bye world".

I've written the following little program:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec
import qualified Text.Parsec.Text as PS

parseTestString :: Text -> [Maybe (Int, Int)]
parseTestString s = case parse dimensStringParser "" s of
                      Left _ -> [Nothing]
                      Right dimens -> dimens

dimensStringParser :: PS.Parser [Maybe (Int, Int)]
dimensStringParser = (optionMaybe dimensParser) `sepBy` (char ',')

dimensParser :: PS.Parser (Int, Int)
dimensParser = do
  w <- many1 digit
  char 'x'
  h <- many1 digit
  return (read w, read h)

main :: IO ()
main = do
  print $ parseTestString "300x300,40x40,5x5"
  print $ parseTestString "300x300,hello,5x5,6x6"

According to the optionMaybe documentation, it returns Nothing if the parser fails, so I would expect to get this output:

[Just (300,300),Just (40,40),Just (5,5)]
[Just (300,300),Nothing,Just (5,5),Just (6,6)]

but instead I get:

[Just (300,300),Just (40,40),Just (5,5)]
[Just (300,300),Nothing]

That is, parsing stops after the first failure. So I have two questions:

Why does it behave this way?
How do I write a correct parser for this case?

Answer 1:

I'd guess that optionMaybe dimensParser, when fed with input "hello,...", tries dimensParser. That fails, so optionMaybe returns success with Nothing, and consumes no portion of the input.

The last part is the crucial one: after Nothing is returned, the input string to be parsed is still "hello,...".

At that point sepBy tries to parse char ',', which fails. So, it deduces that the list is over, and terminates the output list, without consuming any more input.
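
To see the "consumes no input" step concretely, here is a minimal sketch that runs optionMaybe dimensParser once and then asks Parsec for the remaining input with getInput. The helper name probe is hypothetical, and the sketch uses Text.Parsec.String instead of the question's Text.Parsec.Text only to avoid needing OverloadedStrings; dimensParser itself is the same as in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same parser as in the question, specialised to String input.
dimensParser :: Parser (Int, Int)
dimensParser = do
  w <- many1 digit
  _ <- char 'x'
  h <- many1 digit
  return (read w, read h)

-- Hypothetical helper: run optionMaybe once, then return whatever input is left.
probe :: String -> Either ParseError (Maybe (Int, Int), String)
probe = parse ((,) <$> optionMaybe dimensParser <*> getInput) ""
```

In GHCi, probe "hello,5x5" gives Right (Nothing,"hello,5x5"): the parse succeeds, but the input is untouched, so the separator parser then sees 'h' rather than ','.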

If you want to skip other entities, you need a "consuming" parser that returns Nothing instead of optionMaybe. That parser, however, needs to know how much to consume: in your case, everything up to the next comma.

Perhaps you need something like this (untested):

-- skipMany consumes the whole non-dimension item, up to the next comma
(   try (Just <$> dimensParser)
<|> (skipMany (noneOf ",") >> return Nothing))
    `sepBy` char ','

Answer 2:

In order to answer this question, it's handy to take a piece of paper, write down the input, and act as a dumb parser.

We start with "300x300,hello,5x5,6x6", our current parser is optionMaybe .... Does our dimensParser correctly parse the dimension? Let's check:

  w <- many1 digit   -- yes, "300"
  char 'x'           -- yes, "x"
  h <- many1 digit   -- yes, "300"
  return (read w, read h) -- never fails

We've successfully parsed the first dimension. The next token is ',', so sepBy successfully parses that as well. Next, we try to parse "hello" and fail:

 w <- many1 digit -- no. 'h' is not a digit. Stop

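Since many1 digit failed without consuming any input, optionMaybe returns Nothing and the remaining input still starts with "hello". Next, sepBy tries to parse ',', but that's not possible, since the next token is 'h', not ','. Therefore, sepBy stops.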

We haven't parsed all the input, but sepBy doesn't require that. You would have gotten a proper error message if you had used

parse (dimensStringParser <* eof)
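
As a hedged illustration of that suggestion, the sketch below wraps the question's original optionMaybe-based parser in a helper that also requires eof. The name parseStrict is hypothetical, and Text.Parsec.String is used here instead of Text.Parsec.Text only to keep the sketch free of language extensions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

dimensParser :: Parser (Int, Int)
dimensParser = do
  w <- many1 digit
  _ <- char 'x'
  h <- many1 digit
  return (read w, read h)

-- The original list parser from the question.
dimensStringParser :: Parser [Maybe (Int, Int)]
dimensStringParser = optionMaybe dimensParser `sepBy` char ','

-- Hypothetical wrapper: the parse fails unless all input was consumed.
parseStrict :: String -> Either ParseError [Maybe (Int, Int)]
parseStrict = parse (dimensStringParser <* eof) ""
```

With this wrapper, parseStrict "300x300,40x40,5x5" still succeeds, while parseStrict "300x300,hello,5x5,6x6" returns a Left whose ParseError points at the 'h', instead of silently returning a truncated list.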

Either way, if you want to discard anything in the list that's not a dimension, you can use

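dimensStringParser1 :: PS.Parser (Maybe (Int, Int))
-- try lets us backtrack if dimensParser fails after consuming some digits
dimensStringParser1 = try (Just <$> dimensParser) <|> (skipMany (noneOf ",") >> return Nothing)

dimensStringParser :: PS.Parser [Maybe (Int, Int)]
dimensStringParser = dimensStringParser1 `sepBy` char ','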
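
For reference, here is a self-contained sketch of the "dimension, or skip to the next comma" approach. The names item, dimensList and parseDimens are hypothetical, and the sketch assumes String input via Text.Parsec.String rather than the question's Text. It wraps the dimension branch in try so that an item such as "300x", which fails after consuming some characters, can still fall through to the skipping branch.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

dimensParser :: Parser (Int, Int)
dimensParser = do
  w <- many1 digit
  _ <- char 'x'
  h <- many1 digit
  return (read w, read h)

-- One list item: a dimension, or any run of non-comma characters (discarded).
item :: Parser (Maybe (Int, Int))
item = try (Just <$> dimensParser)
   <|> (skipMany (noneOf ",") >> return Nothing)

dimensList :: Parser [Maybe (Int, Int)]
dimensList = item `sepBy` char ','

-- Mirrors parseTestString from the question: [Nothing] on a parse error.
parseDimens :: String -> [Maybe (Int, Int)]
parseDimens s = either (const [Nothing]) id (parse (dimensList <* eof) "" s)
```

With these definitions, parseDimens "300x300,hello,5x5,6x6" evaluates to [Just (300,300),Nothing,Just (5,5),Just (6,6)], which is the output the question expected.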

Source: https://stackoverflow.com/questions/48048903/why-parsecs-sepby-stops-and-does-not-parse-all-elements

Tags

parsing

haskell

parsec