parsec: string choice parser with useful error messages

后端未结

关注

 3  1327

Let\'s have following parser:

parser :: GenParser Char st String
parser = choice (fmap (try . string) [\"head\", \"tail\", \"tales\"]
                    <


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  深忆病人        
                
              
                            
                2021-01-14 10:06
              
            
            
                                                                       
Here's what I've got with Parsec:

λ> let parser = choice $ fmap (try . string) ["head", "tail", "tales"]
λ> parseTest parser "ta"
parse error at (line 1, column 1):
unexpected "t"
expecting "head", "tail" or "tales"


If you care to try modern version of Parsec —
Megaparsec, you will end
up with:

λ> let parser = choice $ fmap (try . string) ["head", "tail", "tales"]
λ> parseTest parser "ta"
1:1:
unexpected "ta" or 't'
expecting "head", "tail", or "tales"


What's going on here? First, when we parse ordered collection of characters,
like with string, we display incorrect input completely. This is much
better in our opinion because:

λ> parseTest (string "when" <* eof) "well"
1:1:
unexpected "we"
expecting "when"


We're pointing to beginning of the word and we show the whole thing that is
not correct (up to first mismatching character) and the whole thing we
expect. This is more readable in my opinion. Only parsers built on tokens
work this way (that is, when we're trying to match fixed string,
case-insensitive variant is available).

Then, what about unexpected "ta" or 't', why do we get 't' part? This is
also absolutely correct, because with your collection of alternatives, the
first letter 't' can be also unexpected by itself because you have
an alternative that doesn't start with 't'. Let's see another example:

λ> let parser = choice $ fmap (try . string) ["tall", "tail", "tales"]
λ> parseTest parser "ta"
1:1:
unexpected "ta"
expecting "tail", "tales", or "tall"


Or how about:

λ> parseTest (try (string "lexer") <|> string "lexical") "lex"
1:1:
unexpected "lex"
expecting "lexer" or "lexical"


Parsec:

λ> parseTest (try (string "lexer") <|> string "lexical") "lex"
parse error at (line 1, column 1):
unexpected end of input
expecting "lexical"


Why take pains to make it work when it can “just work”?

There are many other great things about Megaparsec, if you're interested,
you can learn more about it
here. It's hard to compete with
Parsec, but we have written our own tutorials and our docs are very good.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  太阳男子        
                
              
                            
                2021-01-14 10:13
              
            
            
                                                                       
It's not hard to cook up a function which does this correctly. We'll just rip one character off at a time, using Data.Map to find the shared suffixes:

{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative
import Data.Map hiding (empty)
import Text.Parsec hiding ((<|>))
import Text.Parsec.Char

-- accept the empty string if that's a choice
possiblyEmpty :: Stream s m Char => [String] -> ParsecT s u m String
possiblyEmpty ss | "" `elem` ss = pure ""
                 | otherwise    = empty

chooseFrom :: Stream s m Char => [String] -> ParsecT s u m String
chooseFrom ss
     =  foldWithKey (\h ts parser -> liftA2 (:) (char h) (chooseFrom ts) <|> parser)
                    empty
                    (fromListWith (++) [(h, [t]) | h:t <- ss])
    <|> possiblyEmpty ss


We can verify in ghci that it succesfully matches "tail" and "tales", and that it asks for i or l after a failed parse starting with ta:

*Main> parse (chooseFrom ["head", "tail", "tales"]) "" "tail"
Right "tail"
*Main> parse (chooseFrom ["head", "tail", "tales"]) "" "tales"
Right "tales"
*Main> parse (chooseFrom ["head", "tail", "tales"]) "" "tafoo"
Left (line 1, column 3):
unexpected "f"
expecting "i" or "l"

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别跟我提以往        
                
              
                            
                2021-01-14 10:16
              
            
            
                                                                       
Old answer for old nonworking example

Which version of parsec do you have installed? 3.1.9 does this for me:

Prelude> :m + Text.Parsec Text.Parsec.String
Prelude Text.Parsec Text.Parsec.String> :set prompt Main>
Main> let parser = choice (map (try . string) ["foo", "fob", "bar"]) :: GenParser Char st String
Main> runParser parser () "Hey" "fo "
Left "Hey" (line 1, column 1):
unexpected " "
expecting "foo", "fob" or "bar"
Main> runParser parser () "Hey" "fo"
Left "Hey" (line 1, column 1):
unexpected end of input
expecting "foo", "fob" or "bar"


The added <?> error_message doesn't change anything except that it changes that last line to expecting expected one of ['foo', 'fob', 'bar'].

How to extract more errors out of Parsec

So this is one of those cases where you shouldn't trust the error message to be exhaustive about the information that is available in the system. Let me give a funky Show instance for Text.Parsec.Error:Message (which is basically what it would be if it were deriving (Show)) so that you can see what's coming out of Parsec:

Main> :m + Text.Parsec.Error
Main> instance Show Message where show m = (["SysUnExpect", "UnExpect", "Expect", "Message"] !! fromEnum m) ++ ' ' : show (messageString m)
Main> case runParser parser () "" "ta" of Left pe -> errorMessages pe
[SysUnExpect "\"t\"",SysUnExpect "",SysUnExpect "",Expect "\"head\"",Expect "\"tail\"",Expect "\"tales\""]


You can see that secretly choice is dumping all of its information into a bunch of parallel messages, and storing "unexpected end-of-file" as SysUnExpect "". The show instance for ParseError apparently grabs the first SysUnExpect but all of the Expect messages and dumps them for you to see.

The exact function which does this at present is Text.Parsec.Error:showErrorMessages. The error messages are expected to be in order and are broken into 4 chunks based on the constructor; the SysUnExpect chunk is sent through a special display function which hides the text completely if there are bona-fide UnExpect elements or else shows only the first SysUnExpect message:

  showSysUnExpect | not (null unExpect) ||
                    null sysUnExpect = ""
                  | null firstMsg    = msgUnExpected ++ " " ++ msgEndOfInput
                  | otherwise        = msgUnExpected ++ " " ++ firstMsg


It may be worth rewriting this or sending a bug upstream, as this is kinda weird behavior, and the data structures don't quite suit them. First, your problem in a nutshell is: it seems like each Message should have a SourcePos, not each ParseError. 

So, there is an earlier step, mergeErrors, which prefers ParseErrors with later SourcePos-es. This doesn't fire because messages don't have a SourcePos, which means that all of the errors from choice start at the beginning of the string rather than at the maximal point matched. You can see this for example in how this doesn't get stuck on parsing "tai":

let parser = try (string "head") <|> choice (map (try . (string "ta" >>) . string) ["il", "les"]) :: GenParser Char st Strinh


Second, apart from that, probably we should bind together messages that go together (so the default message is unexpected 't', expected "heads" | unexpected end-of-file, expected 'tails' | unexpected end-of-file, expected 'tales' unless you override it with <?>). Third, probably the ParseError constructor should be exported; fourth, the enumerated type in Message is really ugly and might be better put into ParseError {systemUnexpected :: [Message], userUnexpected :: [Message], expected :: [Message], other :: [Message]} or something, even in its present incarnation. (For example, the current Show for ParseError will break subtly if the messages aren't in a certain order.)

In the meantime I would recommend writing your own show variant for ParseError.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复