How do you use parsec in a greedy fashion?

长情又很酷 2021-02-14 18:24

In my work I come across a lot of gnarly sql, and I had the bright idea of writing a program to parse the sql and print it out neatly. I made most of it pretty quickly, but I r

1 answer
  • 2021-02-14 18:56

    Yeah, between might not work for what you're looking for. For your use case, though, I'd follow hammar's suggestion and grab an off-the-shelf SQL parser. (Personal opinion: try not to use SQL unless you really have to; using strings for database queries was, IMHO, a historical mistake.)

    Note: I've added an operator called <++> that concatenates the results of two parsers, whether they return strings or characters. (Code at the bottom.)
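
To make the note concrete, here is a minimal standalone sketch of the operator with a toy parser that mixes Char and String results. The Parser-specific type signature and the bracketed parser are assumptions added for illustration; the class and the operator body follow the answer's own code.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Parsec
import Text.Parsec.String (Parser)

-- Anything that can be turned into a String: a single Char or a String.
class CharOrStr a where toStr :: a -> String
instance CharOrStr Char where toStr x = [x]
instance CharOrStr String where toStr = id

infixl 4 <++>
-- Run both parsers in sequence and concatenate their results.
-- The Parser-specific signature is an assumption made for this sketch.
(<++>) :: (CharOrStr a, CharOrStr b) => Parser a -> Parser b -> Parser String
f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g

-- Hypothetical example: a Char parser, a String parser, another Char parser.
bracketed :: Parser String
bracketed = char '[' <++> many1 digit <++> char ']'
```

In GHCi, parse bracketed "" "[123]" should give Right "[123]": the brackets come back as Chars, the digits as a String, and <++> glues them together in order.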

    First, for the task of parsing parentheses: the top level parses some stuff between the matching characters, which is exactly what the code says:

    parseParen = char '(' <++> inner <++> char ')'
    

    Then the inner function should parse everything else: non-paren characters, optionally followed by another set of parentheses and whatever non-paren junk comes after it.

    parseParen = char '(' <++> inner <++> char ')' where
        inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
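
For anyone who wants to try the balanced-parenthesis part in isolation, here is a sketch of the same recursion written in do-notation, so it does not depend on <++>. The type signatures and the local variable names are my additions; the grammar is meant to be the same as above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse one balanced parenthesized group and return it verbatim.
parseParen :: Parser String
parseParen = do
  open  <- char '('
  body  <- inner
  close <- char ')'
  pure (open : body ++ [close])
  where
    -- Non-paren characters, optionally followed by a nested group
    -- and whatever comes after that group.
    inner :: Parser String
    inner = do
      junk <- many (noneOf "()")
      rest <- option "" ((++) <$> parseParen <*> inner)
      pure (junk ++ rest)
```

With this loaded, parse parseParen "" "(a(b)c)d" should return Right "(a(b)c)" and leave the trailing d unconsumed, while an unbalanced input such as "(a(b" should fail with a parse error.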
    

    I'll assume that for the rest of the solution, what you want is analogous to splitting things up by top-level SQL keywords (i.e. ignoring those inside parentheses). Namely, we'll have a parser that behaves like so:

    Main> parseTest parseSqlToplevel "select asdf(select m( 2) fr(o)m w where n) from b where delete 4"
    [(Select," asdf(select m( 2) fr(o)m w where n) "),(From," b "),(Where," "),(Delete," 4")]
    

    Suppose we have a parseKw parser that recognizes select and the other keywords. After we consume a keyword, we need to read until the next top-level keyword. The last trick in my solution is using the lookAhead combinator to check whether the next word is a keyword and, if so, leave it in the input. If it's not, we consume a parenthesized group or a run of other characters, and then recurse on the rest.

    -- consume spaces, then eat a word or parenthesis
    parseOther = many space <++>
        (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
         option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
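
The lookAhead trick can be tried on its own with a stripped-down sketch. Everything here is hypothetical scaffolding rather than part of the solution: keyword stands in for parseKw with a single keyword, wordsUntilKeyword collects plain words (no parentheses), and splitAtKeyword returns the leftover input so you can see that the keyword was not consumed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical single-keyword parser standing in for parseKw.
keyword :: Parser String
keyword = string "from"

-- Collect whitespace-separated words until the next word is the keyword.
-- The keyword itself is left in the input.
wordsUntilKeyword :: Parser [String]
wordsUntilKeyword = spaces *> (stop <|> option [] more)
  where
    -- Succeeds without consuming anything if a keyword comes next.
    stop = [] <$ lookAhead (try keyword)
    -- Otherwise take one word and recurse on the rest.
    more = (:) <$> many1 (noneOf " \t") <*> wordsUntilKeyword

-- Return the words before the keyword together with the untouched remainder.
splitAtKeyword :: Parser ([String], String)
splitAtKeyword = (,) <$> wordsUntilKeyword <*> many anyChar
```

For example, parse splitAtKeyword "" "a b from c" should give Right (["a","b"],"from c"). The try matters for words that merely share a prefix with the keyword, such as fred: string consumes the matching f and r before failing, and try rewinds that so the word can still be read as ordinary text.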
    

    My entire solution is as follows:

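    {-# LANGUAGE FlexibleInstances #-} -- for the String instance below
    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- overloaded operator to concatenate string results from parsers
    class CharOrStr a where toStr :: a -> String
    instance CharOrStr Char where toStr x = [x]
    instance CharOrStr String where toStr = id
    infixl 4 <++>
    -- fixing the type to Parser also pins down the types of the parsers below
    (<++>) :: (CharOrStr a, CharOrStr b) => Parser a -> Parser b -> Parser String
    f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g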
    
    data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
    
    parseKw =
        (Select <$ string "select") <|>
        (Update <$ string "update") <|>
        (Delete <$ string "delete") <|>
        (From <$ string "from") <|>
        (Where <$ string "where") <?>
        "keyword (select, update, delete, from, where)"
    
    -- consume spaces, then eat a word or parenthesis
    parseOther = many space <++>
        (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
         option "" ((parseParen <|> many1 (noneOf "() \t")) <++> parseOther))
    
    parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
    
    parseParen = char '(' <++> inner <++> char ')' where
        inner = many (noneOf "()") <++> option "" (parseParen <++> inner)
    

    Edit: version with quote support

    You can do the same thing as with the parens to support quotes:

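    {-# LANGUAGE FlexibleInstances #-} -- for the String instance below
    import Control.Applicative hiding (many, (<|>))
    import Text.Parsec
    import Text.Parsec.Combinator
    import Text.Parsec.String (Parser)
    
    -- overloaded operator to concatenate string results from parsers
    class CharOrStr a where toStr :: a -> String
    instance CharOrStr Char where toStr x = [x]
    instance CharOrStr String where toStr = id
    infixl 4 <++>
    -- fixing the type to Parser also pins down the types of the parsers below
    (<++>) :: (CharOrStr a, CharOrStr b) => Parser a -> Parser b -> Parser String
    f <++> g = (\x y -> toStr x ++ toStr y) <$> f <*> g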
    
    data Keyword = Select | Update | Delete | From | Where deriving (Eq, Show)
    
    parseKw =
        (Select <$ string "select") <|>
        (Update <$ string "update") <|>
        (Delete <$ string "delete") <|>
        (From <$ string "from") <|>
        (Where <$ string "where") <?>
        "keyword (select, update, delete, from, where)"
    
    -- consume spaces, then eat a word or parenthesis
    parseOther = many space <++>
        (("" <$ lookAhead (try parseKw)) <|> -- if there's a keyword, put it back!
         option "" ((parseParen <|> parseQuote <|> many1 (noneOf "'() \t")) <++> parseOther))
    
    parseSqlToplevel = many ((,) <$> parseKw <*> (space <++> parseOther)) <* eof
    
    parseQuote = char '\'' <++> inner <++> char '\'' where
        inner = many (noneOf "'\\") <++>
            option "" (char '\\' <++> anyChar <++> inner)
    
    parseParen = char '(' <++> inner <++> char ')' where
        inner = many (noneOf "'()") <++>
            (parseQuote <++> inner <|> option "" (parseParen <++> inner))
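
The quote parser can also be checked by itself. Below is a sketch of the same idea in do-notation, without <++>, assuming backslash escapes as in the code above (standard SQL doubles the quote instead, which this sketch does not handle). The signatures, the escaped helper, and the variable names are my additions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse a single-quoted literal and return it verbatim, quotes included.
-- A backslash escapes the next character, so \' does not end the literal.
parseQuote :: Parser String
parseQuote = do
  open  <- char '\''
  body  <- inner
  close <- char '\''
  pure (open : body ++ [close])
  where
    -- Plain characters, optionally followed by an escape and more body.
    inner :: Parser String
    inner = do
      plain <- many (noneOf "'\\")
      rest  <- option "" escaped
      pure (plain ++ rest)
    -- A backslash, the escaped character, then the rest of the body.
    escaped :: Parser String
    escaped = do
      slash <- char '\\'
      c     <- anyChar
      more  <- inner
      pure (slash : c : more)
```

Given the Haskell string "'a\\'b(' rest", parse parseQuote "" should return the whole literal, Right "'a\\'b('", with the escaped quote and the stray paren kept inside it. That is what lets parseParen and parseOther skip over parens that appear within quoted text.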
    

    I tried it with parseTest parseSqlToplevel "select ('a(sdf'())b". Cheers.
