Equation (expression) parser with precedence?

遇见更好的自我 2020-11-22 11:44

I've developed an equation parser using a simple stack algorithm that handles binary (+, -, |, &, *, /, etc.) operators, unary (!) operators, and parentheses.

23 answers
  •  花落未央
    2020-11-22 12:24

    The hard way

    You want a recursive descent parser.

    To get precedence you need to think recursively, for example, using your sample string,

    1+11*5
    

    to do this manually, you would have to read the 1, then see the plus and start a whole new recursive parse "session" starting with 11... and make sure to parse the 11 * 5 into its own factor, yielding a parse tree with 1 + (11 * 5).
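    The recursion described above can be sketched in C as follows. This is a minimal illustration, not the only way to structure it: the function names (eval, parse_expr, parse_term, parse_factor) and the small grammar in the comment are assumptions made for the example, and it evaluates the expression directly instead of building a parse tree. It assumes well-formed input and does no error reporting.

```c
#include <ctype.h>
#include <stdlib.h>

/* Assumed grammar (illustrative only):
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')' | '-' factor
 * Each precedence level gets its own function; higher precedence
 * sits deeper in the call chain, so 11*5 is grouped before the '+'. */

static const char *cur;            /* cursor into the input string */

static double parse_expr(void);

static void skip_ws(void) {
    while (isspace((unsigned char)*cur)) cur++;
}

static double parse_factor(void) {
    skip_ws();
    if (*cur == '(') {
        cur++;                     /* consume '(' */
        double v = parse_expr();   /* a fresh recursive "session" */
        skip_ws();
        if (*cur == ')') cur++;    /* consume ')' if present */
        return v;
    }
    if (*cur == '-') {             /* unary minus */
        cur++;
        return -parse_factor();
    }
    char *end;
    double v = strtod(cur, &end);  /* read a number, stop at the operator */
    cur = end;
    return v;
}

static double parse_term(void) {
    double v = parse_factor();
    for (;;) {
        skip_ws();
        if (*cur == '*')      { cur++; v *= parse_factor(); }
        else if (*cur == '/') { cur++; v /= parse_factor(); }
        else return v;         /* not '*' or '/': hand back to parse_expr */
    }
}

static double parse_expr(void) {
    double v = parse_term();
    for (;;) {
        skip_ws();
        if (*cur == '+')      { cur++; v += parse_term(); }
        else if (*cur == '-') { cur++; v -= parse_term(); }
        else return v;
    }
}

/* Entry point: evaluate a whole expression string. */
double eval(const char *s) {
    cur = s;
    return parse_expr();
}
```

    With this sketch, eval("1+11*5") reads the 1, sees the plus, and calls parse_term again starting at 11, which consumes 11*5 as a unit before the addition happens.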

    This all feels so painful even to attempt to explain, especially with the added powerlessness of C. See, after parsing the 11, if the * were actually a + instead, you would have to abandon the attempt at making a term and instead parse the 11 itself as a factor. My head is already exploding. It's possible with the recursive descent strategy, but there is a better way...

    The easy (right) way

    If you use a GPL tool like Bison, you probably don't need to worry about licensing issues, since the C code generated by Bison is not covered by the GPL (IANAL, but I'm pretty sure GPL tools don't force the GPL on generated code/binaries; for example, Apple compiles code such as Aperture with GCC and sells it without having to GPL said code).

    Download Bison (or something equivalent, ANTLR, etc.).

    There is usually some sample code that you can just run Bison on to get C code demonstrating a four-function calculator:

    http://www.gnu.org/software/bison/manual/html_node/Infix-Calc.html

    Look at the generated code, and see that this is not as easy as it sounds. Also, the advantages of using a tool like Bison are 1) you learn something (especially if you read the Dragon book and learn about grammars), and 2) you avoid the NIH trap of reinventing the wheel. With a real parser-generator tool, you actually have a hope of scaling up later, and you show other people that you know parsers are the domain of parsing tools.


    Update:

    People here have offered much sound advice. My only warning against skipping the parsing tools, or just using the Shunting Yard algorithm or a hand-rolled recursive descent parser, is that little toy languages [1] may someday turn into big actual languages with functions (sin, cos, log), variables, conditions and for loops.

    Flex/Bison may very well be overkill for a small, simple interpreter, but a one-off parser+evaluator may cause trouble down the line when changes need to be made or features need to be added. Your situation will vary and you will need to use your judgement; just don't punish other people for your sins [2] and build a less-than-adequate tool.
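    For reference, the Shunting Yard approach mentioned above can be sketched in C roughly as follows. This is an illustrative sketch under stated assumptions, not a production implementation: the names (shunting_yard_eval, prec, apply, reduce) and the fixed MAX_STACK capacity are made up for the example, only the four left-associative binary operators and parentheses are handled, and the input is assumed to be well formed.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_STACK 64   /* arbitrary capacity for this sketch; not bounds-checked */

/* Binding strength of an operator; '(' gets 0 so it is never popped by an operator. */
static int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static double apply(char op, double a, double b) {
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}

/* Pop one operator and two values, push the result back on the value stack. */
static void reduce(double *vals, int *nv, const char *ops, int *no) {
    double b = vals[--*nv];
    double a = vals[--*nv];
    vals[(*nv)++] = apply(ops[--*no], a, b);
}

double shunting_yard_eval(const char *s) {
    double vals[MAX_STACK]; int nv = 0;   /* value stack */
    char   ops[MAX_STACK];  int no = 0;   /* operator stack */

    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }
        if (isdigit((unsigned char)*s)) {
            char *end;
            vals[nv++] = strtod(s, &end);
            s = end;
        } else if (*s == '(') {
            ops[no++] = *s++;
        } else if (*s == ')') {
            /* finish everything back to the matching '(' */
            while (no > 0 && ops[no - 1] != '(') reduce(vals, &nv, ops, &no);
            if (no > 0) no--;             /* discard the '(' */
            s++;
        } else {
            /* binary operator: first apply stacked operators that bind
               at least as tightly (this gives left associativity) */
            while (no > 0 && prec(ops[no - 1]) >= prec(*s))
                reduce(vals, &nv, ops, &no);
            ops[no++] = *s++;
        }
    }
    while (no > 0) reduce(vals, &nv, ops, &no);
    return nv > 0 ? vals[0] : 0.0;
}
```

    Adding functions, variables or control flow to a routine like this means growing the operator table and the special cases by hand, which is the kind of trouble described above.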

    My favorite tool for parsing

    The best tool in the world for the job is the Parsec library (for recursive descent parsers), which comes with the programming language Haskell. It looks a lot like BNF, or like some specialized tool or domain-specific language for parsing (sample code [3]), but it is in fact just a regular library in Haskell, meaning that it compiles in the same build step as the rest of your Haskell code, you can write arbitrary Haskell code and call it from within your parser, and you can mix and match other libraries all in the same code. (Embedding a parsing language like this in a language other than Haskell results in loads of syntactic cruft, by the way. I did this in C# and it works quite well, but it is not so pretty and succinct.)

    Notes:

    [1] Richard Stallman says, in Why you should not use Tcl:

    The principal lesson of Emacs is that a language for extensions should not be a mere "extension language". It should be a real programming language, designed for writing and maintaining substantial programs. Because people will want to do that!

    [2] Yes, I am forever scarred from using that "language".

    Also note that when I submitted this entry, the preview was correct, but SO's less-than-adequate parser ate my close anchor tag on the first paragraph, proving that parsers are not something to be trifled with: if you use regexes and one-off hacks, you will probably get something subtle and small wrong.

    [3] Snippet of a Haskell parser using Parsec: a four function calculator extended with exponents, parentheses, whitespace for multiplication, and constants (like pi and e).

    aexpr   =   expr `chainl1` toOp
    expr    =   optChainl1 term addop (toScalar 0)
    term    =   factor `chainl1` mulop
    factor  =   sexpr  `chainr1` powop
    sexpr   =   parens aexpr
            <|> scalar
            <|> ident
    
    powop   =   sym "^" >>= return . (B Pow)
            <|> sym "^-" >>= return . (\x y -> B Pow x (B Sub (toScalar 0) y))
    
    toOp    =   sym "->" >>= return . (B To)
    
    mulop   =   sym "*" >>= return . (B Mul)
            <|> sym "/" >>= return . (B Div)
            <|> sym "%" >>= return . (B Mod)
            <|>             return . (B Mul)
    
    addop   =   sym "+" >>= return . (B Add) 
            <|> sym "-" >>= return . (B Sub)
    
    scalar = number >>= return . toScalar
    
    ident  = literal >>= return . Lit
    
    parens p = do
                 lparen
                 result <- p
                 rparen
                 return result
    
