Equation (expression) parser with precedence?

遇见更好的自我 2020-11-22 11:44

I've developed an equation parser using a simple stack algorithm that will handle binary (+, -, |, &, *, /, etc) operators, unary (!) operators, and parentheses.

23 Answers
  • 2020-11-22 12:31

    I wrote an expression parser in F# and blogged about it here. It uses the shunting yard algorithm, but instead of converting from infix to RPN, I added a second stack to accumulate the results of calculations. It correctly handles operator precedence, but doesn't support unary operators. I wrote this to learn F#, not to learn expression parsing, though.
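
    For readers who don't use F#, the two-stack variant described above might look roughly like the following Java sketch. It is not the code from the blog post: the class name TwoStackEvaluator, the operator set (+, -, *, /, parentheses) and the lack of error handling for malformed input are all assumptions made for illustration. Like the original, it handles precedence but not unary operators.

```java
// Hypothetical sketch: shunting yard with a value stack instead of RPN output.
class TwoStackEvaluator {

    // Binding strength of a binary operator; higher binds tighter.
    static int precedence(char op) {
        return (op == '+' || op == '-') ? 1 : 2;
    }

    // Pop one operator and two operands, then push the result.
    static void reduce(java.util.Deque<Double> values, java.util.Deque<Character> ops) {
        char op = ops.pop();
        double right = values.pop();
        double left = values.pop();
        switch (op) {
            case '+': values.push(left + right); break;
            case '-': values.push(left - right); break;
            case '*': values.push(left * right); break;
            default:  values.push(left / right); break;
        }
    }

    static double eval(String text) {
        java.util.Deque<Double> values = new java.util.ArrayDeque<>();
        java.util.Deque<Character> ops = new java.util.ArrayDeque<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c) || c == '.') {
                int start = i;
                while (i < text.length()
                        && (Character.isDigit(text.charAt(i)) || text.charAt(i) == '.')) {
                    i++;
                }
                values.push(Double.parseDouble(text.substring(start, i)));
            } else if (c == '(') {
                ops.push(c);
                i++;
            } else if (c == ')') {
                while (ops.peek() != '(') reduce(values, ops);
                ops.pop(); // discard the '('
                i++;
            } else {
                // Left-associative: reduce while the stacked operator binds at least as tightly.
                while (!ops.isEmpty() && ops.peek() != '('
                        && precedence(ops.peek()) >= precedence(c)) {
                    reduce(values, ops);
                }
                ops.push(c);
                i++;
            }
        }
        while (!ops.isEmpty()) reduce(values, ops);
        return values.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 11 * 5"));
        System.out.println(eval("(8 - 3 - 2) * 4"));
    }
}
```

    Each time an operator is popped it is applied to the top two values right away, so no RPN sequence is ever built. With the inputs in main this prints 56.0 and 12.0.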

  • 2020-11-22 12:33

    I'm currently working on a series of articles building a regular expression parser as a learning tool for design patterns and readable programming. You can take a look at readablecode. The article presents a clear use of the shunting-yard algorithm.

  • 2020-11-22 12:35

    I know this is a late answer, but I've just written a tiny parser that allows all operators (prefix, postfix and infix-left, infix-right and nonassociative) to have arbitrary precedence.

    I'm going to expand this for a language with arbitrary DSL support, but I just wanted to point out that one doesn't need custom parsers for operator precedence: one can use a generalized parser that doesn't need tables at all and just looks up the precedence of each operator as it appears. People have been mentioning custom Pratt parsers or shunting yard parsers that can accept illegal inputs. This one doesn't need to be customized and (unless there's a bug) won't accept bad input. It isn't complete: it was written to test the algorithm, and its input is in a form that will need some preprocessing, but there are comments that make it clear.

    Note that some common kinds of operators are missing, for instance the sort of operator used for indexing, i.e. table[index], or for calling a function, function(parameter-expression, ...). I'm going to add those, but think of both as postfix operators where what comes between the delimiters '[' and ']' or '(' and ')' is parsed with a different instance of the expression parser. Sorry to have left that out, but the postfix part is in; adding the rest will probably almost double the size of the code.

    Since the parser is just 100 lines of Racket code, perhaps I should just paste it here. I hope this isn't longer than Stack Overflow allows.

    A few details on arbitrary decisions:

    If a low precedence postfix operator is competing for the same infix blocks as a low precedence prefix operator, the prefix operator wins. This doesn't come up in most languages, since most don't have low precedence postfix operators. For instance, ((data a) (left 1 +) (pre 2 not)(data b)(post 3 !) (left 1 +) (data c)) is a+not b!+c, where not is a prefix operator, ! is a postfix operator, and both have lower precedence than +, so they want to group in incompatible ways: either as (a+not b!)+c or as a+(not b!+c). In these cases the prefix operator always wins, so the second is the way it parses.

    Nonassociative infix operators are really there so that you don't have to pretend that operators that return different types than they take make sense together, but without having different expression types for each it's a kludge. As such, in this algorithm, non-associative operators refuse to associate not just with themselves but with any operator with the same precedence. That's a common case as < <= == >= etc don't associate with each other in most languages.

    The question of how different kinds of operators (left, prefix etc) break ties on precedence is one that shouldn't come up, because it doesn't really make sense to give operators of different types the same precedence. This algorithm does something in those cases, but I'm not even bothering to figure out exactly what because such a grammar is a bad idea in the first place.

    #lang racket
    ;cool the algorithm fits in 100 lines!
    (define MIN-PREC -10000)
    ;format (pre prec name) (left prec name) (right prec name) (nonassoc prec name) (post prec name) (data name) (grouped exp)
    ;for example "not a*-7+5 < b*b or c >= 4"
    ;which groups as: not ((((a*(-7))+5) < (b*b)) or (c >= 4))
    ;is represented as '((pre 0 not)(data a)(left 4 *)(pre 5 -)(data 7)(left 3 +)(data 5)(nonassoc 2 <)(data b)(left 4 *)(data b)(right 1 or)(data c)(nonassoc 2 >=)(data 4)) 
    ;higher numbers are higher precedence
    ;"(a+b)*c" is represented as ((grouped (data a)(left 3 +)(data b))(left 4 *)(data c))
    
    (struct prec-parse ([data-stack #:mutable #:auto]
                        [op-stack #:mutable #:auto])
      #:auto-value '())
    
    (define (pop-data stacks)
      (let [(data (car (prec-parse-data-stack stacks)))]
        (set-prec-parse-data-stack! stacks (cdr (prec-parse-data-stack stacks)))
        data))
    
    (define (pop-op stacks)
      (let [(op (car (prec-parse-op-stack stacks)))]
        (set-prec-parse-op-stack! stacks (cdr (prec-parse-op-stack stacks)))
        op))
    
    (define (push-data! stacks data)
        (set-prec-parse-data-stack! stacks (cons data (prec-parse-data-stack stacks))))
    
    (define (push-op! stacks op)
        (set-prec-parse-op-stack! stacks (cons op (prec-parse-op-stack stacks))))
    
    (define (process-prec min-prec stacks)
      (let [(op-stack (prec-parse-op-stack stacks))]
        (cond ((not (null? op-stack))
               (let [(op (car op-stack))]
                 (cond ((>= (cadr op) min-prec) 
                        (apply-op op stacks)
                        (set-prec-parse-op-stack! stacks (cdr op-stack))
                        (process-prec min-prec stacks))))))))
    
    (define (process-nonassoc min-prec stacks)
      (let [(op-stack (prec-parse-op-stack stacks))]
        (cond ((not (null? op-stack))
               (let [(op (car op-stack))]
                 (cond ((> (cadr op) min-prec) 
                        (apply-op op stacks)
                        (set-prec-parse-op-stack! stacks (cdr op-stack))
                        (process-nonassoc min-prec stacks))
                       ((= (cadr op) min-prec) (error "multiply applied non-associative operator"))
                       ))))))
    
    (define (apply-op op stacks)
      (let [(op-type (car op))]
        (cond ((eq? op-type 'post)
               (push-data! stacks `(,op ,(pop-data stacks) )))
              (else ;assume infix
               (let [(tos (pop-data stacks))]
                 (push-data! stacks `(,op ,(pop-data stacks) ,tos))))))) 
    
    (define (finish input min-prec stacks)
      (process-prec min-prec stacks)
      input
      )
    
    (define (post input min-prec stacks)
      (if (null? input) (finish input min-prec stacks)
          (let* [(cur (car input))
                 (input-type (car cur))]
            (cond ((eq? input-type 'post)
                   (cond ((< (cadr cur) min-prec)
                          (finish input min-prec stacks))
                         (else 
                          (process-prec (cadr cur)stacks)
                          (push-data! stacks (cons cur (list (pop-data stacks))))
                          (post (cdr input) min-prec stacks))))
                  (else (let [(handle-infix (lambda (proc-fn inc)
                                              (cond ((< (cadr cur) min-prec)
                                                     (finish input min-prec stacks))
                                                    (else 
                                                     (proc-fn (+ inc (cadr cur)) stacks)
                                                     (push-op! stacks cur)
                                                     (start (cdr input) min-prec stacks)))))]
                          (cond ((eq? input-type 'left) (handle-infix process-prec 0))
                                ((eq? input-type 'right) (handle-infix process-prec 1))
                                ((eq? input-type 'nonassoc) (handle-infix process-nonassoc 0))
                                (else (error "post op, infix op or end of expression expected here")))))))))
    
    ;alters the stacks and returns the input
    (define (start input min-prec stacks)
      (if (null? input) (error "expression expected")
          (let* [(cur (car input))
                 (input-type (car cur))]
            (set! input (cdr input))
            ;pre could clearly work with new stacks, but could it reuse the current one?
            (cond ((eq? input-type 'pre)
                   (let [(new-stack (prec-parse))]
                     (set! input (start input (cadr cur) new-stack))
                     (push-data! stacks 
                                 (cons cur (list (pop-data new-stack))))
                     ;we might want to assert here that the cdr of the new stack is null
                     (post input min-prec stacks)))
                  ((eq? input-type 'data)
                   (push-data! stacks cur)
                   (post input min-prec stacks))
                  ((eq? input-type 'grouped)
                   (let [(new-stack (prec-parse))]
                     (start (cdr cur) MIN-PREC new-stack)
                     (push-data! stacks (pop-data new-stack)))
                   ;we might want to assert here that the cdr of the new stack is null
                   (post input min-prec stacks))
                  (else (error "bad input"))))))
    
    (define (op-parse input)
      (let [(stacks (prec-parse))]
        (start input MIN-PREC stacks)
        (pop-data stacks)))
    
    (define (main)
      (op-parse (read)))
    
    (main)
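
    For readers who don't read Racket, the associativity rules described above (left, right, and non-associative operators that refuse to chain with anything at the same precedence) can be illustrated with a much smaller Java sketch. It is not a port of the Racket code: it uses plain precedence climbing, covers infix operators only, assumes whitespace-separated tokens, and the class name AssocSketch and its operator set are made up for the example.

```java
// Hypothetical sketch: precedence climbing with left, right and non-associative operators.
class AssocSketch {
    enum Assoc { LEFT, RIGHT, NONE }

    // Precedence is looked up per token as it appears; -1 means "not an infix operator".
    static int prec(String op) {
        switch (op) {
            case "<": case "<=": case "==": return 2;
            case "+": case "-": return 3;
            case "*": case "/": return 4;
            case "^": return 5;
            default: return -1;
        }
    }

    static Assoc assoc(String op) {
        if (op.equals("^")) return Assoc.RIGHT;
        return prec(op) == 2 ? Assoc.NONE : Assoc.LEFT;
    }

    private final String[] tokens;
    private int pos = 0;

    AssocSketch(String text) { tokens = text.trim().split("\\s+"); }

    // Returns the fully parenthesized form of the input.
    static String parse(String text) {
        AssocSketch p = new AssocSketch(text);
        String tree = p.expression(0);
        if (p.pos < p.tokens.length) {
            throw new IllegalArgumentException("unexpected " + p.tokens[p.pos]);
        }
        return tree;
    }

    String expression(int minPrec) {
        if (pos >= tokens.length || prec(tokens[pos]) >= 0) {
            throw new IllegalArgumentException("operand expected");
        }
        String left = tokens[pos++];
        int lastNonAssoc = -1; // precedence of the non-associative operator just applied, if any
        while (pos < tokens.length) {
            String op = tokens[pos];
            int p = prec(op);
            if (p < 0) throw new IllegalArgumentException("operator expected");
            if (p < minPrec) break;
            if (p == lastNonAssoc) {
                throw new IllegalArgumentException("non-associative operator chained: " + op);
            }
            pos++;
            // LEFT and NONE parse the right operand one level tighter; RIGHT re-admits its own level.
            String right = expression(assoc(op) == Assoc.RIGHT ? p : p + 1);
            left = "(" + left + " " + op + " " + right + ")";
            lastNonAssoc = (assoc(op) == Assoc.NONE) ? p : -1;
        }
        return left;
    }
}
```

    AssocSketch.parse("a - b - c") gives ((a - b) - c), AssocSketch.parse("a ^ b ^ c") gives (a ^ (b ^ c)), and AssocSketch.parse("a < b == c") throws, since < and == share a non-associative precedence level.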
    
  • 2020-11-22 12:38

    The algorithm can easily be encoded in C as a recursive descent parser.

    #include <stdio.h>
    #include <ctype.h>
    
    /*
     *  expression -> sum
     *  sum -> product | product "+" sum
     *  product -> term | term "*" product
     *  term -> number | "(" expression ")"
     *  number -> [0..9]+
     */
    
    typedef struct {
        int value;
        const char* context;   /* first character not yet consumed */
    } expression_t;
    
    expression_t expression(int value, const char* context) {
        return (expression_t) { value, context };
    }
    
    /* begin: parsers */
    
    expression_t eval_expression(const char* symbols);
    
    expression_t eval_number(const char* symbols) {
        // number -> [0..9]+
        int number = 0;
        while (isdigit((unsigned char)*symbols)) {
            number = 10 * number + (*symbols - '0');
            symbols++;
        }
        return expression(number, symbols);
    }
    
    expression_t eval_term(const char* symbols) {
        // term -> number | "(" expression ")"
        if (*symbols == '(') {
            expression_t inner = eval_expression(symbols + 1);
            // skip the closing parenthesis when it is present
            const char* rest = *inner.context == ')' ? inner.context + 1 : inner.context;
            return expression(inner.value, rest);
        }
        // recursing into eval_expression here would never consume input
        return eval_number(symbols);
    }
    
    expression_t eval_product(const char* symbols) {
        // product -> term | term "*" product
        expression_t term = eval_term(symbols);
        if (*term.context != '*')
            return term;
    
        expression_t product = eval_product(term.context + 1);
        return expression(term.value * product.value, product.context);
    }
    
    expression_t eval_sum(const char* symbols) {
        // sum -> product | product "+" sum
        expression_t product = eval_product(symbols);
        if (*product.context != '+')
            return product;
    
        expression_t sum = eval_sum(product.context + 1);
        return expression(product.value + sum.value, sum.context);
    }
    
    expression_t eval_expression(const char* symbols) {
        // expression -> sum
        return eval_sum(symbols);
    }
    
    /* end: parsers */
    
    int main() {
        const char* expression = "1+11*5";
        printf("eval(\"%s\") == %d\n", expression, eval_expression(expression).value);
    
        return 0;
    }
    

    The following libraries might be useful: yupana - strictly arithmetic operations; tinyexpr - arithmetic operations + C math functions + one provided by the user; mpc - parser combinators

    Explanation

    Let's capture the sequences of symbols that represent an algebraic expression. The first one is a number, that is, a decimal digit repeated one or more times. We will refer to such notation as a production rule.

    number -> [0..9]+
    

    The addition operator with its operands is another rule. It is either a number or any sequence of symbols that represents sum "+" sum.

    sum -> number | sum "+" sum
    

    Try substituting number into sum "+" sum: that gives number "+" number, which in turn can be expanded into [0..9]+ "+" [0..9]+, which finally can be reduced to 1+8, a correct addition expression.

    Other substitutions will also produce correct expressions: sum "+" sum -> number "+" sum -> number "+" sum "+" sum -> number "+" sum "+" number -> number "+" number "+" number -> 12+3+5

    Bit by bit we can assemble a set of production rules, aka a grammar, that expresses all possible algebraic expressions.

    expression -> sum
    sum -> difference | difference "+" sum
    difference -> product | difference "-" product
    product -> fraction | fraction "*" product
    fraction -> term | fraction "/" term
    term -> "(" expression ")" | number
    number -> digit+                                                                    
    

    To control operator precedence, alter the position of its production rule relative to the others. Look at the grammar above and note that the production rule for * is placed below the one for +; this forces product to be evaluated before sum. The implementation just combines pattern recognition with evaluation and thus closely mirrors the production rules.

    expression_t eval_product(const char* symbols) {
        // product -> term | term "*" product
        expression_t term = eval_term(symbols);
        if (*term.context != '*')
            return term;
    
        expression_t product = eval_product(term.context + 1);
        return expression(term.value * product.value, product.context);
    }
    

    Here we evaluate term first and return it if there is no * character after it; this is the left choice in our production rule. Otherwise we evaluate the symbols after the * and return term.value * product.value; this is the right choice in our production rule, i.e. term "*" product.
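
    The left-recursive rules for difference and fraction in the grammar above cannot be coded as direct recursion, because a function for difference would call itself before consuming any input. A common workaround is to turn the repetition into a loop, which also keeps - and / left-associative. The Java sketch below shows that idea under a few assumptions: + and - share one level and * and / another (a simplification of the grammar above), arithmetic is on integers as in the C code, and the class name DescentSketch is made up.

```java
// Hypothetical sketch: recursive descent where repetition is a loop, not recursion.
class DescentSketch {
    private final String src;
    private int pos = 0;

    private DescentSketch(String src) { this.src = src.replace(" ", ""); }

    static int eval(String text) {
        DescentSketch p = new DescentSketch(text);
        int value = p.sum();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // Current character, or '\0' at end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // sum -> product (("+" | "-") product)*
    private int sum() {
        int value = product();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int right = product();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // product -> term (("*" | "/") term)*
    private int product() {
        int value = term();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int right = term();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // term -> "(" sum ")" | number
    private int term() {
        if (peek() == '(') {
            pos++;
            int value = sum();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1+11*5"));
        System.out.println(eval("8-3-2"));
        System.out.println(eval("(1+2)*(7-4)/3"));
    }
}
```

    With the inputs in main this prints 56, 3 and 3. A purely right-recursive rule would instead read 8-3-2 as 8-(3-2).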

  • 2020-11-22 12:40

    Here is a simple recursive solution written in Java. Note that it does not handle negative numbers, but you can add that if you want to:

    public class ExpressionParser {
    
        public double eval(String exp) {
            int bracketCounter = 0;
            int additiveIndex = -1;        // last top-level '+' or '-'
            int multiplicativeIndex = -1;  // last top-level '*' or '/'
    
            for (int i = 0; i < exp.length(); i++) {
                char c = exp.charAt(i);
                if (c == '(') bracketCounter++;
                else if (c == ')') bracketCounter--;
                else if (bracketCounter == 0) {
                    if (c == '+' || c == '-') additiveIndex = i;
                    else if (c == '*' || c == '/') multiplicativeIndex = i;
                }
            }
            // Split at the lowest-precedence operator. Taking its last occurrence
            // keeps '-' and '/' left-associative, so "8-3-2" is (8-3)-2.
            int operatorIndex = additiveIndex >= 0 ? additiveIndex : multiplicativeIndex;
            if (operatorIndex < 0) {
                exp = exp.trim();
                if (exp.charAt(0) == '(' && exp.charAt(exp.length() - 1) == ')')
                    return eval(exp.substring(1, exp.length() - 1));
                else
                    return Double.parseDouble(exp);
            }
            double left = eval(exp.substring(0, operatorIndex));
            double right = eval(exp.substring(operatorIndex + 1));
            switch (exp.charAt(operatorIndex)) {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                default:  return left / right;
            }
        }
    }
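
    One possible way to add negative numbers to this split-based approach is to treat a + or - as binary only when an operand (a digit, a decimal point, or a closing parenthesis) comes directly before it, and as a sign otherwise. The Java sketch below is an assumption about how that could look, not part of the original answer; the class name SignedExpressionParser and the helper isBinaryAt are made up. It splits at the last top-level operator so that subtraction and division stay left-associative.

```java
// Hypothetical sketch: split-based evaluation with unary plus and minus.
class SignedExpressionParser {

    // A '+' or '-' at index i is binary only when something operand-like precedes it.
    static boolean isBinaryAt(String exp, int i) {
        int j = i - 1;
        while (j >= 0 && Character.isWhitespace(exp.charAt(j))) j--;
        if (j < 0) return false;
        char prev = exp.charAt(j);
        return Character.isDigit(prev) || prev == '.' || prev == ')';
    }

    static double eval(String exp) {
        int depth = 0;
        int additive = -1;        // last top-level binary '+' or '-'
        int multiplicative = -1;  // last top-level '*' or '/'
        for (int i = 0; i < exp.length(); i++) {
            char c = exp.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (depth == 0) {
                if ((c == '+' || c == '-') && isBinaryAt(exp, i)) additive = i;
                else if (c == '*' || c == '/') multiplicative = i;
            }
        }
        int op = additive >= 0 ? additive : multiplicative;
        if (op >= 0) {
            double left = eval(exp.substring(0, op));
            double right = eval(exp.substring(op + 1));
            switch (exp.charAt(op)) {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                default:  return left / right;
            }
        }
        // No binary operator left at the top level: a sign, a group, or a number.
        exp = exp.trim();
        if (exp.startsWith("-")) return -eval(exp.substring(1));
        if (exp.startsWith("+")) return eval(exp.substring(1));
        if (exp.startsWith("(") && exp.endsWith(")")) {
            return eval(exp.substring(1, exp.length() - 1));
        }
        return Double.parseDouble(exp);
    }
}
```

    For example, SignedExpressionParser.eval("-3+5") returns 2.0, eval("2*-3") returns -6.0, and eval("-(1+2)*3") returns -9.0.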
