Evaluating a mathematical expression (function) for a large number of input values fast

你的背包 2020-12-09 12:11

The following questions

  • Evaluating a mathematical expression in a string
  • Equation parsing in Python
  • Safe way to parse user-supplied
5 Answers
  • 2020-12-09 12:17

    Since you asked about asteval, there is a way to use it and get faster results:

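    import time
    from asteval import Interpreter

    aeval = Interpreter()
    time_start = time.time()
    expr = aeval.parse(userinput_function)   # parse once, outside the loop
    for item in database_xy:
        aeval.symtable['x'] = item['x']      # rebind x for this row
        item['y_aeval'] = aeval.run(expr)    # evaluate the pre-parsed expression
    time_end = time.time()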
    

    That is, you can first parse ("pre-compile") the user input function, then insert each new value of x into the symbol table and use Interpreter.run() to evaluate the compiled expression for that value. On your scale, I think this will get you close to 0.5 seconds.

    If you are willing to use numpy, a hybrid solution:

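    import time
    import numpy
    from asteval import Interpreter

    aeval = Interpreter()
    time_start = time.time()
    expr = aeval.parse(userinput_function)
    x = numpy.array([item['x'] for item in database_xy])  # all inputs as one array
    aeval.symtable['x'] = x
    y = aeval.run(expr)                                   # one vectorized evaluation
    time_end = time.time()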
    

    should be much faster, and comparable in run time to using numexpr.

  • 2020-12-09 12:31

    If you are passing a string to sympy.simplify (which is not recommended usage; it's recommended to use sympify explicitly), that's going to use sympy.sympify to convert it to a SymPy expression, which uses eval internally.

  • 2020-12-09 12:37

    I'm not a Python coder, so I can't supply Python code. But I think I can provide a simple scheme that minimizes your dependencies and still runs pretty fast.

    The key here is to build something which is close to eval without being eval. So what you want to do is "compile" the user equation into something which can be evaluated fast. OP has shown a number of solutions.

    Here is another based on evaluating the equation as Reverse Polish.

    For the sake of discussion, assume that you can convert the equation into RPN (reverse polish notation). This means operands come before operators, e.g., for the user formula:

            sqrt(x**2 + y**2)
    

    you get RPN equivalent reading left to right:

              x 2 ** y 2 ** + sqrt
    

    In fact, we can treat "operands" (e.g., variables and constants) as operators that take zero operands. Now everything in RPN is an operator.

    If we treat each operator element as a token (assume a unique small integer written as "RPNelement" below for each) and store them in an array "RPN", we can evaluate such a formula using a pushdown stack pretty fast:

           stack = {};  // make the stack empty
           do i=1,len(RPN),1
              case RPN[i]:
                  "0":  push(stack,0);break;
                  "1":  push(stack,1);break;
                  "+":  push(stack,pop(stack)+pop(stack));break;
                  "-":  r=pop(stack);push(stack,pop(stack)-r);break;  // right operand is on top
                  "**": r=pop(stack);push(stack,power(pop(stack),r));break;
                  "sqrt": push(stack,sqrt(pop(stack)));break;
                  "x":  push(stack,x);break;
                  "y":  push(stack,y);break;
                  "K1": push(stack,K1);break;
                  ... // as many K1s as you have typical constants in a formula
              endcase
           enddo
           answer=pop(stack);
    

    You can inline the operations for push and pop to speed it up a bit. If the supplied RPN is well formed, this code is perfectly safe.
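
    For readers who want this in Python, here is a minimal sketch of the same stack loop. It is an illustration under a few assumptions, not a tuned implementation: tokens are plain strings and numbers rather than small integers, and the names eval_rpn, BINARY and UNARY are made up for this example.

```python
import math
import operator

# Binary operators: the right operand is popped first, then the left.
BINARY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "**": operator.pow,
}
# Unary functions; only sqrt is needed for the example formula.
UNARY = {"sqrt": math.sqrt}


def eval_rpn(rpn, variables):
    """Evaluate a list of RPN tokens against a dict of variable values."""
    stack = []
    for token in rpn:
        if token in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[token](left, right))
        elif token in UNARY:
            stack.append(UNARY[token](stack.pop()))
        elif isinstance(token, str):
            stack.append(variables[token])  # variable lookup, e.g. "x"
        else:
            stack.append(token)  # numeric constant
    if len(stack) != 1:
        raise ValueError("malformed RPN")
    return stack.pop()


# sqrt(x**2 + y**2) written as RPN
rpn = ["x", 2, "**", "y", 2, "**", "+", "sqrt"]
result = eval_rpn(rpn, {"x": 3.0, "y": 4.0})
print(result)
```

    Because the loop only ever calls functions listed in the two tables, a formula cannot reach anything outside them. The RPN list is built once and can then be reused for every row of input.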

    Now, how to get the RPN? Answer: build a little recursive descent parser, whose actions append RPN operators to the RPN array. See my SO answer for how to build a recursive descent parser easily for typical equations.

    You'll have to arrange for the constants encountered in parsing to be stored in K1, K2, ... unless they are special, commonly occurring values (as I have shown for "0" and "1"; you can add more if helpful).
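
    A rough Python sketch of such a parser follows. The grammar, the class name RPNCompiler and the K0, K1, ... slot naming are assumptions made for this example. It handles +, -, *, /, **, parentheses and single-argument function calls, with no unary minus. A user variable literally named K0 would clash with the constant slots, so a real version would need a separate token type.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+\.?\d*|\*\*|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split the formula into numbers, names, and operator symbols."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class RPNCompiler:
    """Recursive descent parser whose actions append to self.rpn."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.rpn = []     # output: RPN tokens, left to right
        self.consts = []  # values for the constant slots K0, K1, ...

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self):  # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            self.term()
            self.rpn.append(op)

    def term(self):  # term := power (('*' | '/') power)*
        self.power()
        while self.peek() in ("*", "/"):
            op = self.take()
            self.power()
            self.rpn.append(op)

    def power(self):  # power := atom ('**' power)?  (right-associative)
        self.atom()
        if self.peek() == "**":
            self.take()
            self.power()
            self.rpn.append("**")

    def atom(self):  # atom := NUMBER | NAME | NAME '(' expr ')' | '(' expr ')'
        token = self.take()
        if token == "(":
            self.expr()
            self.take(")")
        elif token[0].isdigit():
            value = float(token)
            if value not in self.consts:
                self.consts.append(value)
            self.rpn.append(f"K{self.consts.index(value)}")
        elif token[0].isalpha() or token[0] == "_":
            if self.peek() == "(":  # function call such as sqrt(...)
                self.take("(")
                self.expr()
                self.take(")")
            self.rpn.append(token)
        else:
            raise SyntaxError(f"unexpected token {token!r}")


def to_rpn(text):
    compiler = RPNCompiler(text)
    compiler.expr()
    if compiler.peek() is not None:
        raise SyntaxError(f"trailing input: {compiler.peek()!r}")
    return compiler.rpn, compiler.consts


rpn, consts = to_rpn("sqrt(x**2 + y**2)")
print(rpn, consts)
```

    Each grammar rule emits its operator only after its operands have been parsed, which is what puts the output in postfix order.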

    This solution should be a few hundred lines at most, and has zero dependencies on other packages.

    (Python experts: feel free to edit the code to make it Pythonesque).

  • 2020-12-09 12:40

    CPython (and pypy) use a very simple stack language for executing functions, and it is fairly easy to write the bytecode yourself, using the ast module.

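    # Note: this emits the 3-byte instruction format (opcode + 16-bit argument)
    # used by CPython before 3.6, and relies on opcode names from older versions.
    import sys
    import ast
    import types
    from dis import opmap

    PY3 = sys.version_info.major > 2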
    
    ops = {
        ast.Mult: opmap['BINARY_MULTIPLY'],
        ast.Add: opmap['BINARY_ADD'],
        ast.Sub: opmap['BINARY_SUBTRACT'],
        ast.Div: opmap['BINARY_TRUE_DIVIDE'],
        ast.Pow: opmap['BINARY_POWER'],
    }
    LOAD_CONST = opmap['LOAD_CONST']
    RETURN_VALUE = opmap['RETURN_VALUE']
    LOAD_FAST = opmap['LOAD_FAST']
    def process(consts, bytecode, p, stackSize=0):
        if isinstance(p, ast.Expr):
            return process(consts, bytecode, p.value, stackSize)
        if isinstance(p, ast.BinOp):
            szl = process(consts, bytecode, p.left, stackSize)
            szr = process(consts, bytecode, p.right, stackSize)
            if type(p.op) in ops:
                bytecode.append(ops[type(p.op)])
            else:
                raise Exception("unsupported opcode: %r" % p.op)
            return max(szl, szr) + stackSize + 1
        if isinstance(p, ast.Num):
            if p.n not in consts:
                consts.append(p.n)
            idx = consts.index(p.n)
            bytecode.append(LOAD_CONST)
            bytecode.append(idx % 256)
            bytecode.append(idx // 256)
            return stackSize + 1
        if isinstance(p, ast.Name):
            bytecode.append(LOAD_FAST)
            bytecode.append(0)
            bytecode.append(0)
            return stackSize + 1
        raise Exception("unsupported token")
    
    def makefunction(inp):
        def f(x):
            pass
    
        if PY3:
            oldcode = f.__code__
        else:
            oldcode = f.func_code
        stack_size = 0
        consts = [None]
        bytecode = []
        p = ast.parse(inp).body[0]
        stack_size = process(consts, bytecode, p, stack_size)
        bytecode.append(RETURN_VALUE)
        bytecode = bytes(bytearray(bytecode))
        consts = tuple(consts)
        if PY3:
            code = types.CodeType(oldcode.co_argcount, oldcode.co_kwonlyargcount, oldcode.co_nlocals, stack_size, oldcode.co_flags, bytecode, consts, oldcode.co_names, oldcode.co_varnames, oldcode.co_filename, 'f', oldcode.co_firstlineno, b'')
            f.__code__ = code
        else:
            code = types.CodeType(oldcode.co_argcount, oldcode.co_nlocals, stack_size, oldcode.co_flags, bytecode, consts, oldcode.co_names, oldcode.co_varnames, oldcode.co_filename, 'f', oldcode.co_firstlineno, '')
            f.func_code = code
        return f
    

    This has the distinct advantage of generating essentially the same function as eval, and it scales almost exactly as well as compile+eval. The compile step is slightly slower than eval's, and eval will precompute anything it can (1+1+x gets compiled as 2+x).

    For comparison, eval finishes your 20k test in 0.0125 seconds, and makefunction finishes in 0.014 seconds. Increasing the number of iterations to 2,000,000, eval finishes in 1.23 seconds and makefunction finishes in 1.32 seconds.

    Interestingly, pypy recognizes that eval and makefunction produce essentially the same function, so the JIT warmup for the first accelerates the second.
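
    The hand-assembled bytecode above is tied to the instruction layout of older interpreters. As a hedged alternative, and a different technique from the one above, the sketch below keeps the same idea of accepting only a few AST node types, but lets the built-in compile() emit the bytecode. The names make_function and ALLOWED_NODES are made up for this example, and only a single variable x is assumed.

```python
import ast

# Node types accepted in the user expression; everything else is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)


def make_function(source, argname="x"):
    """Compile a whitelisted arithmetic expression into a one-argument function."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id != argname:
            raise ValueError(f"unknown name: {node.id}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
    # Splice the checked expression into "lambda x: <expr>" and let the
    # interpreter generate the bytecode for the running Python version.
    wrapper = ast.parse(f"lambda {argname}: 0", mode="eval")
    wrapper.body.body = tree.body
    ast.fix_missing_locations(wrapper)
    code = compile(wrapper, "<user expression>", "eval")
    return eval(code, {"__builtins__": {}})


f = make_function("5*(1-(x*0.1))")
values = [f(x) for x in range(3)]
print(values)
```

    The eval call here only runs the already-validated lambda wrapper, so the user string never reaches eval directly. Note that ** with very large operands can still be slow to compute, so a production version may want to bound exponents as well.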

  • 2020-12-09 12:42

    I have used the C++ ExprTK library in the past with great success. In a benchmark speed test against other C++ parsers (e.g. Muparser, MathExpr, ATMSP), ExprTK comes out on top.

    There is a Python wrapper for ExprTK called cexprtk, which I have used and found to be very fast. You can compile the mathematical expression just once and then evaluate this serialised expression as many times as required. Here is a simple example using cexprtk with the userinput_function:

    import cexprtk
    import time
    
    userinput_function = '5*(1-(x*0.1))' # String - numbers should be handled as floats
    demo_len = 20000 # Parameter for benchmark (20k to 30k in real life)
    
    time_start = time.time()
    x = 1
    
    st = cexprtk.Symbol_Table({"x":x}, add_constants = True) # Setup the symbol table
    Expr = cexprtk.Expression(userinput_function, st) # Apply the symbol table to the userinput_function
    
    for x in range(0,demo_len,1):
        st.variables['x'] = x # Update the symbol table with the new x value
        Expr() # evaluate expression
    time_end = time.time()
    
    print('1 cexprtk: ' + str(round(time_end - time_start, 4)) + ' seconds')
    

    On my machine (Linux, dual core, 2.5GHz), for a demo length of 20000 this completes in 0.0202 seconds.

    For a demo length of 2,000,000 cexprtk finishes in 1.23 seconds.
