Parsing logical sentence very slow with pyparsing

滥情空心 2021-01-04 09:25

I am trying to use pyparsing to parse logical expressions such as these:

x
FALSE
NOT x
(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)

(A=True OR NOT (         


        
2 Answers
  • 2021-01-04 10:06

    I put your code into a small program:

    from sys import argv
    from pyparsing import (CaselessLiteral, Group, Optional, Regex, Word,
                           alphanums, alphas, infixNotation, opAssoc)

    def parsit():
        # Operands: identifiers (optionally primed, e.g. P') and numbers
        identifier = Group(Word(alphas, alphanums + "_") + Optional("'"))
        # (?:...) is a non-capturing group
        num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
        operator = Regex(">=|<=|!=|>|<|=")
        operand = identifier | num
        # Inner grammar: arithmetic and comparisons
        # (infixNotation is the current name of operatorPrecedence)
        aexpr = infixNotation(operand,
                              [('*', 2, opAssoc.LEFT),
                               ('+', 2, opAssoc.LEFT),
                               (operator, 2, opAssoc.LEFT),
                               ])

        # Outer grammar: boolean connectives whose operands are inner expressions
        op_prec = [(CaselessLiteral('not'), 1, opAssoc.RIGHT),
                   (CaselessLiteral('and'), 2, opAssoc.LEFT),
                   (CaselessLiteral('or'), 2, opAssoc.LEFT),
                   ('=>', 2, opAssoc.LEFT),
                   ]
        sentence = infixNotation(aexpr, op_prec)
        return sentence

    def demo02(arg):
        sent = parsit()
        print(arg, ":", sent.parseString(arg))

    def demo01():
        for arg in ["x", "FALSE", "NOT x",
                    "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)",
                    "(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)",
                    "((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)"
                    ]:
            demo02(arg)

    if __name__ == "__main__":
        if len(argv) <= 1:
            demo01()
        else:
            for arg in argv[1:]:
                demo02(arg)
    

    and ran it through cProfile:

    $ python -m cProfile pyparsetest.py 
    

    The output contains many parseImpl calls, but in the middle of it there are these lines:

    2906500/8   26.374    0.000   72.667    9.083 pyparsing.py:913(_parseNoCache)
    212752/300    1.045    0.000   72.608    0.242 pyparsing.py:985(tryParse)
    

    The 72.667 is the cumulative time (cumtime), out of roughly 72 seconds in total.
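    The same kind of profile can be collected from inside Python and sorted by cumulative time, so the expensive entry points come first. Below is a minimal sketch using the standard-library cProfile and pstats modules; slow_fib is a hypothetical stand-in workload, not the parser. In the ncalls column, a value such as 21891/1 means total calls / primitive (non-recursive) calls, which is also how 2906500/8 above should be read.

```python
import cProfile
import io
import pstats


def slow_fib(n):
    """Hypothetical stand-in workload: naive recursion that repeats work."""
    return n if n < 2 else slow_fib(n - 1) + slow_fib(n - 2)


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report sorted by cumtime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_report(slow_fib, 20)
print(result)  # 6765
print(report)
```

    Like the parser, slow_fib spends its time re-doing work it has already done, which is exactly what caching addresses.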

    So my guess is that caching (packrat parsing) would be a good lever.
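    Packrat parsing memoizes the result of each (rule, position) pair, so backtracking never re-parses the same rule at the same place. The toy, stdlib-only sketch below illustrates the effect. It is a hypothetical grammar shaped like a chain of infixNotation levels, not pyparsing's actual code, and it counts rule evaluations with and without a functools.lru_cache memo:

```python
from functools import lru_cache

# Hypothetical toy grammar shaped like an infixNotation chain of 5 levels:
#   expr(k) := expr(k-1) OPS[k] expr(k) | expr(k-1)      for k = 1..5
#   expr(0) := DIGIT | "(" expr(5) ")"
OPS = " *+<&|"          # OPS[k] is the binary operator of level k (index 0 unused)
LEVELS = len(OPS) - 1


def count_calls(text, memoize):
    """Parse text; return (matched_whole_text, number_of_rule_evaluations)."""
    calls = 0

    def expr(k, pos):
        """Return the end position of an expr(k) match starting at pos, or None."""
        nonlocal calls
        calls += 1
        if k == 0:
            if pos < len(text) and text[pos].isdigit():
                return pos + 1
            if pos < len(text) and text[pos] == "(":
                end = rule(LEVELS, pos + 1)
                if end is not None and end < len(text) and text[end] == ")":
                    return end + 1
            return None
        # Alternative 1: operand, operator, rest of this level
        end = rule(k - 1, pos)
        if end is not None and end < len(text) and text[end] == OPS[k]:
            rest = rule(k, end + 1)
            if rest is not None:
                return rest
        # Alternative 2: re-parses the same operand at the same position
        return rule(k - 1, pos)

    # With memoize=True each (k, pos) pair is evaluated at most once
    rule = lru_cache(maxsize=None)(expr) if memoize else expr
    return rule(LEVELS, 0) == len(text), calls


for sample in ["1", "(1+2)*3"]:
    print(sample, count_calls(sample, memoize=False), count_calls(sample, memoize=True))
```

    Each level tries its operand twice when no operator follows, so without the memo the work doubles per level (63 evaluations for "1" with five levels, versus 6 with the memo), and every nested pair of parentheses multiplies that again. This is consistent with the millions of calls in the profile.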

    Enabling it as described in http://pyparsing-public.wikispaces.com/FAQs did not help, though. I added the lines

    import pyparsing
    pyparsing.usePackrat = True
    

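    and the runtime was the same. (Assigning pyparsing.usePackrat only creates a module attribute that pyparsing does not read; packrat parsing is switched on by calling ParserElement.enablePackrat(), as in the other answer.)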
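    A minimal timing sketch, assuming pyparsing is installed. It builds a compact copy of the nested grammar from the program above (with the number regex written using non-capturing (?: groups), times one parse, then calls ParserElement.enablePackrat() and times the same parse again. The helper names and the sample string are illustrative; absolute timings depend on the machine and the pyparsing version.

```python
import time

from pyparsing import (CaselessLiteral, Group, Optional, ParserElement, Regex,
                       Word, alphanums, alphas, infixNotation, opAssoc)


def build_grammar():
    """Compact copy of the nested grammar: an arithmetic/comparison
    infixNotation used as the operand of a boolean infixNotation."""
    identifier = Group(Word(alphas, alphanums + "_") + Optional("'"))
    num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
    comparison_op = Regex(">=|<=|!=|>|<|=")
    arith = infixNotation(identifier | num, [
        ("*", 2, opAssoc.LEFT),
        ("+", 2, opAssoc.LEFT),
        (comparison_op, 2, opAssoc.LEFT),
    ])
    return infixNotation(arith, [
        (CaselessLiteral("not"), 1, opAssoc.RIGHT),
        (CaselessLiteral("and"), 2, opAssoc.LEFT),
        (CaselessLiteral("or"), 2, opAssoc.LEFT),
        ("=>", 2, opAssoc.LEFT),
    ])


def timed_parse(grammar, text):
    """Return (parse result as nested lists, elapsed seconds)."""
    start = time.perf_counter()
    result = grammar.parseString(text, parseAll=True).asList()
    return result, time.perf_counter() - start


SAMPLE = "(x + y <= 5) AND (y >= 10)"   # illustrative input
grammar = build_grammar()
plain, t_plain = timed_parse(grammar, SAMPLE)

# Packrat caching is a global class-level switch. pyparsing's docs suggest
# calling it right after import; it is called late here only to compare timings.
ParserElement.enablePackrat()
cached, t_cached = timed_parse(grammar, SAMPLE)

print(f"without packrat: {t_plain:.4f}s, with packrat: {t_cached:.4f}s")
```

    Caching must not change the parse result, only the time taken to get it; the gap grows with the length and nesting depth of the input.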

    The number regex also looks fine to me, fairly standard. Replacing it with, for example,

    #num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
    num = Regex(r"100|10|8|5|1")  # longest alternatives first
    

    also did not help. My simplified variant cannot produce an empty match, which I had suspected might be the problem, but apparently it is not.
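    Two regex details matter for patterns like these, and they are easy to check with Python's re module (pyparsing's Regex matches with re at the current position, so the same rules apply inside a grammar). First, (?:...) is a non-capturing group, whereas (:?...) is a capturing group that starts with an optional colon. Second, alternation is leftmost-first rather than longest-match. The variable names below are illustrative.

```python
import re

# Capturing group that starts with an optional ':' (note the order "(:?")
colon_group = re.compile(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
# Non-capturing group "(?:", which is presumably what the pattern intends
non_capturing = re.compile(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

for text in ["5", "-1.5", "2e10"]:
    # Both patterns accept ordinary numbers
    assert colon_group.fullmatch(text) and non_capturing.fullmatch(text)

# Only the "(:?" version also accepts a stray colon
print(bool(colon_group.fullmatch("1:.5")), bool(non_capturing.fullmatch("1:.5")))  # True False

# Alternation takes the first alternative that matches: "1" wins before "100" is tried
first_listed = re.compile(r"8|1|10|100|5")
longest_first = re.compile(r"100|10|8|5|1")
print(first_listed.match("100").group(), longest_first.match("100").group())  # 1 100
```

    With the first ordering, an input such as 10 or 100 is read as 1 followed by leftover digits, so listing longer alternatives first matters.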

    As a last try, I printed the resulting parser:

    ....
    sentence = infixNotation(aexpr, op_prec)  # operatorPrecedence in older pyparsing
    print(sentence)
    return sentence
    ....
    

    And... wow... it is long!

    Dropping your first operatorPrecedence (the arithmetic one) makes it a lot faster, but then the arithmetic parts no longer parse.

    So my guess is: try to separate the two kinds of expressions (boolean and arithmetic) more cleanly. Maybe that will improve it. I will look into it too, since it interests me as well.
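    One possible reading of this suggestion, as a sketch assuming pyparsing is installed: keep only arithmetic in the inner infixNotation, make a comparison a plain arith + operator + arith sequence instead of another precedence level, and let the boolean infixNotation use comparisons (or bare names) as its atoms. Using CaselessKeyword and excluding the keywords from identifiers is my own assumption, so that not/and/or only match as whole words. I have not measured whether this is faster; it removes one nested precedence level, which means fewer re-parses when backtracking.

```python
from pyparsing import (CaselessKeyword, Group, ParseBaseException, Regex, Word,
                       alphanums, alphas, infixNotation, oneOf, opAssoc)

# Sketch only: the grammar split and keyword handling are assumptions
NOT, AND, OR = map(CaselessKeyword, ["not", "and", "or"])

# Plain names; the boolean keywords are excluded so "NOT" is never an operand
identifier = ~(NOT | AND | OR) + Word(alphas, alphanums + "_")
number = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

# Inner grammar: arithmetic only, no comparison level
arith = infixNotation(identifier | number, [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

# A comparison is an ordinary sequence, not another precedence level
comparison = Group(arith + oneOf(">= <= != > < =") + arith)

# Outer grammar: boolean connectives over comparisons and bare names
sentence = infixNotation(comparison | identifier, [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ("=>", 2, opAssoc.LEFT),
])


def parses(text):
    """True if text is a complete sentence under this sketch grammar."""
    try:
        sentence.parseString(text, parseAll=True)
        return True
    except ParseBaseException:
        return False


for text in ["x", "FALSE", "NOT x",
             "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)"]:
    print(text, parses(text))
```

    A side effect of this split is that a bare arithmetic expression such as x + y is no longer accepted as a boolean sentence on its own.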

  • 2021-01-04 10:24

    I had the same problem and found the solution here (ParserElement.enablePackrat()): https://github.com/pyparsing/pyparsing

    With it, parsing with the following grammar is now instant (versus 60 seconds before):

    from pyparsing import (Literal, ParserElement, Word, alphanums, alphas,
                           infixNotation, nums, oneOf, opAssoc)

    # Turn on packrat caching (memoization) before parsing anything
    ParserElement.enablePackrat()

    integer  = Word(nums).setParseAction(lambda t: int(t[0]))('int')
    # Assumption: `variable` was not defined in the original snippet;
    # a simple identifier is used here so the code runs
    variable = Word(alphas + "_", alphanums + "_")
    operand  = integer | variable('var')

    # Left-associative binary operators
    eq    = Literal("==")('eq')
    gt    = Literal(">")('gt')
    gtEq  = Literal(">=")('gtEq')
    lt    = Literal("<")('lt')
    ltEq  = Literal("<=")('ltEq')
    notEq = Literal("!=")('notEq')
    mult  = oneOf('* /')('mult')
    plus  = oneOf('+ -')('plus')

    _and  = oneOf('&& and')('and')
    _or   = oneOf('|| or')('or')

    # Right-associative unary prefix operators
    sign     = oneOf('+ -')('sign')
    negation = Literal('!')('negation')

    # Operator groups per precedence
    right_op = negation | sign

    # Highest precedence
    left_op_1 = mult
    left_op_2 = plus
    left_op_3 = gtEq | ltEq | lt | gt   # longer operators first
    left_op_4 = eq   | notEq
    left_op_5 = _and
    left_op_6 = _or
    # Lowest precedence

    # infixNotation is the current name of operatorPrecedence
    condition = infixNotation( operand, [
         (right_op,   1, opAssoc.RIGHT),
         (left_op_1,  2, opAssoc.LEFT),
         (left_op_2,  2, opAssoc.LEFT),
         (left_op_3,  2, opAssoc.LEFT),
         (left_op_4,  2, opAssoc.LEFT),
         (left_op_5,  2, opAssoc.LEFT),
         (left_op_6,  2, opAssoc.LEFT)
        ]
    )('computation')
    