Parsing logical sentence very slow with pyparsing

滥情空心 2021-01-04 09:25

I am trying to use pyparsing to parse logical expressions such as these:

x
FALSE
NOT x
(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)

(A=True OR NOT (

2 answers

攒了一身酷 (original poster) 2021-01-04 10:06

I put your code into a small program:

from sys import argv
from pyparsing import *

def parsit(aexpr):
    identifier = Group(Word(alphas, alphanums + "_")  +  Optional("'"))
    num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
    operator = Regex(">=|<=|!=|>|<|=")
    operand = identifier |  num
    aexpr = operatorPrecedence(operand,
                               [('*',2,opAssoc.LEFT,),
                                ('+',2,opAssoc.LEFT,),
                                (operator,2,opAssoc.LEFT,)
                                ])

    op_prec = [(CaselessLiteral('not'),1,opAssoc.RIGHT,),
               (CaselessLiteral('and'),2,opAssoc.LEFT ,),
               (CaselessLiteral('or'), 2,opAssoc.LEFT ,),
               ('=>', 2,opAssoc.LEFT ,),
               ]
    sentence = operatorPrecedence(aexpr,op_prec)
    return sentence

def demo02(arg):
    sent = parsit(arg)
    print(arg, ":", sent.parseString(arg))

def demo01():
    for arg in ["x", "FALSE", "NOT x",
                  "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)",
                  "(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)",
                  "((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)"
                  ]:
        demo02(arg)


if len(argv) <= 1:
    demo01()
else:
    for arg in argv[1:]:
        demo02(arg)


and ran it through cProfile:

$ python -m cProfile pyparsetest.py 


You will find many parseImpl calls, but in the middle of the output there is:

2906500/8   26.374    0.000   72.667    9.083 pyparsing.py:913(_parseNoCache)
212752/300    1.045    0.000   72.608    0.242 pyparsing.py:985(tryParse)


where 72.667 is the cumulative time in seconds, out of roughly 72 seconds total.
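
For readers unfamiliar with the report format: in cProfile output the first column is the call count, where a value such as 2906500/8 means total calls over primitive (non-recursive) calls, and the fourth column is the cumulative time. Below is a minimal sketch of listing the most expensive entries with the standard cProfile and pstats modules. The function slow_recursive is a hypothetical stand-in for the parser call, not part of the program above.

```python
import cProfile
import io
import pstats


def slow_recursive(n):
    # Hypothetical stand-in for a deeply recursive parse call.
    return n if n < 2 else slow_recursive(n - 1) + slow_recursive(n - 2)


profiler = cProfile.Profile()
profiler.enable()
slow_recursive(18)
profiler.disable()

# Sort by cumulative time and keep only the top five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The same ordering is available from the command line with python -m cProfile -s cumulative pyparsetest.py.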

Therefore I would venture the guess that "caching" would offer a good lever.

Just enabling packrat parsing as described at http://pyparsing-public.wikispaces.com/FAQs did not help, though. I added the lines

import pyparsing
pyparsing.usePackrat = True


and the runtime was the same.
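
One possible explanation, offered as an assumption rather than a diagnosis: as far as I can tell, the documented switch for packrat memoization is the call ParserElement.enablePackrat(), made right after the import and before the grammar is built, and a plain module attribute assignment may have no effect at all. Below is a minimal sketch, assuming a pyparsing version that provides infixNotation (the newer name for operatorPrecedence). The tiny boolean grammar is illustrative only and is not the full grammar from the question.

```python
from pyparsing import (CaselessKeyword, ParserElement, Word, alphanums,
                       alphas, infixNotation, opAssoc)

# Turn on memoization before any grammar element is created.
ParserElement.enablePackrat()

identifier = Word(alphas, alphanums + "_")

# Illustrative boolean-only grammar; earlier entries bind tighter.
sentence = infixNotation(identifier, [
    (CaselessKeyword("not"), 1, opAssoc.RIGHT),
    (CaselessKeyword("and"), 2, opAssoc.LEFT),
    (CaselessKeyword("or"), 2, opAssoc.LEFT),
])

result = sentence.parseString("a AND NOT (b OR c)", parseAll=True)
print(result.asList())
```

Rerunning the profile with this call in place would show whether caching really makes no difference here.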

The number regex also looks fine to me, quite standard I guess. For example, replacing it with

#num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
num = Regex(r"8|1|10|100|5")


also did not help. There is no "empty match" in my simple variant, which I guessed might be an issue -- but it seems not.

My last attempt was to look at the resulting parser with:

....
sentence = operatorPrecedence(aexpr,op_prec)
print(sentence)
return sentence
....


And... wow... it is long!

Leaving out your first operatorPrecedence is a lot faster, but then arithmetic no longer works.

So my guess is: yes, try to separate the two kinds of expressions (boolean and arithmetic) more. Maybe that will improve it. I will look into it too; it interests me as well.
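
One way the separation could look, as an untested-for-speed sketch rather than a confirmed fix: keep arithmetic in its own small infixNotation, express a comparison as a flat "arithmetic, operator, arithmetic" group, and feed only comparisons and bare identifiers into the boolean layer. The names arith, comparison and condition are introduced here for illustration. The sketch also drops the Optional("'") suffix on identifiers for brevity and assumes packrat is enabled through ParserElement.enablePackrat().

```python
from pyparsing import (CaselessKeyword, Group, ParserElement, Regex, Word,
                       alphanums, alphas, infixNotation, oneOf, opAssoc)

ParserElement.enablePackrat()

identifier = Word(alphas, alphanums + "_")
num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

# Arithmetic layer: only * and +, as in the question.
arith = infixNotation(identifier | num, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.LEFT),
])

# A comparison is a flat group, not another precedence level.
comparison_op = oneOf(">= <= != > < =")
comparison = Group(arith + comparison_op + arith)

# Boolean layer: operands are comparisons or bare identifiers.
condition = comparison | identifier
sentence = infixNotation(condition, [
    (CaselessKeyword("not"), 1, opAssoc.RIGHT),
    (CaselessKeyword("and"), 2, opAssoc.LEFT),
    (CaselessKeyword("or"), 2, opAssoc.LEFT),
    ("=>", 2, opAssoc.LEFT),
])

text = "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)"
result = sentence.parseString(text, parseAll=True)
print(result.asList())
```

Whether this is actually faster on the long sentences would need to be checked with the same cProfile run as above.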
    
             
                                                        
            
            
              
                