I am trying to use pyparsing to parse logical expressions such as these:
x
FALSE
NOT x
(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)
(A=True OR NOT (
I put your code into a small program:
from sys import argv
from pyparsing import *

def parsit(aexpr):
    identifier = Group(Word(alphas, alphanums + "_") + Optional("'"))
    num = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
    operator = Regex(">=|<=|!=|>|<|=")
    operand = identifier | num
    aexpr = operatorPrecedence(operand,
        [('*',2,opAssoc.LEFT,),
         ('+',2,opAssoc.LEFT,),
         (operator,2,opAssoc.LEFT,)
        ])
    op_prec = [(CaselessLiteral('not'),1,opAssoc.RIGHT,),
               (CaselessLiteral('and'),2,opAssoc.LEFT ,),
               (CaselessLiteral('or'), 2,opAssoc.LEFT ,),
               ('=>', 2,opAssoc.LEFT ,),
               ]
    sentence = operatorPrecedence(aexpr,op_prec)
    return sentence

def demo02(arg):
    sent = parsit(arg)
    print arg, ":", sent.parseString(arg)

def demo01():
    for arg in ["x", "FALSE", "NOT x",
                "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)",
                "(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)",
                "((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)"
                ]:
        demo02(arg)

if len(argv) <= 1:
    demo01()
else:
    for arg in argv[1:]:
        demo02(arg)
and ran it through cProfile:
$ python -m cProfile pyparsetest.py
You will find many parseImpl calls, but in the middle of the output there is:
2906500/8 26.374 0.000 72.667 9.083 pyparsing.py:913(_parseNoCache)
212752/300 1.045 0.000 72.608 0.242 pyparsing.py:985(tryParse)
with 72.667 being the cumulative time, out of roughly 72 seconds in total.
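A side note on reading those two lines: in cProfile output, an ncalls value such as 2906500/8 means total calls versus primitive (non-recursive) calls, so nearly all of the work is recursion. To avoid scrolling through the full dump, the standard library's pstats module can sort by cumulative time. Here is a minimal sketch in Python 3; slow_recursive is a hypothetical stand-in for the parser and is not part of your program:

```python
import cProfile
import io
import pstats


def slow_recursive(n):
    # Hypothetical stand-in for the parser: naive recursion with many repeated calls.
    return n if n < 2 else slow_recursive(n - 1) + slow_recursive(n - 2)


profiler = cProfile.Profile()
profiler.enable()
slow_recursive(18)
profiler.disable()

# Sort by cumulative time and keep only the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

On the command line, the same ordering should be available with: python -m cProfile -s cumulative pyparsetest.py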
Therefore I would venture the guess that "caching" would offer a good lever.
Just enabling packrat parsing (see http://pyparsing-public.wikispaces.com/FAQs) did not help, though. I added the lines
import pyparsing
pyparsing.usePackrat = True
and the runtime was the same.
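For comparison, the switch that pyparsing documents is the class-level call ParserElement.enablePackrat(). As far as I can tell, assigning a module attribute named usePackrat just creates a new attribute that the library never reads, which would explain the unchanged runtime. Below is a minimal sketch of the documented call, assuming a pyparsing version that provides infixNotation (the newer name for operatorPrecedence). The tiny grammar is only an illustration, not your grammar, and I have not timed anything with it:

```python
from pyparsing import ParserElement, Word, alphas, infixNotation, opAssoc

# Turn on memoization of parse attempts. Do this right after importing
# pyparsing and before any grammar objects are created.
ParserElement.enablePackrat()

# Illustrative mini-grammar (an assumption, much smaller than the real one).
operand = Word(alphas)
expr = infixNotation(operand, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.LEFT),
])

tree = expr.parseString("((a + b) * c) + d", parseAll=True).asList()
print(tree)
```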
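The number regex also looks fine to me, quite standard, I guess (the `(:?` groups are probably meant to be the non-capturing `(?:`, but that is not what costs the time). For example, replacing it with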
#num = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
num = Regex(r"8|1|10|100|5")
also did not help. There is no "empty match" in my simple variant, which I guessed might be an issue -- but it seems not.
As a last attempt, I looked at the resulting parser with:
....
    sentence = operatorPrecedence(aexpr,op_prec)
    print sentence
    return sentence
....
And... wow... the printed parser is long!
Leaving out your first operatorPrecedence (the arithmetic one) is a lot faster, but then arithmetic no longer parses.
So my guess is that, yes, you should try to separate the two kinds of expressions (boolean and arithmetic) more. Maybe that will improve it. I will look into it too, since it interests me as well.
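One way such a separation could look, offered as a sketch and not as a measured fix: arithmetic gets its own small operator table, a comparison is written as an explicit rule (arithmetic, operator, arithmetic), and only the boolean operators go into the second table. Several things here are my assumptions and not your code: infixNotation as the newer name for operatorPrecedence, CaselessKeyword in place of CaselessLiteral, a lookahead that keeps identifiers from swallowing NOT/AND/OR, a simplified identifier without Group and the optional quote, the number regex with `(?:` groups, a comparison operator that refuses to eat the "=" of "=>", and packrat enabled up front.

```python
from pyparsing import (CaselessKeyword, Group, ParserElement, Regex, Word,
                       alphanums, alphas, infixNotation, opAssoc)

ParserElement.enablePackrat()  # memoize parse attempts before building the grammar

NOT, AND, OR = map(CaselessKeyword, ["not", "and", "or"])
keyword = NOT | AND | OR

# Simplified identifier (assumption): never matches a boolean keyword.
identifier = ~keyword + Word(alphas, alphanums + "_")
num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

# Arithmetic layer: only arithmetic operators live here.
arith = infixNotation(identifier | num, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.LEFT),
])

# Comparison as one explicit rule; "=" must not be the start of "=>".
comparison_op = Regex(r">=|<=|!=|>|<|=(?!>)")
comparison = Group(arith + comparison_op + arith)

# Boolean layer: its operands are comparisons or bare arithmetic terms.
sentence = infixNotation(comparison | arith, [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ("=>", 2, opAssoc.LEFT),
])

text = "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)"
tree = sentence.parseString(text, parseAll=True).asList()
print(tree)
```

The design choice is that the comparison operators no longer form a level of the inner operator table, so the boolean table wraps a flat rule plus a two-level arithmetic table. Whether this actually shortens the runtime on your long expressions would still have to be profiled.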