How can I break a string into nested tokens?

旧时难觅i 2021-01-24 11:32

I have strings made up of Boolean terms and equations, like so:

x=1 AND (x=2 OR x=3) AND NOT (x=4 AND x=5) AND (x=5) AND y=1

I would like to break up

2 Answers
  •  天涯浪人
    2021-01-24 12:25

    As you probably saw in that other question, parsing infix notation such as this is best done in pyparsing using the infixNotation helper (formerly named operatorPrecedence). Here are the basics for using infixNotation on your problem:

    import pyparsing as pp
    
    # define expressions for the boolean operator keywords, and for an ident
    # (a single letter, with a negative lookahead so that an operator keyword
    # is never parsed as an identifier)
    AND, OR, NOT = map(pp.Keyword, "AND OR NOT".split())
    any_keyword = AND | OR | NOT
    ident = pp.ungroup(~any_keyword + pp.Char(pp.alphas))
    ident.setName("ident")
    
    # use pyparsing_common.number pre-defined expression for any numeric value
    numeric_value = pp.pyparsing_common.number
    
    # define an expression for 'x=1', 'y!=200', etc.
    comparison_op = pp.oneOf("= != < > <= >=")
    comparison = pp.Group(ident + comparison_op + numeric_value)
    comparison.setName("comparison")
    
    # define classes for the parsed results, where we can do further processing by
    # node type later
    class Node:
        oper = None
        def __init__(self, tokens):
            self.tokens = tokens[0]
    
        def __repr__(self):
            return "{}:{!r}".format(self.oper, self.tokens.asList())
    
    class UnaryNode(Node):
        def __init__(self, tokens):
            super().__init__(tokens)
            del self.tokens[0]
    
    class BinaryNode(Node):
        def __init__(self, tokens):
            super().__init__(tokens)
            del self.tokens[1::2]
    
    class NotNode(UnaryNode):
        oper = "NOT"
    
    class AndNode(BinaryNode):
        oper = "AND"
    
    class OrNode(BinaryNode):
        oper = "OR"
    
    # use infixNotation helper to define recursive expression parser,
    # including handling of nesting in parentheses
    expr = pp.infixNotation(comparison,
            [
                (NOT, 1, pp.opAssoc.RIGHT, NotNode),
                (AND, 2, pp.opAssoc.LEFT, AndNode),
                (OR, 2, pp.opAssoc.LEFT, OrNode),
            ])
    

    Now try using this expr parser on a test string.

    test = "x=1 AND (x=2 OR x=3 OR y=12) AND NOT (x=4 AND x=5) AND (x=6) AND y=7"
    
    try:
        result = expr.parseString(test, parseAll=True)
        print(test)
        print(result)
    except pp.ParseException as pe:
        print(pp.ParseException.explain(pe))
    

    Gives:

    x=1 AND (x=2 OR x=3 OR y=12) AND NOT (x=4 AND x=5) AND (x=6) AND y=7
    [AND:[['x', '=', 1], OR:[['x', '=', 2], ['x', '=', 3], ['y', '=', 12]], NOT:[AND:[['x', '=', 4], ['x', '=', 5]]], ['x', '=', 6], ['y', '=', 7]]]
    

    From this point, collapsing the nested AND nodes and removing non-x comparisons can be done using regular Python.
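
As a rough illustration of that last step, here is a minimal sketch. It assumes the parsed tree has first been converted to plain Python data, with each operator node written as an `(oper, children)` tuple and each comparison as a list such as `['x', '=', 1]`. The `simplify` function and its `keep_ident` parameter are hypothetical names, not part of pyparsing. Note that this is a purely structural filter: dropping comparisons changes the logical meaning of the expression.

```python
def simplify(node, keep_ident="x"):
    """Return a simplified copy of node, or None if nothing is left."""
    # A comparison is a plain list such as ['x', '=', 1].
    if isinstance(node, list):
        return node if node[0] == keep_ident else None

    oper, children = node
    kept = []
    for child in children:
        child = simplify(child, keep_ident)
        if child is None:
            continue  # the whole subtree was filtered out
        # Splice an AND nested directly inside an AND into its parent.
        if oper == "AND" and isinstance(child, tuple) and child[0] == "AND":
            kept.extend(child[1])
        else:
            kept.append(child)

    if not kept:
        return None
    # An AND/OR node left with a single operand is just that operand.
    if oper != "NOT" and len(kept) == 1:
        return kept[0]
    return (oper, kept)


# Hand-written plain-data version of the parse result shown above (assumed shape).
tree = ("AND", [
    ["x", "=", 1],
    ("OR", [["x", "=", 2], ["x", "=", 3], ["y", "=", 12]]),
    ("NOT", [("AND", [["x", "=", 4], ["x", "=", 5]])]),
    ["x", "=", 6],
    ["y", "=", 7],
])
print(simplify(tree))
```

For this tree, the `y=12` and `y=7` comparisons are dropped and everything else is kept. The AND inside NOT is left alone, because it is not a direct child of another AND.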
