pyparsing nestedExpr and nested parentheses

孤城傲影 2021-02-04 14:06

I am working on a very simple "querying syntax" usable by people with reasonable technical skills (i.e., not coders per se, but able to touch on the subject).

A typical

1 answer
  •  佛祖请我去吃肉
    2021-02-04 14:30

    nestedExpr is a convenience expression in pyparsing that makes it easy to define text with matched opening and closing characters. When you want to parse the nested contents themselves, nestedExpr usually does not give you enough structure.

    The query syntax you are trying to parse is better served by pyparsing's infixNotation method. You can see several examples on the pyparsing wiki's Examples page; SimpleBool is very similar to what you are parsing.

    "Infix notation" is a general parsing term for expressions where the operator is in between its related operands (vs. "postfix notation" where the operator follows the operands, as in "2 3 +" instead of "2 + 3"; or "prefix notation" which looks like "+ 2 3"). Operators can have an order of precedence in evaluation that can override left-to-right order - for instance, in "2 + 3 * 4", precedence of operations dictates that multiplication gets evaluated before addition. Infix notation also supports using parentheses or other grouping characters to override that precedence, as in "(2 + 3) * 4" to force the addition operation to be done first.

    pyparsing's infixNotation method takes a base operand expression, and then a list of operator definition tuples, in order of precedence. For instance, 4-function integer arithmetic would look like:

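    from pyparsing import Word, nums, infixNotation, oneOf, opAssoc

    # operand: an unsigned integer, converted to int at parse time
    integer = Word(nums).setParseAction(lambda t: int(t[0]))
    parser = infixNotation(integer,
                 [
                 (oneOf('* /'), 2, opAssoc.LEFT),
                 (oneOf('+ -'), 2, opAssoc.LEFT),
                 ])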
    

    This means we will parse integer operands, with '*' and '/' as binary left-associative operators and '+' and '-' as binary left-associative operators of lower precedence, in that order. Support for parentheses to override the order is built into infixNotation.
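
    To see how precedence and parentheses show up in the parsed result, here is a small self-contained sketch. The names arith, by_precedence and by_parens are made up for illustration, and the integer operand is assumed to be a plain unsigned integer.

```python
import pyparsing as pp

# Operand: an unsigned integer, converted from str to int at parse time.
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

arith = pp.infixNotation(
    integer,
    [
        (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),  # listed first: binds tighter
        (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT),
    ],
)

# Multiplication binds tighter, so "3 * 4" becomes its own nested group.
by_precedence = arith.parseString("2 + 3 * 4", parseAll=True).asList()

# Parentheses override precedence; they are suppressed and only the
# nesting they produce remains in the result.
by_parens = arith.parseString("(2 + 3) * 4", parseAll=True).asList()

print(by_precedence)  # [[2, '+', [3, '*', 4]]]
print(by_parens)      # [[[2, '+', 3], '*', 4]]
```

    The nesting of the returned lists mirrors the order of evaluation, which is what makes the result easy to walk or evaluate afterwards.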

    Query strings are often some combination of the boolean operations NOT, AND, and OR, typically evaluated in that order of precedence. In your case, the operands for these operators are comparison expressions, like "address = street" or "age between [20,30]". So if you define an expression for a comparison, of the form fieldname operator value, then you can use infixNotation to do the right grouping of ANDs and ORs:

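    import pyparsing as pp

    # assumed definitions for the boolean keywords; comparison_expr is
    # your fieldname-operator-value expression
    NOT, AND, OR = map(pp.CaselessKeyword, ["NOT", "AND", "OR"])
    query_expr = pp.infixNotation(comparison_expr,
                    [
                        (NOT, 1, pp.opAssoc.RIGHT,),
                        (AND, 2, pp.opAssoc.LEFT,),
                        (OR, 2, pp.opAssoc.LEFT,),
                    ])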
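
    As a runnable sketch of the whole thing, the block below fills in a deliberately small comparison expression. The field, comparison_op and value definitions are assumptions made for illustration (your real syntax has more operators and value types), and each comparison is wrapped in a Group so that it shows up as a single operand.

```python
import pyparsing as pp

# Keywords for the boolean operators (caseless, so "and" also matches).
NOT, AND, OR = map(pp.CaselessKeyword, ["NOT", "AND", "OR"])

# Hypothetical building blocks: field name, a few operators, a bare value.
field = pp.Word(pp.alphas, pp.alphanums + "_")
comparison_op = pp.oneOf("= != >= <= > <") | pp.CaselessKeyword("like")
value = pp.Word(pp.alphanums)

# Group each comparison so it arrives as one operand: [field, op, value].
comparison_expr = pp.Group(field + comparison_op + value)

query_expr = pp.infixNotation(
    comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT),
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
)

sample = "age >= 25 AND (gender = M OR gender = F)"
parsed = query_expr.parseString(sample, parseAll=True).asList()
print(parsed)
# [[['age', '>=', '25'], 'AND', [['gender', '=', 'M'], 'OR', ['gender', '=', 'F']]]]
```

    Note that the values are still strings here; converting them to Python types is a separate step done with parse actions.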
    

    Finally, I suggest you define a class to take the comparison tokens as class init args, then you can attach behavior to that class to evaluate the comparisons and output debug strings, something like:

    class ComparisonExpr:
        def __init__(self, tokens):
            self.tokens = tokens
    
        def __str__(self):
            # doubled braces emit literal { and } in the formatted output
            return "Comparison:({{'field': {!r}, 'operator': {!r}, 'value': {!r}}})".format(
                                *self.tokens.asList())
    
    # attach the class to the comparison expression
    comparison_expr.addParseAction(ComparisonExpr)
    

    Then you can get output like:

    query_expr.parseString(sample).pprint()
    
    [[Comparison:({'field': 'address', 'operator': 'like', 'value': 'street'}),
      'AND',
      Comparison:({'field': 'vote', 'operator': '=', 'value': True}),
      'AND',
      [[Comparison:({'field': 'age', 'operator': '>=', 'value': 25}),
        'AND',
        Comparison:({'field': 'gender', 'operator': '=', 'value': 'M'})],
       'OR',
       [Comparison:({'field': 'age', 'operator': 'between', 'value': [20, 30]}),
        'AND',
        Comparison:({'field': 'gender', 'operator': '=', 'value': 'F'})],
       'OR',
       [Comparison:({'field': 'age', 'operator': '>=', 'value': 70}),
        'AND',
        Comparison:({'field': 'eyes', 'operator': '!=', 'value': 'blue'})]]]]
    

    The SimpleBool.py example has more details to show you how to create this class, and related classes for NOT, AND, and OR operators.
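
    In the same spirit as SimpleBool.py, here is a rough sketch of what such classes could look like when they evaluate a query against a dict. This is not the SimpleBool code itself: the class names, the matches method, the OPS table, the query_matches helper and the tiny comparison grammar are all assumptions made for illustration.

```python
import operator
import pyparsing as pp

NOT, AND, OR = map(pp.CaselessKeyword, ["NOT", "AND", "OR"])

# Hypothetical operator table; extend it for 'like', 'between', etc.
OPS = {"=": operator.eq, "!=": operator.ne, ">=": operator.ge, "<=": operator.le}


class ComparisonExpr:
    def __init__(self, tokens):
        self.field, self.op, self.value = tokens

    def matches(self, record):
        return OPS[self.op](record[self.field], self.value)


class NotExpr:
    def __init__(self, tokens):
        # tokens[0] is the grouped match ['NOT', operand]
        self.operand = tokens[0][1]

    def matches(self, record):
        return not self.operand.matches(record)


class AndExpr:
    def __init__(self, tokens):
        # tokens[0] is [operand, 'AND', operand, ...]; keep every other item
        self.operands = tokens[0][0::2]

    def matches(self, record):
        return all(op.matches(record) for op in self.operands)


class OrExpr(AndExpr):
    def matches(self, record):
        return any(op.matches(record) for op in self.operands)


field = pp.Word(pp.alphas)
comparison_op = pp.oneOf("= != >= <=")
value = pp.pyparsing_common.integer | pp.Word(pp.alphas)
comparison_expr = (field + comparison_op + value).setParseAction(ComparisonExpr)

# The optional 4th tuple element is a parse action for that operator level.
query_expr = pp.infixNotation(
    comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT, NotExpr),
        (AND, 2, pp.opAssoc.LEFT, AndExpr),
        (OR, 2, pp.opAssoc.LEFT, OrExpr),
    ],
)


def query_matches(query, record):
    # The single top-level token is the root object of the expression tree.
    tree = query_expr.parseString(query, parseAll=True)[0]
    return tree.matches(record)


person = {"age": 30, "gender": "F"}
print(query_matches("age >= 25 AND NOT gender = M", person))  # True
print(query_matches("age >= 40 OR gender = M", person))       # False
```

    Because each level of the grammar builds its own object, the parsed result is already an evaluable tree and no separate walk over nested lists is needed.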

    EDIT:

    "Is there a way to return RESULT with dictionaries and not ComparisonExpr instances?" The __repr__ method on your ComparisonExpr class is being called instead of __str__. Easiest solution is to add to your class:

    __repr__ = __str__
    

    Or just rename __str__ to __repr__.
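
    The difference is easy to see without pyparsing at all. In this sketch a plain list stands in for the parsed tokens, and the shortened format string is only for illustration: containers print their items with repr, so the alias is what makes pprint show the readable form.

```python
from pprint import pformat


class ComparisonExpr:
    def __init__(self, tokens):
        # a plain list stands in for the pyparsing tokens here
        self.tokens = list(tokens)

    def __str__(self):
        return "Comparison:({!r}, {!r}, {!r})".format(*self.tokens)

    # lists, pprint and the interactive prompt all use repr, not str
    __repr__ = __str__


shown = pformat([ComparisonExpr(["age", ">=", 25])])
print(shown)  # [Comparison:('age', '>=', 25)]
```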

    "The only thing unknown left is for me to turn 'true' into True and '[20,30]' into [20, 30]"

    Try:

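    from pyparsing import CaselessKeyword, Suppress, Group, delimitedList, pyparsing_common

    CK = CaselessKeyword  # 'cause I'm lazy
    # CaselessKeyword returns the keyword as defined ('true'), whatever the input case
    bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
    LBRACK,RBRACK = map(Suppress, "[]")
    # parse numbers using pyparsing_common.number, which includes the str->int conversion parse action
    num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)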
    

    Then add these to your VALUE expression:

    VALUE = bool_literal | num_list | Word(unicode_printables)
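
    Put together as something you can run, that might look like the sketch below. Word(pp.printables) is only a stand-in for your Word(unicode_printables) fallback, and the parse_value helper is made up for the demonstration.

```python
import pyparsing as pp

CK = pp.CaselessKeyword
# CaselessKeyword returns the keyword as defined, so the comparison with
# 'true' works for any input case.
bool_literal = (CK("true") | CK("false")).setParseAction(lambda t: t[0] == "true")

LBRACK, RBRACK = map(pp.Suppress, "[]")
# pyparsing_common.number converts each item to int or float.
num_list = pp.Group(LBRACK + pp.delimitedList(pp.pyparsing_common.number) + RBRACK)

# Stand-in fallback branch for plain string values.
VALUE = bool_literal | num_list | pp.Word(pp.printables)


def parse_value(text):
    # hypothetical helper: parse one value and return plain Python data
    return VALUE.parseString(text, parseAll=True).asList()


print(parse_value("TRUE"))     # [True]
print(parse_value("false"))    # [False]
print(parse_value("[20,30]"))  # [[20, 30]]
print(parse_value("street"))   # ['street']
```

    The order of the alternatives matters: the catch-all Word has to come last, otherwise it would swallow 'true' and '[20,30]' as plain strings.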
    

    Lastly:

    from pprint import pprint
    pprint(RESULT)
    

    I got so tired of importing pprint all the time to do just this, I just added it to the API for ParseResults. Try:

    RESULT.pprint()  # no import required on your part
    

    or

    print(RESULT.dump()) # will also show indented list of named fields
    

    EDIT2

    LASTLY, results names are good to learn. If you make this change to COMPARISON, everything still works as you have it:

    COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')
    

    But now you can add a method like this to your ComparisonExpr class:

    def asDict(self):
        return self.tokens.asDict()
    

    And you can access the parsed values by name instead of index position (either using result['field'] notation or result.field notation).
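
    A short sketch of results names in action. The FIELD, OPERATOR and VALUE definitions here are simplified stand-ins for yours.

```python
import pyparsing as pp

# Simplified stand-ins for the real FIELD, OPERATOR and VALUE expressions.
FIELD = pp.Word(pp.alphas)
OPERATOR = pp.oneOf("= != >= <=")
VALUE = pp.pyparsing_common.number | pp.Word(pp.alphas)

# Calling an expression with a string attaches a results name to a copy of it.
COMPARISON = FIELD("field") + OPERATOR("operator") + VALUE("value")

result = COMPARISON.parseString("age >= 25", parseAll=True)

print(result["field"])   # dict-style access
print(result.operator)   # attribute-style access
print(result.asDict())   # {'field': 'age', 'operator': '>=', 'value': 25}
```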
