Python parsing bracketed blocks

独厮守ぢ 2020-11-27 04:44

What would be the best way in Python to parse out chunks of text contained in matching brackets?

"{ { a } { b } { { { c } } } }"

should i

9 answers
  • 2020-11-27 05:06

    Pseudocode:

    For each string in the array:
        Find the first '{'. If there is none, leave that string alone.
        Init a counter to 0.
        For each character in the string, starting at that first '{':
            If you see a '{', increment the counter.
            If you see a '}', decrement the counter.
            If the counter is back at 0, break.
        Here, if your counter is not 0, you have invalid input (unbalanced brackets).
        If it is 0, take the string from the first '{' up to the '}' that put the
         counter at 0; that is a new element in your array.
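
    The inner step of the pseudocode above (one string, one counter) could be sketched in Python roughly as follows. The function name first_braced_block and the choice to raise ValueError on unbalanced input are assumptions made for illustration, not part of the answer.

```python
def first_braced_block(text):
    """Return the first balanced '{...}' block in text, braces included.

    Returns None when text contains no '{'; raises ValueError when the
    braces after the first '{' never balance.
    """
    start = text.find('{')
    if start == -1:
        return None  # no opening brace: leave the string alone
    depth = 0
    for index in range(start, len(text)):
        char = text[index]
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                # this '}' closes the block opened at `start`
                return text[start:index + 1]
    raise ValueError('unbalanced brackets in %r' % text)


print(first_braced_block('x { a { b } } y { c }'))  # { a { b } }
```

    Scanning starts at the first '{' so that the counter can only return to 0 on a closing brace.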
    
  • 2020-11-27 05:12

    I'm kind of new to Python, so go easy on me, but here is an implementation that works:

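    def balanced_braces(args):
        """Return the text inside each top-level {...} group of every string."""
        parts = []
        for arg in args:
            if '{' not in arg:
                continue
            chars = []  # characters collected for the current group
            n = 0       # current nesting depth
            for c in arg:
                if c == '{':
                    if n > 0:  # nested brace: part of the group's content
                        chars.append(c)
                    n += 1
                elif c == '}':
                    n -= 1
                    if n > 0:
                        chars.append(c)
                    elif n == 0:  # top-level group closed
                        parts.append(''.join(chars).strip())
                        chars = []
                elif n > 0:
                    chars.append(c)
        return parts
    
    # Each call peels off one level of nesting.
    t1 = balanced_braces(["{{ a } { b } { { { c } } } }"])
    print(t1)
    t2 = balanced_braces(t1)
    print(t2)
    t3 = balanced_braces(t2)
    print(t3)
    t4 = balanced_braces(t3)
    print(t4)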
    

    Output:

    ['{ a } { b } { { { c } } }']
    ['a', 'b', '{ { c } }']
    ['{ c }']
    ['c']
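
    The calls above peel off one nesting level at a time. If the whole tree is wanted in a single pass, a stack of lists is one possible approach. The sketch below is only an illustration: the name parse_nested is hypothetical, and it assumes that tokens are separated by whitespace or braces.

```python
def parse_nested(text):
    """Parse brace-delimited text into nested lists of string tokens."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    token = []      # characters of the token being read

    def flush():
        # Store the pending token, if any, in the current list.
        if token:
            stack[-1].append(''.join(token))
            del token[:]

    for char in text:
        if char == '{':
            flush()
            child = []
            stack[-1].append(child)  # attach the new group to its parent
            stack.append(child)      # and descend into it
        elif char == '}':
            flush()
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()              # back to the enclosing group
        elif char.isspace():
            flush()
        else:
            token.append(char)
    flush()
    if len(stack) != 1:
        raise ValueError("unmatched '{'")
    return root


print(parse_nested("{ { a } { b } { { { c } } } }"))
# [[['a'], ['b'], [[['c']]]]]
```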
    
  • 2020-11-27 05:15

    Using Grako (grammar compiler):

    #!/usr/bin/env python
    import json
    import grako # $ pip install grako
    
    grammar_ebnf = """
        bracketed = '{' @:( { bracketed }+ | any ) '}' ;
        any = /[^{}]+?/ ;
    """
    model = grako.genmodel("Bracketed", grammar_ebnf)
    ast = model.parse("{ { a } { b } { { { c } } } }", "bracketed")
    print(json.dumps(ast, indent=4))
    

    Output

    [
        "a", 
        "b", 
        [
            [
                "c"
            ]
        ]
    ]
    
  • 2020-11-27 05:16

    Cleaner solution. This returns the string enclosed in the first outermost pair of brackets. If None is returned, there was no complete match.

    def findBrackets(aString):
        # Return the text inside the first outermost {...} pair, or None.
        if '{' in aString:
            match = aString.split('{', 1)[1]
            depth = 1  # accounts for the '{' consumed by split()
            for index, char in enumerate(match):
                if char == '{':
                    depth += 1
                elif char == '}':
                    depth -= 1
                if not depth:
                    return match[:index]
    
  • 2020-11-27 05:17

    Or this pyparsing version:

    >>> from pyparsing import nestedExpr
    >>> txt = "{ { a } { b } { { { c } } } }"
    >>>
    >>> nestedExpr('{','}').parseString(txt).asList()
    [[['a'], ['b'], [[['c']]]]]
    >>>
    
  • 2020-11-27 05:21

    Parse using lepl (installable via $ easy_install lepl):

    from lepl import Any, Delayed, Node, Space
    
    expr = Delayed()
    expr += '{' / (Any() | expr[1:,Space()[:]]) / '}' > Node
    
    print(expr.parse("{{a}{b}{{{c}}}}")[0])
    

    Output:

    Node
     +- '{'
     +- Node
     |   +- '{'
     |   +- 'a'
     |   `- '}'
     +- Node
     |   +- '{'
     |   +- 'b'
     |   `- '}'
     +- Node
     |   +- '{'
     |   +- Node
     |   |   +- '{'
     |   |   +- Node
     |   |   |   +- '{'
     |   |   |   +- 'c'
     |   |   |   `- '}'
     |   |   `- '}'
     |   `- '}'
     `- '}'
    