What would be the best way in Python to parse out chunks of text contained in matching brackets?
"{ { a } { b } { { { c } } } }"
should i
Pseudocode:
For each string in the array:
    Find the first '{'. If there is none, leave that string alone.
    Init a counter to 0.
    For each character in the string:
        If you see a '{', increment the counter.
        If you see a '}', decrement the counter.
        If the counter reaches 0, break.
    Here, if your counter is not 0, you have invalid input (unbalanced brackets).
    If it is, then take the string from the first '{' up to the '}' that put the
    counter at 0, and that is a new element in your array.
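The steps above, applied to a single string, could be sketched roughly like this. It is only a minimal sketch: the function name `first_balanced_chunk` is made up for illustration, returning the chunk with its outer braces included is one reading of the pseudocode, and raising `ValueError` is just one assumed way to report unbalanced input.

```python
def first_balanced_chunk(text):
    """Return the first balanced '{...}' chunk in text, braces included.

    Returns None when text contains no '{'. Raises ValueError when the
    braces never balance (an assumed way to signal invalid input).
    """
    start = text.find('{')
    if start == -1:
        return None  # no bracket at all: leave the string alone
    depth = 0
    for index in range(start, len(text)):
        char = text[index]
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                # This '}' closes the first '{'.
                return text[start:index + 1]
    raise ValueError("unbalanced brackets in %r" % text)


print(first_balanced_chunk("x { a { b } } y { c }"))  # prints: { a { b } }
```

Scanning from the first '{' rather than from the start of the string keeps the counter from reading 0 before any bracket has been seen.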
I'm kind of new to Python, so go easy on me, but here is an implementation that works:
def balanced_braces(args):
    parts = []
    for arg in args:
        if '{' not in arg:
            continue
        chars = []
        n = 0  # current nesting depth
        for c in arg:
            if c == '{':
                # Keep inner braces, drop the outermost one
                if n > 0:
                    chars.append(c)
                n += 1
            elif c == '}':
                n -= 1
                if n > 0:
                    chars.append(c)
                elif n == 0:
                    # Outermost brace closed: emit the collected chunk
                    parts.append(''.join(chars).strip())
                    chars = []
            elif n > 0:
                chars.append(c)
    return parts
t1 = balanced_braces(["{{ a } { b } { { { c } } } }"])
print(t1)
t2 = balanced_braces(t1)
print(t2)
t3 = balanced_braces(t2)
print(t3)
t4 = balanced_braces(t3)
print(t4)
Output:
['{ a } { b } { { { c } } }']
['a', 'b', '{ { c } }']
['{ c }']
['c']
Using Grako (grammar compiler):
#!/usr/bin/env python
import json
import grako # $ pip install grako
grammar_ebnf = """
bracketed = '{' @:( { bracketed }+ | any ) '}' ;
any = /[^{}]+?/ ;
"""
model = grako.genmodel("Bracketed", grammar_ebnf)
ast = model.parse("{ { a } { b } { { { c } } } }", "bracketed")
print(json.dumps(ast, indent=4))
Output:
[
    "a",
    "b",
    [
        [
            "c"
        ]
    ]
]
Cleaner solution. This returns the string enclosed in the outermost brackets. If None is returned, there was no match.
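def findBrackets(aString):
    if '{' in aString:
        # Everything after the first '{'
        match = aString.split('{', 1)[1]
        depth = 1  # number of currently open brackets
        for index in range(len(match)):
            if match[index] in '{}':
                depth = (depth + 1) if match[index] == '{' else (depth - 1)
            if not depth:
                # Matching '}' found: return the text before it
                return match[:index]
    # Returns None implicitly if there is no '{' or no matching '}'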
Or this pyparsing version:
>>> from pyparsing import nestedExpr
>>> txt = "{ { a } { b } { { { c } } } }"
>>>
>>> nestedExpr('{','}').parseString(txt).asList()
[[['a'], ['b'], [[['c']]]]]
>>>
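For comparison, a similar nested list can be built without any third-party library by keeping a stack of open groups. This is only a rough sketch under a couple of assumptions: `parse_nested` is a hypothetical name, tokens are taken to be whitespace-separated, and unmatched braces are reported with `ValueError`.

```python
def parse_nested(text):
    """Parse whitespace-separated tokens and '{...}' groups into nested lists."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    # Pad the braces with spaces so str.split() yields them as separate tokens.
    tokens = text.replace('{', ' { ').replace('}', ' } ').split()
    for token in tokens:
        if token == '{':
            group = []
            stack[-1].append(group)  # attach the new group to its parent
            stack.append(group)      # and descend into it
        elif token == '}':
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()              # climb back to the parent group
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unmatched '{'")
    return root


print(parse_nested("{ { a } { b } { { { c } } } }"))
# prints: [[['a'], ['b'], [[['c']]]]]
```

The explicit stack avoids recursion, so deeply nested input does not run into the interpreter's recursion limit.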
Parse using lepl (installable via $ easy_install lepl):
from lepl import Any, Delayed, Node, Space
expr = Delayed()
expr += '{' / (Any() | expr[1:,Space()[:]]) / '}' > Node
print(expr.parse("{{a}{b}{{{c}}}}")[0])
Output:
Node
 +- '{'
 +- Node
 |   +- '{'
 |   +- 'a'
 |   `- '}'
 +- Node
 |   +- '{'
 |   +- 'b'
 |   `- '}'
 +- Node
 |   +- '{'
 |   +- Node
 |   |   +- '{'
 |   |   +- Node
 |   |   |   +- '{'
 |   |   |   +- 'c'
 |   |   |   `- '}'
 |   |   `- '}'
 |   `- '}'
 `- '}'