Suppose I have a bulleted list like this:
* list item 1
* list item 2 (a parent)
** list item 3 (a child of list item 2)
** list item 4 (a child of list item
I can't parse your desired result -- it seems to have more open parentheses than corresponding closed ones and I don't understand the logic behind it.
To make a tree structure explicit, what about, e.g.:
data = '''* list item 1
* list item 2
** list item 3
** list item 4
*** list item 5
* list item 6'''.splitlines()

class Node(object):
    def __init__(self, payload):
        self.payload = payload
        self.children = []
    def show(self, indent):
        print ' '*indent, self.payload
        for c in self.children:
            c.show(indent+2)

def makenest(linelist):
    rootnode = Node(None)
    stack = [(rootnode, 0)]
    for line in linelist:
        for i, c in enumerate(line):
            if c != '*': break
        stars, payload = line[:i], line[i:].strip()
        curlev = len(stars)
        curnod = Node(payload)
        while True:
            parent, level = stack[-1]
            if curlev > level: break
            del stack[-1]
        # a child node of the current top-of-stack
        parent.children.append(curnod)
        stack.append((curnod, curlev))
    rootnode.show(0)

makenest(data)
The show method of course exists just for the purpose of verifying that the part about parsing the strings and creating the tree has worked correctly. If you can specify more precisely exactly how it is that you want to transform your tree into nested tuples and lists, I'm sure it will be easy to add to class Node the appropriate (and probably recursive) method -- so, could you please give this missing specification...?
Edit: since the OP has clarified now, it does, as predicted, become easy to satisfy the spec. Just add to class Node the following method:
    def emit(self):
        if self.children:
            return (self.payload,
                    [c.emit() for c in self.children])
        else:
            return (self.payload,)
and change the last three lines of the code (the last one of makenest, a blank one, and the module-level call to makenest) to:
    return [c.emit() for c in rootnode.children]

print(makenest(data))
(The parentheses after print are redundant but innocuous in Python 2, required in Python 3, so I put them there just in case;-).
With these tiny changes, my code runs as requested, now emitting
[('list item 1',), ('list item 2', [('list item 3',), ('list item 4', [('list item 5',)])]), ('list item 6',)]
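For convenience, here is a minimal sketch of the pieces above assembled into one script that runs under Python 3. It is an adaptation, not the exact code from this answer: the show method is dropped, the leading stars are counted with lstrip instead of the enumerate loop, and lines without any leading star are skipped (that last rule is my assumption, since the question does not say what such lines should mean).

```python
class Node(object):
    def __init__(self, payload):
        self.payload = payload
        self.children = []

    def emit(self):
        # A parent becomes (payload, [children...]); a leaf becomes a 1-tuple.
        if self.children:
            return (self.payload, [c.emit() for c in self.children])
        return (self.payload,)


def makenest(linelist):
    rootnode = Node(None)
    # Each stack entry is (node, level); the root sits at level 0.
    stack = [(rootnode, 0)]
    for line in linelist:
        # Number of leading stars gives the nesting level of this line.
        curlev = len(line) - len(line.lstrip('*'))
        if curlev == 0:
            # Assumption: lines with no leading star are ignored.
            continue
        curnod = Node(line[curlev:].strip())
        # Pop until the top of the stack is shallower than this line;
        # that entry is the parent of the current node.
        while stack[-1][1] >= curlev:
            del stack[-1]
        stack[-1][0].children.append(curnod)
        stack.append((curnod, curlev))
    return [c.emit() for c in rootnode.children]


data = '''* list item 1
* list item 2
** list item 3
** list item 4
*** list item 5
* list item 6'''.splitlines()

result = makenest(data)
print(result)
```

The stack always holds the chain of ancestors of the most recently added node, so finding the parent of a new line only requires popping entries whose level is not smaller than the new line's level.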