Creating a tree/deeply nested dict from an indented text file in python

后端 未结 4 1434
遥遥无期
遥遥无期 2020-12-05 12:17

Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the

相关标签:
4条回答
  • 2020-12-05 12:22

    Here is an object oriented approach based on a composite structure of nested Node objects.

    Input:

    indented_text = \
    """
    apple
        colours
            red
            yellow
            green
        type
            granny smith
        price
            0.10
    """
    

    a Node class

    class Node:
        def __init__(self, indented_line):
            self.children = []
            self.level = len(indented_line) - len(indented_line.lstrip())
            self.text = indented_line.strip()
    
        def add_children(self, nodes):
            childlevel = nodes[0].level
            while nodes:
                node = nodes.pop(0)
                if node.level == childlevel: # add node as a child
                    self.children.append(node)
                elif node.level > childlevel: # add nodes as grandchildren of the last child
                    nodes.insert(0,node)
                    self.children[-1].add_children(nodes)
                elif node.level <= self.level: # this node is a sibling, no more children
                    nodes.insert(0,node)
                    return
    
        def as_dict(self):
            if len(self.children) > 1:
                return {self.text: [node.as_dict() for node in self.children]}
            elif len(self.children) == 1:
                return {self.text: self.children[0].as_dict()}
            else:
                return self.text
    

    To parse the text, first create a root node. Then, remove empty lines from the text, and create a Node instance for every line, pass this to the add_children method of the root node.

    root = Node('root')
    root.add_children([Node(line) for line in indented_text.splitlines() if line.strip()])
    d = root.as_dict()['root']
    print(d)
    

    result:

    {'apple': [
      {'colours': ['red', 'yellow', 'green']},
      {'type': 'granny smith'},
      {'price': '0.10'}]
    }
    

    I think that it should be possible to do it in one step, where you simply call the constructor of Node once, with the indented text as an argument.

    0 讨论(0)
  • 2020-12-05 12:23

    The following code will take a block-indented file and convert into an XML tree; this:

    foo
    bar
    baz
      ban
      bal
    

    ...becomes:

    <cmd>foo</cmd>
    <cmd>bar</cmd>
    <block>
      <name>baz</name>
      <cmd>ban</cmd>
      <cmd>bal</cmd>
    </block>
    

    The basic technique is:

    1. Set indent to 0
    2. For each line, get the indent
    3. If > current, step down and save current block/ident on a stack
    4. If == current, append to current block
    5. If < current, pop from the stack until you get to the matching indent

    So:

    from lxml import builder
    C = builder.ElementMaker()
    
    def indent(line):
        strip = line.lstrip()
        return len(line) - len(strip), strip
    
    def parse_blockcfg(data):
        top = current_block = C.config()
        stack = []
        current_indent = 0
    
        lines = data.split('\n')
        while lines:
            line = lines.pop(0)
            i, line = indent(line)
    
            if i==current_indent:
                pass
    
            elif i > current_indent:
                # we've gone down a level, convert the <cmd> to a block
                # and then save the current ident and block to the stack
                prev.tag = 'block'
                prev.append(C.name(prev.text))
                prev.text = None
                stack.insert(0, (current_indent, current_block))
                current_indent = i
                current_block = prev
    
            elif i < current_indent:
                # we've gone up one or more levels, pop the stack
                # until we find out which level and return to it
                found = False
                while stack:
                    parent_indent, parent_block = stack.pop(0)
                    if parent_indent==i:
                        found = True
                        break
                if not found:
                    raise Exception('indent not found in parent stack')
                current_indent = i
                current_block = parent_block
    
            prev = C.cmd(line)
            current_block.append(prev)
    
        return top
    
    0 讨论(0)
  • 2020-12-05 12:34

    Here is a recursive solution. First, transform the input in the following way.

    Input:

    person:
        address:
            street1: 123 Bar St
            street2: 
            city: Madison
            state: WI
            zip: 55555
        web:
            email: boo@baz.com
    

    First-step output:

    [{'name':'person','value':'','level':0},
     {'name':'address','value':'','level':1},
     {'name':'street1','value':'123 Bar St','level':2},
     {'name':'street2','value':'','level':2},
     {'name':'city','value':'Madison','level':2},
     {'name':'state','value':'WI','level':2},
     {'name':'zip','value':55555,'level':2},
     {'name':'web','value':'','level':1},
     {'name':'email','value':'boo@baz.com','level':2}]
    

    This is easy to accomplish with split(':') and by counting the number of leading tabs:

    def tab_level(astr):
        """Count number of leading tabs in a string
        """
        return len(astr)- len(astr.lstrip('\t'))
    

    Then feed the first-step output into the following function:

    def ttree_to_json(ttree,level=0):
        result = {}
        for i in range(0,len(ttree)):
            cn = ttree[i]
            try:
                nn  = ttree[i+1]
            except:
                nn = {'level':-1}
    
            # Edge cases
            if cn['level']>level:
                continue
            if cn['level']<level:
                return result
    
            # Recursion
            if nn['level']==level:
                dict_insert_or_append(result,cn['name'],cn['value'])
            elif nn['level']>level:
                rr = ttree_to_json(ttree[i+1:], level=nn['level'])
                dict_insert_or_append(result,cn['name'],rr)
            else:
                dict_insert_or_append(result,cn['name'],cn['value'])
                return result
        return result
    

    where:

    def dict_insert_or_append(adict,key,val):
        """Insert a value in dict at key if one does not exist
        Otherwise, convert value to list and append
        """
        if key in adict:
            if type(adict[key]) != list:
                adict[key] = [adict[key]]
            adict[key].append(val)
        else:
            adict[key] = val
    
    0 讨论(0)
  • 2020-12-05 12:35

    First of all, don't use array and dict as variable names because they're reserved words in Python and reusing them may end up in all sorts of chaos.

    OK so if I get you correctly, you have a tree given in a text file, with parenthood indicated by indentations, and you want to recover the actual tree structure. Right?

    Does the following look like a valid outline? Because I have trouble putting your current code into context.

    result = {}
    last_indentation = 0
    for l in f.xreadlines():
       (c, i) = parse(l) # create parse to return character and indentation
       if i==last_indentation:
       # sibling to last
       elif i>last_indentation:
       # child to last
       else:
       # end of children, back to a higher level
    

    OK then your list are the current parents, that's in fact right - but I'd keep them pointed to the dictionary you've created, not the literal letter

    just starting some stuff here

    result = {}
    parents = {}
    last_indentation = 1 # start with 1 so 0 is the root of tree
    parents[0] = result
    for l in f.xreadlines():
       (c, i) = parse(l) # create parse to return character and indentation
       if i==last_indentation:
           new_el = {}
           parents[i-1][c] = new_el
           parents[i] = new_el
       elif i>last_indentation:
       # child to last
       else:
       # end of children, back to a higher level
    
    0 讨论(0)
提交回复
热议问题