Editing XML as a dictionary in Python?

深忆病人 2020-12-31 18:11

I'm trying to generate customized XML files from a template XML file in Python.

Conceptually, I want to read in the template xml, remove some elements, change some

8 answers
  • 2020-12-31 18:16

    The most direct way, to me:

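    import xml.etree.ElementTree as ET

    root        = ET.parse(xh)   # xh: path or file object of the XML source
    data        = root.getroot()
    xdic        = {}
    if data is not None:
        # iterate over the element directly; getchildren() was removed in Python 3.9
        for part in data:
            xdic[part.tag] = part.text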
    
  • 2020-12-31 18:18

    For easy manipulation of XML in Python, I like the Beautiful Soup library. It works something like this:

    Sample XML File:

    <root>
      <level1>leaf1</level1>
      <level2>leaf2</level2>
    </root>
    

    Python code:

    # BeautifulSoup 3 API (Python 2 only)
    from BeautifulSoup import BeautifulStoneSoup, Tag, NavigableString
    
    # pass the open file; a bare string would be parsed as markup, not opened as a path
    soup = BeautifulStoneSoup(open('config-template.xml'))
    soup.contents[0].name
    # u'root'
    

    You can use the node names as attributes:

    soup.root.contents[0].name
    # u'level1'
    

    It is also possible to use regexes:

    import re
    tags_starting_with_level = soup.findAll(re.compile('^level'))
    for tag in tags_starting_with_level: print tag.name
    # level1
    # level2
    

    Adding and inserting new nodes is pretty straightforward:

    # build and insert a new level with a new leaf
    level3 = Tag(soup, 'level3')
    level3.insert(0, NavigableString('leaf3'))
    soup.root.insert(2, level3)
    
    print soup.prettify()
    # <root>
    #  <level1>
    #   leaf1
    #  </level1>
    #  <level2>
    #   leaf2
    #  </level2>
    #  <level3>
    #   leaf3
    #  </level3>
    # </root>
    
  • 2020-12-31 18:21

    I'm not sure if converting the info set to nested dicts first is easier. Using ElementTree, you can do this:

    import xml.etree.ElementTree as ET
    doc = ET.parse("template.xml")
    lvl1 = doc.findall("level1-name")[0]
    lvl1.remove(lvl1.find("leaf1"))
    lvl1.remove(lvl1.find("leaf2"))
    # or use del lvl1[idx]
    doc.write("config-new.xml")
    

    ElementTree was designed so that you don't have to convert your XML trees to lists and attributes first, since it uses exactly that internally.
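
    To illustrate the point about lists and attributes, here is a minimal sketch on an in-memory document. The element names and the id attribute are invented for the example and are not taken from the question's template.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
root = ET.fromstring(
    "<root><level1 id='a'>leaf1</level1><level2>leaf2</level2></root>"
)

# An Element behaves like a list of its children...
child_tags = [child.tag for child in root]
first = root[0]

# ...and its attributes live in a plain dict.
first.attrib["id"] = "b"
first.text = "changed"

# Children can be deleted by index, just like list items.
del root[1]

result = ET.tostring(root, encoding="unicode")
print(result)
```

    Because the tree is already list-like and dict-like, edits of this kind usually need no intermediate dictionary.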

    It also supports a small subset of XPath.
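
    As a rough sketch of what that XPath subset allows, the example below selects elements by path and by attribute value. The config/section/option document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
root = ET.fromstring(
    "<config>"
    "<section name='db'><option key='host'>localhost</option></section>"
    "<section name='web'><option key='port'>8080</option></section>"
    "</config>"
)

# Attribute predicate: only sections whose name is 'db'.
db_sections = root.findall("section[@name='db']")

# './/' searches the whole subtree rather than direct children only.
all_keys = [opt.get("key") for opt in root.findall(".//option")]

# find() returns the first match, or None when nothing matches.
port = root.find("section[@name='web']/option")
port.text = "9090"
missing = root.find("section[@name='cache']")
```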

  • 2020-12-31 18:27

    Have you tried this?

    print(xml.etree.ElementTree.tostring(conf_new))
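
    On Python 3, tostring() returns bytes unless an encoding of "unicode" is requested. The sketch below shows both forms; conf_new is rebuilt here as a small stand-in element, since the original object is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Stand-in for conf_new, which is not shown in the answer.
conf_new = ET.Element("root")
ET.SubElement(conf_new, "level1").text = "leaf1"

as_bytes = ET.tostring(conf_new)                     # bytes by default
as_text = ET.tostring(conf_new, encoding="unicode")  # str, no XML declaration
print(as_text)
```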
    
  • 2020-12-31 18:31

    My modification of Daniel's answer, to give a marginally neater dictionary:

    def xml_to_dictionary(element):
        l = len(namespace)
        dictionary={}
        tag = element.tag[l:]
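        text = element.text
        if text is None or text.strip() == '':
            # no text, or whitespace only: store {} so the checks below never hit a missing key
            dictionary[tag] = {}
        else:
            dictionary[tag] = text
        children = list(element)  # getchildren() was removed in Python 3.9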
        if children:
            subdictionary = {}
            for child in children:
                for k,v in xml_to_dictionary(child).items():
                    if k in subdictionary:
                        if ( isinstance(subdictionary[k], list)):
                            subdictionary[k].append(v)
                        else:
                            subdictionary[k] = [subdictionary[k], v]
                    else:
                        subdictionary[k] = v
            if (dictionary[tag] == {}):
                dictionary[tag] = subdictionary
            else:
                dictionary[tag] = [dictionary[tag], subdictionary]
        if element.attrib:
            attribs = {}
            for k,v in element.attrib.items():
                attribs[k] = v
            if (dictionary[tag] == {}):
                dictionary[tag] = attribs
            else:
                dictionary[tag] = [dictionary[tag], attribs]
        return dictionary
    

    namespace is the xmlns string, including braces, that ElementTree prepends to every tag. I strip it here because the whole document uses a single namespace.
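
    For readers unsure what that prefix looks like, here is a small sketch that derives it from the root tag instead of hard-coding it. The example.com namespace URI and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document, invented for illustration.
content = '<note xmlns="http://example.com/ns"><to>Tove</to></note>'
root = ET.fromstring(content)

# ElementTree stores the namespace in Clark notation: '{uri}localname'.
print(root.tag)

# Take everything up to and including the closing brace as the prefix.
if root.tag.startswith("{"):
    namespace = root.tag[: root.tag.index("}") + 1]
else:
    namespace = ""

# Slice the prefix off every tag, as the function above does.
local_names = [el.tag[len(namespace):] for el in root.iter()]
```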

    Note that I also adjusted the raw XML, so that 'empty' tags produce at most a ' ' text property in the ElementTree representation:

    import re
    from xml.etree import ElementTree
    spacepattern = re.compile(r'\s+')
    mydictionary = xml_to_dictionary(ElementTree.XML(spacepattern.sub(' ', content)))
    

    This would give, for instance:

    {'note': {'to': 'Tove',
             'from': 'Jani',
             'heading': 'Reminder',
             'body': "Don't forget me this weekend!"}}
    

    It's designed for XML that is basically equivalent to JSON. It should also handle element attributes such as

    <elementName attributeName='attributeContent'>elementContent</elementName>
    

    The attribute dictionary and the subtag dictionary could also be merged, similarly to how repeated subtags are merged, although nesting the lists seems kind of appropriate :-)
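
    One way such a merge could look is sketched below. This is not the author's code: the '@' prefix for attributes and the '#text' key are conventions chosen only for this example, and element_to_dict is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Sketch: merge attributes, text and children into one dict per element."""
    node = {}
    # Attributes get an '@' prefix (an arbitrary convention chosen here)
    # so they cannot collide with child tag names.
    for key, value in element.attrib.items():
        node["@" + key] = value
    for child in element:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags are collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (element.text or "").strip()
    if not node:
        # Leaf without attributes: just the text.
        return text
    if text:
        node["#text"] = text
    return node

root = ET.fromstring(
    "<note id='1'><to>Tove</to><to>Jani</to><body lang='en'>Hi</body></note>"
)
merged = {root.tag: element_to_dict(root)}
print(merged)
```

    The trade-off is that a leaf is sometimes a string and sometimes a dict, depending on whether it carries attributes.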

  • 2020-12-31 18:36

    This'll get you a dict minus attributes... dunno if this is useful to anyone. I was looking for an XML-to-dict solution myself when I came up with this.

    
    
    import xml.etree.ElementTree as etree
    
    tree = etree.parse('test.xml')
    root = tree.getroot()
    
    def xml_to_dict(el):
      d={}
      if el.text:
        d[el.tag] = el.text
      else:
        d[el.tag] = {}
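      children = list(el)  # getchildren() was removed in Python 3.9
      if children:
        # build a real list; map() returns a lazy iterator on Python 3
        d[el.tag] = [xml_to_dict(child) for child in children]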
      return d
    

    This: http://www.w3schools.com/XML/note.xml

    <note>
     <to>Tove</to>
     <from>Jani</from>
     <heading>Reminder</heading>
     <body>Don't forget me this weekend!</body>
    </note>
    

    Would equal this:

    
    {'note': [{'to': 'Tove'},
              {'from': 'Jani'},
              {'heading': 'Reminder'},
              {'body': "Don't forget me this weekend!"}]}
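
    Since the goal is to write modified XML back out, here is a hedged sketch of the reverse direction for this particular dictionary shape. dict_to_xml is a hypothetical helper, not part of ElementTree, and it ignores attributes just as the function above does.

```python
import xml.etree.ElementTree as ET

def dict_to_xml(d):
    """Inverse sketch for the {tag: text} / {tag: [children]} shape."""
    (tag, value), = d.items()   # each dict holds exactly one tag
    el = ET.Element(tag)
    if isinstance(value, list):
        for child in value:
            el.append(dict_to_xml(child))
    elif value:                 # non-empty text; {} means an empty element
        el.text = value
    return el

data = {'note': [{'to': 'Tove'}, {'from': 'Jani'}]}
data['note'][0]['to'] = 'Bob'   # change a value in the dictionary
data['note'].pop()              # remove the last child
xml_text = ET.tostring(dict_to_xml(data), encoding="unicode")
print(xml_text)
```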
    