Creating a simple XML file using python

前端 未结 6 889
野性不改
野性不改 2020-11-22 14:55

What are my options if I want to create a simple XML file in python? (library wise)

The xml I want looks like:


 
     

        
相关标签:
6条回答
  • 2020-11-22 15:22

    Yattag http://www.yattag.org/ or https://github.com/leforestier/yattag provides an interesting API to create such XML document (and also HTML documents).

    It's using context manager and with keyword.

    from yattag import Doc, indent
    
    doc, tag, text = Doc().tagtext()
    
    with tag('root'):
        with tag('doc'):
            with tag('field1', name='blah'):
                text('some value1')
            with tag('field2', name='asdfasd'):
                text('some value2')
    
    result = indent(
        doc.getvalue(),
        indentation = ' '*4,
        newline = '\r\n'
    )
    
    print(result)
    

    so you will get:

    <root>
        <doc>
            <field1 name="blah">some value1</field1>
            <field2 name="asdfasd">some value2</field2>
        </doc>
    </root>
    
    0 讨论(0)
  • 2020-11-22 15:25

    The lxml library includes a very convenient syntax for XML generation, called the E-factory. Here's how I'd make the example you give:

    #!/usr/bin/python
    import lxml.etree
    import lxml.builder    
    
    E = lxml.builder.ElementMaker()
    ROOT = E.root
    DOC = E.doc
    FIELD1 = E.field1
    FIELD2 = E.field2
    
    the_doc = ROOT(
            DOC(
                FIELD1('some value1', name='blah'),
                FIELD2('some value2', name='asdfasd'),
                )   
            )   
    
    print lxml.etree.tostring(the_doc, pretty_print=True)
    

    Output:

    <root>
      <doc>
        <field1 name="blah">some value1</field1>
        <field2 name="asdfasd">some value2</field2>
      </doc>
    </root>
    

    It also supports adding to an already-made node, e.g. after the above you could say

    the_doc.append(FIELD2('another value again', name='hithere'))
    
    0 讨论(0)
  • 2020-11-22 15:27

    These days, the most popular (and very simple) option is the ElementTree API, which has been included in the standard library since Python 2.5.

    The available options for that are:

    • ElementTree (Basic, pure-Python implementation of ElementTree. Part of the standard library since 2.5)
    • cElementTree (Optimized C implementation of ElementTree. Also offered in the standard library since 2.5)
    • LXML (Based on libxml2. Offers a rich superset of the ElementTree API as well XPath, CSS Selectors, and more)

    Here's an example of how to generate your example document using the in-stdlib cElementTree:

    import xml.etree.cElementTree as ET
    
    root = ET.Element("root")
    doc = ET.SubElement(root, "doc")
    
    ET.SubElement(doc, "field1", name="blah").text = "some value1"
    ET.SubElement(doc, "field2", name="asdfasd").text = "some vlaue2"
    
    tree = ET.ElementTree(root)
    tree.write("filename.xml")
    

    I've tested it and it works, but I'm assuming whitespace isn't significant. If you need "prettyprint" indentation, let me know and I'll look up how to do that. (It may be an LXML-specific option. I don't use the stdlib implementation much)

    For further reading, here are some useful links:

    • API docs for the implementation in the Python standard library
    • Introductory Tutorial (From the original author's site)
    • LXML etree tutorial. (With example code for loading the best available option from all major ElementTree implementations)

    As a final note, either cElementTree or LXML should be fast enough for all your needs (both are optimized C code), but in the event you're in a situation where you need to squeeze out every last bit of performance, the benchmarks on the LXML site indicate that:

    • LXML clearly wins for serializing (generating) XML
    • As a side-effect of implementing proper parent traversal, LXML is a bit slower than cElementTree for parsing.
    0 讨论(0)
  • 2020-11-22 15:36

    I just finished writing an xml generator, using bigh_29's method of Templates ... it's a nice way of controlling what you output without too many Objects getting 'in the way'.

    As for the tag and value, I used two arrays, one which gave the tag name and position in the output xml and another which referenced a parameter file having the same list of tags. The parameter file, however, also has the position number in the corresponding input (csv) file where the data will be taken from. This way, if there's any changes to the position of the data coming in from the input file, the program doesn't change; it dynamically works out the data field position from the appropriate tag in the parameter file.

    0 讨论(0)
  • 2020-11-22 15:39

    For the simplest choice, I'd go with minidom: http://docs.python.org/library/xml.dom.minidom.html . It is built in to the python standard library and is straightforward to use in simple cases.

    Here's a pretty easy to follow tutorial: http://www.boddie.org.uk/python/XML_intro.html

    0 讨论(0)
  • 2020-11-22 15:43

    For such a simple XML structure, you may not want to involve a full blown XML module. Consider a string template for the simplest structures, or Jinja for something a little more complex. Jinja can handle looping over a list of data to produce the inner xml of your document list. That is a bit trickier with raw python string templates

    For a Jinja example, see my answer to a similar question.

    Here is an example of generating your xml with string templates.

    import string
    from xml.sax.saxutils import escape
    
    inner_template = string.Template('    <field${id} name="${name}">${value}</field${id}>')
    
    outer_template = string.Template("""<root>
     <doc>
    ${document_list}
     </doc>
    </root>
     """)
    
    data = [
        (1, 'foo', 'The value for the foo document'),
        (2, 'bar', 'The <value> for the <bar> document'),
    ]
    
    inner_contents = [inner_template.substitute(id=id, name=name, value=escape(value)) for (id, name, value) in data]
    result = outer_template.substitute(document_list='\n'.join(inner_contents))
    print result
    

    Output:

    <root>
     <doc>
        <field1 name="foo">The value for the foo document</field1>
        <field2 name="bar">The &lt;value&gt; for the &lt;bar&gt; document</field2>
     </doc>
    </root>
    

    The downer of the template approach is that you won't get escaping of < and > for free. I danced around that problem by pulling in a util from xml.sax

    0 讨论(0)
提交回复
热议问题