How to create <!DOCTYPE> with Python's cElementTree

北海茫月 2020-11-30 14:42

I have tried to use the answer to this question, but can't make it work: How to create "virtual root" with Python's ElementTree?

Here's my code:

4 Answers
  • 2020-11-30 15:22

    I couldn't find a solution to this problem using vanilla ElementTree either, and the solution proposed by demalexx produced invalid XML that my application (DITA) rejected. What I propose is a workaround that post-processes the serialized XML with the re module, and it works perfectly for me.

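    import re

    # ElementTree offers no clean way to emit a <!DOCTYPE ...> declaration, so we
    # replace the existing <?xml ... ?> declaration with <?xml ... ?> plus <!DOCTYPE ...>.
    new_header = ('<?xml version="1.0" encoding="UTF-8" ?>\n'
                  '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">\n')

    # source_xml is the serialized document (a str); filename is the output path.
    # count=1 limits the substitution to the leading XML declaration.
    target_xml = re.sub(r"<\?xml .+?\?>", new_header, source_xml, count=1)
    # Let open() handle the encoding; writing bytes to a text-mode file fails on Python 3.
    with open(filename, 'w', encoding='utf-8') as catalog_file:
        catalog_file.write(target_xml)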
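
    As a self-contained check of the substitution approach, the sketch below builds a small tree, swaps in the header, and confirms the result still parses. The `topic` element, its `id` attribute, and the sample title are made up for illustration; only the header text comes from the answer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the real DITA topic.
root = ET.Element("topic", {"id": "example"})
ET.SubElement(root, "title").text = "Example"

# Serialize as str and prepend a plain declaration to play the role of source_xml.
source_xml = ("<?xml version='1.0' encoding='UTF-8'?>\n"
              + ET.tostring(root, encoding="unicode"))

new_header = ('<?xml version="1.0" encoding="UTF-8" ?>\n'
              '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">')

# A function replacement avoids backslash processing in the header text;
# count=1 touches only the leading declaration.
target_xml = re.sub(r"<\?xml .+?\?>", lambda m: new_header, source_xml, count=1)

print(target_xml)

# ElementTree's parser does not fetch the external DTD, so the document
# parses back into the same tree.
reparsed = ET.fromstring(target_xml.encode("utf-8"))
```

    Passing a function as the replacement is a small safety measure: a plain replacement string would have any backslashes in the header interpreted by `re.sub`.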
    
  • 2020-11-30 15:29

    You could set the xml_declaration argument of the write function to False so the output has no XML declaration, then write whatever header you need manually. In fact, if you set the encoding to 'utf-8' (lowercase), the XML declaration is not added either.

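    # xml.etree.cElementTree was removed in Python 3.9; since Python 3.3,
    # xml.etree.ElementTree uses the C accelerator automatically when available.
    import xml.etree.ElementTree as ElementTree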
    
    tree = ElementTree.Element('tmx', {'version': '1.4a'})
    ElementTree.SubElement(tree, 'header', {'adminlang': 'EN'})
    ElementTree.SubElement(tree, 'body')
    
    with open('myfile.tmx', 'wb') as f:
        f.write('<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE tmx SYSTEM "tmx14a.dtd">'.encode('utf8'))
        ElementTree.ElementTree(tree).write(f, 'utf-8')
    

    Resulting file (newlines added manually for readability):

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE tmx SYSTEM "tmx14a.dtd">
    <tmx version="1.4a">
        <header adminlang="EN" />
        <body />
    </tmx>
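
    The same idea can be wrapped in a small helper. This is only a sketch: `write_with_doctype` is a hypothetical name, not part of ElementTree, and it assumes the output is always UTF-8. It writes the header itself and passes `xml_declaration=False` explicitly instead of relying on the default behaviour.

```python
import io
import xml.etree.ElementTree as ET


def write_with_doctype(root, stream, doctype):
    """Write root to a binary stream, preceded by an XML declaration and doctype.

    Hypothetical helper: ElementTree's write() has no doctype argument.
    """
    stream.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
    stream.write(doctype.encode("utf-8") + b"\n")
    # Suppress ElementTree's own declaration so it is not emitted twice.
    ET.ElementTree(root).write(stream, encoding="utf-8", xml_declaration=False)


tmx = ET.Element("tmx", {"version": "1.4a"})
ET.SubElement(tmx, "header", {"adminlang": "EN"})
ET.SubElement(tmx, "body")

# BytesIO stands in for a file opened with mode 'wb'.
buffer = io.BytesIO()
write_with_doctype(tmx, buffer, '<!DOCTYPE tmx SYSTEM "tmx14a.dtd">')
output = buffer.getvalue().decode("utf-8")
print(output)
```

    With a real file, `open('myfile.tmx', 'wb')` would take the place of the `BytesIO` buffer.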
    
  • 2020-11-30 15:35

    I used a different solution to add the DOCTYPE: very simple, very stupid.

    import xml.etree.ElementTree as ET

    # root is the already-built Element; path_file is the output path.
    doc_type = ('<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE dlg:window '
                'PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "dialog.dtd">')
    # encoding='unicode' makes tostring() return a str with no XML declaration.
    body = ET.tostring(root, encoding='unicode')
    with open(path_file, "w", encoding='UTF-8') as xf:
        xf.write(f"{doc_type}{body}")
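
    If several document types are needed, the DOCTYPE line can be assembled from its parts rather than hard-coded. The helper below is a hypothetical sketch (`build_doctype` is not an ElementTree function); it does no escaping and assumes the identifiers contain no double quotes. The bare `window` element is a stand-in for a real dialog tree.

```python
import xml.etree.ElementTree as ET


def build_doctype(root_name, system_id, public_id=None):
    """Return a DOCTYPE declaration string (hypothetical helper, no escaping)."""
    if public_id is not None:
        return f'<!DOCTYPE {root_name} PUBLIC "{public_id}" "{system_id}">'
    return f'<!DOCTYPE {root_name} SYSTEM "{system_id}">'


# Stand-in element; a real dialog would also declare the dlg namespace prefix.
root = ET.Element("window")

doc_type = build_doctype("window", "dialog.dtd",
                         public_id="-//OpenOffice.org//DTD OfficeDocument 1.0//EN")
document = ('<?xml version="1.0" encoding="UTF-8"?>'
            + doc_type
            + ET.tostring(root, encoding="unicode"))
print(document)
```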
    
  • 2020-11-30 15:40

    You could use lxml and its tostring function:

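    from lxml import etree

    # Bytes input: on Python 3, lxml rejects str input that carries an encoding declaration.
    s = b"""<?xml version="1.0" encoding="UTF-8"?>
    <tmx version="1.4a"/>"""

    tree = etree.fromstring(s)
    header = etree.SubElement(tree, 'header', {'adminlang': 'EN'})
    body = etree.SubElement(tree, 'body')

    # tostring() returns bytes when an encoding is given, so decode before printing.
    print(etree.tostring(tree, encoding="UTF-8",
                         xml_declaration=True,
                         pretty_print=True,
                         doctype='<!DOCTYPE tmx SYSTEM "tmx14a.dtd">').decode("UTF-8"))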
    

    =>

    <?xml version='1.0' encoding='UTF-8'?>
    <!DOCTYPE tmx SYSTEM "tmx14a.dtd">
    <tmx version="1.4a">
      <header adminlang="EN"/>
      <body/>
    </tmx>
    