Splitting XML file into multiple at given tags

孤城傲影 2021-01-15 11:24

I want to split an XML file into multiple files. My workstation is limited to Eclipse Mars with Xalan 2.7.1.

I can also use Python, but I have never used it before.

3 answers
  • 2021-01-15 11:58

    XMLStarlet (http://xmlstar.sourceforge.net/docs.php) is an excellent command-line tool that can do a lot with XML, although it is not Python-based.

    Suppose you have a file 1.xml with the data above, and you need to split it into separate files, one per /root/row element, each named after the row's NAME value (NNN.xml).

    Run this in a shell:

        $ for ((i=1; i<=$(xmlstarlet sel -t -v 'count(/root/row)' 1.xml); i++)); do
              # Read the row's NAME first so it can be used as the file name.
              NAME=$(xmlstarlet sel -t -m "/root/row[position()=$i]" -v './NAME' 1.xml)
              echo '<?xml version="1.0" encoding="UTF-8"?><root>' > "$NAME.xml"
              xmlstarlet sel -t -m "/root/row[position()=$i]" -c . -n 1.xml >> "$NAME.xml"
              echo '</root>' >> "$NAME.xml"
          done
    

    Now you have a set of XML files such as Joe.xml.

  • 2021-01-15 12:10

    Use Python's ElementTree.

    Create a file, e.g. xmlsplitter.py, and add the code below (where file.xml is your XML file, assuming every row has a unique NAME element).

    import xml.etree.ElementTree as ET

    # Stream the file; an 'end' event fires once an element is fully parsed.
    context = ET.iterparse('file.xml', events=('end', ))
    for event, elem in context:
        if elem.tag == 'row':
            title = elem.find('NAME').text
            filename = title + ".xml"
            # The file is opened in binary mode, so write bytes.
            with open(filename, 'wb') as f:
                f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
                f.write(ET.tostring(elem))
    

    Run it with

    python xmlsplitter.py
    

    Or if the names are not unique:

    import xml.etree.ElementTree as ET

    context = ET.iterparse('file.xml', events=('end', ))
    index = 0  # running row counter, used as the file name
    for event, elem in context:
        if elem.tag == 'row':
            index += 1
            filename = str(index) + ".xml"
            # The file is opened in binary mode, so write bytes.
            with open(filename, 'wb') as f:
                f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
                f.write(ET.tostring(elem))
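
    If the input is large, the index-based variant can also be wrapped in a function that empties each row after writing it. The following is only a sketch under assumptions: the names split_rows, out_dir and row_tag are made up for illustration, and it relies on Python 3 (os.makedirs with exist_ok).

```python
import os
import xml.etree.ElementTree as ET

XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def split_rows(source, out_dir, row_tag="row"):
    """Write each row element of `source` to out_dir/<index>.xml.

    Hypothetical helper: returns the paths written, in document order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != row_tag:
            continue
        path = os.path.join(out_dir, "%d.xml" % (len(written) + 1))
        with open(path, "wb") as f:
            f.write(XML_DECLARATION)
            f.write(ET.tostring(elem, encoding="utf-8"))
        written.append(path)
        # Drop the row's children and text once written; the emptied
        # element itself stays attached to the root.
        elem.clear()
    return written
```

    Calling split_rows('file.xml', 'out') would then produce out/1.xml, out/2.xml and so on.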
    
  • 2021-01-15 12:23

    This is the code that works perfectly. It also wraps each row in a <root> element.

    import xml.etree.ElementTree as ET
    
    context = ET.iterparse('filename.xml', events=('end', ))
    for event, elem in context:
        if elem.tag == 'row':
            title = elem.find('NAME').text
            filename = title + ".xml"
            # The file is opened in binary mode, so write bytes.
            with open(filename, 'wb') as f:
                f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
                f.write(b"<root>\n")
                f.write(ET.tostring(elem))
                f.write(b"</root>")
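
    Using NAME directly as a file name can fail if a value contains characters such as "/" or if two rows share a name. A possible way to guard against both, while still wrapping each row in <root>, is sketched below. The helpers safe_filename and wrap_row are hypothetical names introduced here, not part of ElementTree, and the replacement rule is just one reasonable choice.

```python
import re
import xml.etree.ElementTree as ET


def safe_filename(name, used):
    """Turn a NAME value into a unique, filesystem-friendly file name.

    `used` is a set of stems already handed out; it is updated in place.
    """
    # Replace runs of anything other than word characters, '.' and '-'
    # with a single underscore; fall back to "row" for empty names.
    stem = re.sub(r"[^\w.-]+", "_", (name or "").strip()) or "row"
    candidate, n = stem, 1
    while candidate in used:
        n += 1
        candidate = "%s_%d" % (stem, n)
    used.add(candidate)
    return candidate + ".xml"


def wrap_row(elem):
    """Serialize a row inside a fresh <root> element, as bytes."""
    root = ET.Element("root")
    root.append(elem)
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="utf-8")
```

    Inside the loop, the file could then be opened with safe_filename(title, used), where used is a set created before the loop, and written with f.write(wrap_row(elem)).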
    