I want to split an XML file into multiple files. My workstation is limited to Eclipse Mars with Xalan 2.7.1.
I can also use Python, but I have never used it before.
There's an excellent tool, XMLStarlet (http://xmlstar.sourceforge.net/docs.php), which can do a lot with XML (however, it's not Pythonic).
Suppose you have a file 1.xml with the data as above, and you need to split it into separate files named NNN.xml, one per /root/row element, where NNN is the value of that row's NAME element.
Just run this in a shell:
$ for ((i=1; i<=$(xmlstarlet sel -t -v 'count(/root/row)' 1.xml); i++)); do
    # Read the NAME of row $i first, so it can be used as the file name.
    NAME=$(xmlstarlet sel -t -m "/root/row[position()=$i]" -v './NAME' 1.xml)
    echo '<?xml version="1.0" encoding="UTF-8"?><root>' > "$NAME.xml"
    # Copy row $i into the new file, then close the wrapper element.
    xmlstarlet sel -t -m "/root/row[position()=$i]" -c . -n 1.xml >> "$NAME.xml"
    echo '</root>' >> "$NAME.xml"
  done
Now you have a bunch of XML files such as Joe.xml.
Use Python's ElementTree.
Create a file, e.g. xmlsplitter.py, and add the code below (file.xml is your XML file; this assumes every row has a unique NAME element).
import xml.etree.ElementTree as ET

# Stream the input; a <row> is complete once its 'end' event fires.
context = ET.iterparse('file.xml', events=('end',))
for event, elem in context:
    if elem.tag == 'row':
        title = elem.find('NAME').text
        filename = title + '.xml'
        with open(filename, 'wb') as f:
            # The file is opened in binary mode, so write bytes, not str.
            f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(ET.tostring(elem))
Run it with
python xmlsplitter.py
Or if the names are not unique:
import xml.etree.ElementTree as ET

context = ET.iterparse('file.xml', events=('end',))
index = 0
for event, elem in context:
    if elem.tag == 'row':
        # Number the output files 1.xml, 2.xml, ... in document order.
        index += 1
        filename = str(index) + '.xml'
        with open(filename, 'wb') as f:
            # The file is opened in binary mode, so write bytes, not str.
            f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(ET.tostring(elem))
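If you would rather keep the NAME in the file name even when names repeat, the two variants above can be combined. The following is only a sketch, not a tested recipe for your data: the function name split_rows, the out_dir parameter, the "_2", "_3" suffix scheme and the fallback name "row" are all assumptions made for illustration.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def split_rows(source, out_dir="."):
    """Write each <row> in `source` to its own file and return the paths."""
    seen = Counter()  # how many times each NAME has appeared so far
    written = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        # Assumption: fall back to the made-up name "row" if NAME is
        # missing or blank.
        name = (elem.findtext("NAME") or "").strip() or "row"
        seen[name] += 1
        # The first occurrence keeps the plain name; repeats get _2, _3, ...
        suffix = "" if seen[name] == 1 else "_%d" % seen[name]
        path = os.path.join(out_dir, name + suffix + ".xml")
        with open(path, "wb") as f:
            f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(ET.tostring(elem))
        written.append(path)
    return written
```

Calling split_rows('file.xml') on a file with two rows named Joe would produce Joe.xml and Joe_2.xml. Note that the suffix scheme could still clash with a row whose NAME is literally "Joe_2", and that NAME values containing characters such as "/" are not valid file names; neither case is handled here.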
This is the code that works perfectly for me. It also wraps each row in a <root> element:
import xml.etree.ElementTree as ET

context = ET.iterparse('filename.xml', events=('end',))
for event, elem in context:
    if elem.tag == 'row':
        title = elem.find('NAME').text
        filename = title + '.xml'
        with open(filename, 'wb') as f:
            # The file is opened in binary mode, so write bytes, not str.
            f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
            # Wrap the row in <root> so each output mirrors the input layout.
            f.write(b'<root>\n')
            f.write(ET.tostring(elem))
            f.write(b'</root>')
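The same result can probably be had without writing the tags by hand, by letting ElementTree build the wrapper and emit the XML declaration itself. This is a sketch under a few assumptions: split_rows_wrapped and out_dir are names made up here, every row is assumed to have a NAME element (as above), and the elem.clear() call is an optional addition to keep memory use down on large inputs.

```python
import os
import xml.etree.ElementTree as ET


def split_rows_wrapped(source, out_dir="."):
    """Write each <row> to <NAME>.xml inside a fresh <root> wrapper."""
    written = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        name = elem.findtext("NAME")  # assumes every row has a NAME
        elem.tail = None              # drop whitespace that followed the row
        wrapper = ET.Element("root")
        wrapper.append(elem)
        path = os.path.join(out_dir, name + ".xml")
        # write() emits the XML declaration and serializes the wrapper.
        ET.ElementTree(wrapper).write(path, encoding="UTF-8",
                                      xml_declaration=True)
        written.append(path)
        # The row is on disk now; discard its children to free memory.
        elem.clear()
    return written
```

Calling split_rows_wrapped('filename.xml') writes one file per row into the current directory. The declaration that write() produces uses single quotes around the attribute values, which is equally valid XML.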