How can I remove the whitespace and line breaks in an XML string in Python 2.6? I tried the following packages:
etree: This snippet keeps the original whitespace:
A little clumsy solution without lxml :-)
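data = """<root>
<head></head> <content></content>
</root>"""
# Strip every line and drop the ones that end up empty
data3 = []
data2 = data.split('\n')
for x in data2:
    y = x.strip()
    if y: data3.append(y)
data4 = ''.join(data3)
# Close the gap between adjacent tags only; replacing every " " would
# also eat the spaces inside attributes and text
data5 = data4.replace("> <","><")
print data5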
Output: <root><head></head><content></content></root>
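If the standard library's ElementTree is acceptable, a parser-based variant avoids touching spaces inside text or attribute values. This is only a minimal sketch, assuming that whitespace-only text between tags carries no meaning in your document; strip_whitespace is my own helper name, not a library function:

```python
import xml.etree.ElementTree as ET

def strip_whitespace(xml_string):
    """Return the serialized XML with whitespace-only text between tags removed."""
    root = ET.fromstring(xml_string)
    # iter() needs Python 2.7+; on 2.6 the older spelling is getiterator()
    for elem in root.iter():
        # .text is the text before the first child,
        # .tail is the text that follows the element's end tag
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root)

data = """<root>
<head></head> <content></content>
</root>"""
print(strip_whitespace(data))
```

Note that ElementTree serializes empty elements in the short form (<head /> rather than <head></head>), and that on Python 3 tostring() returns bytes unless you ask for another encoding.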
The only thing that bothers me about xml.dom.minidom's toprettyxml() is that it adds blank lines. I don't seem to get the split components, so I just wrote a simple function to remove the blank lines:
#!/usr/bin/env python
import xml.dom.minidom
# toprettyxml() without the blank lines
def prettyPrint(x):
    for line in x.toprettyxml().split('\n'):
        if line.strip():
            print line
xml_string = "<monty>\n<example>something</example>\n<python>parrot</python>\n</monty>"
# parse XML
x = xml.dom.minidom.parseString(xml_string)
# clean
prettyPrint(x)
And this is what the code outputs:
<?xml version="1.0" ?>
<monty>
	<example>something</example>
	<python>parrot</python>
</monty>
If I use toprettyxml() by itself, i.e. print x.toprettyxml(), it adds unnecessary blank lines:
<?xml version="1.0" ?>
<monty>


	<example>something</example>


	<python>parrot</python>


</monty>
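The blank lines come from the whitespace-only text nodes that the parser keeps between the elements; toprettyxml() writes each of them out with its own indentation and newline. An alternative to filtering the output is to drop those nodes from the tree before printing. The following is a rough sketch, assuming such nodes are insignificant in your document; remove_blank_text is a hypothetical helper name, not part of minidom:

```python
import xml.dom.minidom

def remove_blank_text(node):
    """Recursively delete whitespace-only text nodes below node."""
    # Iterate over a copy, because children are removed while looping
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            remove_blank_text(child)

xml_string = "<monty>\n<example>something</example>\n<python>parrot</python>\n</monty>"
doc = xml.dom.minidom.parseString(xml_string)
remove_blank_text(doc)

# Pretty-printed form, now without blank lines
pretty = doc.toprettyxml()
print(pretty)

# Compact single-line form of the root element
compact = doc.documentElement.toxml()
print(compact)
```

Once the blank text nodes are gone, toxml() on the root element also gives the compact, whitespace-free string that the question asks for.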