Question
I am using BS4 to parse an XML file and trying to write it back to a new XML file.
Input file:
<tag1>
<tag2 attr1="a1"> example text </tag2>
<tag3>
<tag4 attr2="a2"> example text </tag4>
<tag5>
<tag6 attr3="a3"> example text </tag6>
</tag5>
</tag3>
</tag1>
Script:
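from bs4 import BeautifulSoup

with open("input.xml") as src:
    soup = BeautifulSoup(src, "xml")
# encode() returns bytes, so the output file is opened in binary mode
with open("output.xml", "wb") as f:
    f.write(soup.encode(formatter="minimal"))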
Output:
<tag1>
<tag2 attr1="a1"> example text </tag2>
<tag3>
<tag4 attr2="a2"> example text </tag4>
<tag5>
<tag6 attr3="a3"> example text </tag6>
</tag5>
</tag3>
</tag1>
I want to retain the indentation of the input file. I tried the prettify() option.
Output-Prettify:
<tag1>
<tag2 attr1="a1">
example text
</tag2>
<tag3>
<tag4 attr2="a2">
example text
</tag4>
<tag5>
<tag6 attr3="a3">
example text
</tag6>
</tag5>
</tag3>
</tag1>
But this is not what I wanted. I want to maintain the exact indentation of the tags as in the input file.
Answer 1:
Unfortunately, you cannot do it directly. Beautiful Soup parses its input and keeps no trace of the original formatting.
So, if you do not modify the XML, you can first read the whole file into memory as a string, feed that string to BS to parse it and run your checks, and then write the original string (not the parsed tree) back to the new file.
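The read-only case could be sketched roughly as follows. The function name copy_preserving_format and its parser argument are made up for illustration; the "xml" parser is the one used in the question and assumes lxml is installed.

```python
from bs4 import BeautifulSoup


def copy_preserving_format(src_path, dst_path, parser="xml"):
    """Parse src_path for inspection, then write its original text to dst_path.

    Hypothetical helper: the returned soup is only meant for read-only checks.
    """
    # newline="" keeps the original line endings untouched
    with open(src_path, encoding="utf-8", newline="") as src:
        raw = src.read()

    # Parse the text for queries; the raw string itself is never altered
    soup = BeautifulSoup(raw, parser)

    # Write back the original string, not the serialized soup
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(raw)

    return soup
```

Because the output is a copy of the raw text, any change made to the returned soup is not reflected in the new file; this only helps when the tree is inspected, not edited.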
If you want to modify the XML and use a special formatting, you will have to navigate the BS tree and format it by hand.
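One way to format the tree by hand is a small recursive serializer. The sketch below is only an illustration under stated assumptions: format_tag is a hypothetical helper, an element holds either child elements or text (never both), and comments, processing instructions and the XML declaration are ignored.

```python
from xml.sax.saxutils import escape, quoteattr

from bs4.element import Tag


def format_tag(tag, level=0, indent="  "):
    """Serialize a bs4 Tag with one element per line (hypothetical helper).

    Assumes no mixed content: text is kept only for elements without children.
    """
    pad = indent * level
    # quoteattr adds the surrounding quotes and escapes the value
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in tag.attrs.items())
    children = [child for child in tag.children if isinstance(child, Tag)]

    if not children:
        # Leaf element: keep its text on the same line as the tags
        return f"{pad}<{tag.name}{attrs}>{escape(tag.get_text())}</{tag.name}>"

    lines = [f"{pad}<{tag.name}{attrs}>"]
    lines.extend(format_tag(child, level + 1, indent) for child in children)
    lines.append(f"{pad}</{tag.name}>")
    return "\n".join(lines)
```

Calling format_tag(soup.find("tag1")) on the parsed input returns a string with each leaf element on a single line and nested elements indented by the chosen indent string. This reproduces a layout you choose, not necessarily the exact whitespace of the original file.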
Source: https://stackoverflow.com/questions/29827087/maintaining-the-indentation-of-an-xml-file-when-parsed-with-beautifulsoup