I have an XML file from a client that has greater than >
and less than <
signs in it and it fails an XML format check.
Is there a way to get
To answer your question plainly no, you cannot have an XML file with <
or >
in any of its value fields because the XML format uses these characters to signify the parent and child elements, e.g.
,
,
, etc.
Expanding on my answer: When a Python script writes <
or >
using the XML library, the library translates them to <
or >
, respectively. I don't believe this is possible with that library since it is actually filtering out the <
and >
characters as well as the Character Entity References. This makes sense - the XML library is preventing you from disrupting the syntax used for the parent xml.etree.cElementTree.Element
or any child xml.etree.cElementTree.SubElement
object fields. For example, use the code block in this great answer to experiment:
import xml.etree.cElementTree as ET
root = ET.Element("root")
doc = ET.SubElement(root, "doc")
ET.SubElement(doc, "field1", name="blah").text = "some "
ET.SubElement(doc, "field2", name="asdfasd").text = "some "
tree = ET.ElementTree(root)
tree.write("filename.xml")
This yields
.
Prettifying it:
some <value>
some <other value>
However, there's nothing stopping you from adding these characters manually: read in the XML file and re-write it, adding text, even if it contains <
or >
. If you want a proper XML file though, just be sure that these characters are only used within comment fields.
For your particular problem, you could read in the lines from the client's XML files, then either remove the <
and >
characters or, if the client requires them, move them to a commented portion of the line. Part of the challenge is that you have to leave in the
`, etc. portions of the file... This is challenging but it would be possible!
The following is what I'd expect the result to look like.
Tove
Jani
Reminder
Don't forget me this weekend!