Question
I've got an HTML file with a <script> in it:
<html>
<script type="application/custom+xml">
<my><xml><goes><here/></goes></xml></my>
</script>
</html>
I parse it with HTML Agility Pack and then convert it to XML.
HtmlDocument html = new HtmlDocument();
// ... load the HTML into html ...
html.OptionOutputAsXml = true;
html.Save(stream);
...
XDocument xml = XDocument.Load(stream);
I then want to use LINQ-to-XML to look at the contents of the script tag, which should contain my XML as CDATA. But HTML Agility Pack messes it up somehow and I end up with this escaped XML:
<html>
<script type="application/custom+xml">
//<![CDATA[
<my><xml><goes><here/></goes></xml></my>
//]]>//
</script>
</html>
Does anyone know how I can tell HTML Agility Pack not to escape the contents of the script tag?
Answer 1:
That's rather easy. By default, HTML Agility Pack treats the content of script tags as CDATA. This is set up in the static constructor of the HtmlNode class, like so:
ElementsFlags.Add("script", HtmlElementFlag.CData);
To change this, you don't have to modify HTML Agility Pack itself. All that's needed is one line, run before your parsing code or just once when your program starts:
HtmlNode.ElementsFlags.Remove("script");
With that line in place before the parsing code, it works for me.
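To show where that line sits in the whole flow from the question, here is a minimal sketch. The class and method names (ScriptFlagSketch, ConvertToXml) are hypothetical, and the in-memory stream is an assumption standing in for whatever stream the question uses. The stream is rewound before XDocument.Load, which the question's "..." may or may not already do.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class ScriptFlagSketch
{
    public static XDocument ConvertToXml(string htmlText)
    {
        // Static, process-wide setting: stop treating <script> content as CDATA.
        // Removing a key that is already absent is harmless.
        HtmlNode.ElementsFlags.Remove("script");

        var html = new HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(htmlText);

        // Assumption: a MemoryStream stands in for the question's stream.
        using (var stream = new MemoryStream())
        {
            html.Save(stream);
            stream.Position = 0; // rewind so XDocument reads from the start
            return XDocument.Load(stream);
        }
    }
}
```

The result can then be queried with LINQ-to-XML, for example ScriptFlagSketch.ConvertToXml(text).Descendants("script"). Because ElementsFlags is static, removing the entry affects every document parsed afterwards in the same process, so this is best done once at startup if other code relies on the default.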
Source: https://stackoverflow.com/questions/14159028/html-agility-pack-conversion-to-xml-script-corruption