HTML Agility Pack Conversion to XML <script> corruption

Submitted by 99封情书 on 2020-01-06 06:32:33

Question


I've got an HTML file with a <script> in it:

<html>
   <script type="application/custom+xml">
   <my><xml><goes><here/></goes></xml></my>
   </script>
</html>

I parse it with HTML Agility Pack and then convert it to XML.

HtmlDocument html = new HtmlDocument();
html.Load("page.html"); // hypothetical path to the HTML file shown above
html.OptionOutputAsXml = true;
html.Save(stream);
...
XDocument xml = XDocument.Load(stream);

I then want to use LINQ to XML to look at the contents of the script tag, which should contain my XML as CDATA. But HTML Agility Pack escapes the markup inside the script tag, and I end up with this:

<html>
<script type="application/custom+xml">
//<![CDATA[
&lt;my&gt;&lt;xml&gt;&lt;goes&gt;&lt;here/&gt;&lt;/goes&gt;&lt;/xml&gt;&lt;/my&gt;
//]]>//
</script>
</html>

Does anyone know how I can tell HTML Agility Pack not to escape the contents of the script tag?


Answer 1:


This is easy to fix. By default, HTML Agility Pack treats the content of script tags as CDATA. This is set up in the static constructor of the HtmlNode class:

ElementsFlags.Add("script", HtmlElementFlag.CData);

You don't have to modify HTML Agility Pack to change this. Remove the flag before your parsing code runs, or just once when your program starts:

HtmlNode.ElementsFlags.Remove("script");

Add that line before your code; with it in place, this works for me.
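Putting the pieces together, the whole flow might look like the sketch below. It is only an illustration under some assumptions: the HtmlAgilityPack NuGet package is referenced, the input is an inline string with made-up element names rather than the file from the question, and a StringWriter stands in for the stream used there. The exact shape of the output may vary between HTML Agility Pack versions, so the sketch just prints whatever the script element contains.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Hypothetical input modelled on the question; element names are made up.
        const string source =
            "<html><script type=\"application/custom+xml\">" +
            "<my><data><item/></data></my>" +
            "</script></html>";

        // ElementsFlags is static, so this affects every document parsed
        // afterwards in the process. Remove the flag before loading the HTML.
        HtmlNode.ElementsFlags.Remove("script");

        var html = new HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(source);

        // Write the XML output to a string instead of a stream.
        var writer = new StringWriter();
        html.Save(writer);

        // Parse the result with LINQ to XML and look up the script element.
        XDocument xml = XDocument.Parse(writer.ToString());
        XElement script = xml.Descendants("script").FirstOrDefault();

        Console.WriteLine(script);
    }
}
```

Because the flag is removed globally, any real JavaScript in script tags will also be parsed as markup; if that matters, consider restoring the flag with HtmlNode.ElementsFlags.Add("script", HtmlElementFlag.CData) after the custom document has been processed.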



Source: https://stackoverflow.com/questions/14159028/html-agility-pack-conversion-to-xml-script-corruption
