Library to generate .NET XmlDocument from HTML tag soup

回眸只為那壹抹淺笑 submitted on 2019-12-22 19:53:16

Question


I'm looking for a .NET library that can generate a clean XML tree, ideally a System.Xml.XmlDocument, from invalid HTML. That is, it should make the same kind of best-effort guesses, repairs, and substitutions that browsers make when confronted with tag soup, and produce an XmlDocument approximating the intended structure. The library should also be well-maintained. :)

I realize this is a lot (too much?) to ask, and I would appreciate any useful leads. There seem to be a fair number of implementations of this for Java, but I would rather not generate my own bindings. So far for .NET I have found http://www.majestic12.co.uk/projects/html_parser.php, http://users.rcn.com/creitzel/tidy.html#dotnet, and http://sourceforge.net/projects/tidyfornet.

I have not yet built or tested any of these, but judging from the (sparse) docs and infrequent updates, they do not seem to offer what I'm looking for. So what recommendations do you have, either among these choices or from your past experience?


Answer 1:


The HTML Agility Pack is highly rated. It will certainly do the parsing / best guess etc.

The model is intentionally similar to XmlDocument, including SelectNodes etc. for querying.

If you need XHTML output, there is an OptionOutputAsXml flag; I assume that setting this to true and calling Save results in XHTML.
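As a rough sketch of that idea (untested here, and assuming the HtmlAgilityPack NuGet package is referenced), the saved output could then be loaded into a System.Xml.XmlDocument. The class name `TagSoupToXml`, the helper `ToXmlDocument`, and the sample markup are made up for illustration. Whether `LoadXml` accepts the output for every input is an assumption; some documents may still produce XML that the parser rejects.

```csharp
using System;
using System.IO;
using System.Xml;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

class TagSoupToXml
{
    // Hypothetical helper: parse tag soup with the HTML Agility Pack,
    // serialize it as XML, and load the result into an XmlDocument.
    static XmlDocument ToXmlDocument(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionOutputAsXml = true; // ask Save() to emit XML rather than HTML
        htmlDoc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            htmlDoc.Save(writer);

            var xmlDoc = new XmlDocument();
            // Assumption: the saved text is well-formed XML. LoadXml throws
            // XmlException if it is not, so callers may want to catch that.
            xmlDoc.LoadXml(writer.ToString());
            return xmlDoc;
        }
    }

    static void Main()
    {
        // Deliberately malformed input: unclosed <p> and <b>, bare <br>.
        XmlDocument xml = ToXmlDocument(
            "<html><body><p>unclosed paragraph<br><b>bold text</body></html>");
        Console.WriteLine(xml.OuterXml);
    }
}
```

If the round trip through a string is not needed, the Agility Pack's own `SelectNodes` on `htmlDoc.DocumentNode` may be enough for XPath-style querying without converting to XmlDocument at all.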



Source: https://stackoverflow.com/questions/704832/library-to-generate-net-xmldocument-from-html-tag-soup
