HtmlAgility ParseErrors Property

Submitted by 女生的网名这么多〃 on 2020-01-06 04:33:32

Question


What errors can I expect the HtmlAgility library to fix? I know from my own experience that it can complete an unterminated closing tag, like:

<car>Nissan</car

When I call Load or LoadHtml, it fixes this to:

<car>Nissan</car>

I also know that the ParseErrors collection exposes details such as Reason, StreamPosition, etc.

Is there a list of these errors? Or can you tell from your own experience how reliable HtmlAgility is at fixing errors, and which errors it cannot fix?


Answer 1:


Historically, Html Agility Pack was never designed to fix HTML, but rather to load it, modify it, and save it back, even if the HTML contains errors.

This means it will fix the errors that browsers generally fix automatically, like the one you show in your question. The list of errors was determined experimentally, and you can browse the source for deeper insight. That being said, it was designed back in 2000/2001, so things may have changed in that area :-)

The ParseErrors collection will contain HtmlParseError objects, each with a code. The code is a value of the HtmlParseErrorCode enum, which is documented in the source:

    public enum HtmlParseErrorCode
    {
        /// A tag was not closed.
        TagNotClosed,

        /// A tag was not opened.
        TagNotOpened,

        /// There is a charset mismatch between stream and declared (META) encoding.
        CharsetMismatch,

        /// An end tag was not required.
        EndTagNotRequired,

        /// An end tag is invalid at this position.
        EndTagInvalidHere
    }
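
To see which of these codes a given input triggers, you can walk the ParseErrors collection after loading. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name and the sample markup are made up for illustration, and the exact errors reported may differ between library versions.

```csharp
using System;
using HtmlAgilityPack;

class ParseErrorsDemo
{
    static void Main()
    {
        // Hypothetical malformed input: the <p> element is never closed.
        const string html = "<div><p>Nissan</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Each HtmlParseError carries a code plus location details.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }

        // The document is still loaded and can be written back out.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Running this against your own problem documents is probably the most reliable way to learn what the parser reports and what it silently repairs.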

There is also an OptionFixNestedTags property on HtmlDocument (default value is false) that can fix LI, TR, TH and TD tags when nesting errors are detected. This means that if the parser encounters a closing TR before all the required closing TD tags, the open TD elements are closed automatically. Again, this is exactly what a browser does with malformed HTML.
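
A quick way to observe the effect of OptionFixNestedTags is to load the same markup with the option off and on, then compare the serialized output. This is only a sketch under the same assumption (HtmlAgilityPack referenced from NuGet); the class name, the Roundtrip helper and the sample table are hypothetical, and the output is deliberately not shown because it can vary by version.

```csharp
using System;
using HtmlAgilityPack;

class FixNestedTagsDemo
{
    // Hypothetical sample input: the TD cells are never closed.
    const string Sample = "<table><tr><td>A<td>B</tr></table>";

    // Loads the sample with the given option value and returns the serialized HTML.
    static string Roundtrip(bool fixNestedTags)
    {
        var doc = new HtmlDocument { OptionFixNestedTags = fixNestedTags };
        doc.LoadHtml(Sample);
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        Console.WriteLine("default: " + Roundtrip(false));
        Console.WriteLine("fixed:   " + Roundtrip(true));
    }
}
```

Note that the option has to be set before calling Load or LoadHtml, since it influences how the document is parsed.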



Source: https://stackoverflow.com/questions/5364683/htmlagility-parseerrors-property
