HtmlAgilityPack produces missing closing tags in OuterHtml

Posted by 爱⌒轻易说出口 on 2019-12-13 08:47:25

Question


I am using HtmlAgilityPack to parse and manipulate HTML text. However, DocumentNode.OuterHtml seems to drop closing tags.

To isolate the issue, I now do nothing but parse the document and read OuterHtml (no manipulation):

var document = new HtmlDocument();
document.LoadHtml(myHtml);
var result = document.DocumentNode.OuterHtml;

Original: (myHtml)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="X-UA-Compatible" content="IE=Edge" /><title>
     MyTitle
</title>

OutputHtml: (result) Notice that the meta element is no longer closed

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="X-UA-Compatible" content="IE=Edge"><title>
    MyTitle
</title>

Similarly, all input and img elements are left open. (Please do not answer that this should not be a problem. It should not be, but it is: Chrome cannot render the page correctly.) Keep reading.

What is even stranger:

Original: (myHtml)

    <option value="10">Afrikaans</option>
    <option value="11">Albanian</option>
    <option value="12">Arabic</option>
    <option value="13">Armenian</option>
    <option value="14">Azerbaijani</option>
    <option value="15">Basque</option>

OutputHtml: (result) Notice that the explicit closing tags are missing entirely

    <option value="10">Afrikaans
    <option value="11">Albanian
    <option value="12">Arabic
    <option value="13">Armenian

I am using the latest HtmlAgilityPack NuGet package: id="HtmlAgilityPack" version="1.4.9"


Answer 1:


There are several options that you can set when you are loading the document.

OptionAutoCloseOnEnd

Defines if closing for non closed nodes must be done at the end or directly in the document. Setting this to true can actually change how browsers render the page.

var document = new HtmlDocument();
document.OptionAutoCloseOnEnd = true;
document.LoadHtml(content);

Related sources worth reading:

HtmlAgilityPack Drops Option End Tags

Image tag not closing with HTMLAgilityPack
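
The two related questions deal with the same symptoms described in the question. Below is a minimal sketch of the settings commonly suggested for them, assuming HtmlAgilityPack 1.4.x, where "option" is registered in the static HtmlNode.ElementsFlags table and OptionWriteEmptyNodes controls whether empty elements such as meta, img and input are written in self-closed form. The sample myHtml string is made up for illustration, and whether these settings fully fix rendering in your case is something to verify against your own markup.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample input standing in for myHtml from the question.
        string myHtml =
            "<head><meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge\" /></head>" +
            "<select><option value=\"10\">Afrikaans</option>" +
            "<option value=\"11\">Albanian</option></select>";

        // ElementsFlags is a static, process-wide table. Removing the "option"
        // entry stops the parser from treating <option> as an empty element.
        // This must run before LoadHtml, and it affects every HtmlDocument.
        HtmlNode.ElementsFlags.Remove("option");

        var document = new HtmlDocument();

        // Ask the writer to emit empty elements in self-closed form.
        document.OptionWriteEmptyNodes = true;

        document.LoadHtml(myHtml);

        // Inspect the serialized markup to check the closing tags.
        Console.WriteLine(document.DocumentNode.OuterHtml);
    }
}
```

Because ElementsFlags is static, the change applies to every document parsed afterwards in the same process, so it is usually done once at startup rather than per document.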



Source: https://stackoverflow.com/questions/35179687/htmlagilitypack-produces-missing-closing-tags-in-outerhtml
