Question
I'm seeing a really ghostly effect here. I'm trying to replace a node. If I print the document HTML once before the replacement, the change does not show up when I print it again afterwards. If I don't print the document HTML first, the tag is replaced successfully. It's really strange; can anyone explain?
My HTML:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title></title>
</head>
<body>
    <div id="swap"></div>
</body>
</html>
And my C# code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.IO;

namespace htmlagile
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument htmldoc = new HtmlDocument();
            string htmlstring;
            using (StreamReader sr = new StreamReader("HTMLPage1.html"))
            {
                htmlstring = sr.ReadToEnd();
            }
            htmldoc.LoadHtml(htmlstring);
            var div = htmldoc.DocumentNode.SelectNodes("//div");

            // First print: with this line present, the replacement below is not visible later.
            Console.WriteLine(htmldoc.DocumentNode.OuterHtml);

            // Replace every <div> with <p id="change">.
            foreach (var item in div)
            {
                HtmlNode newTag = htmldoc.CreateElement("p");
                newTag.SetAttributeValue("id", "change");
                item.ParentNode.ReplaceChild(newTag, item);
            }

            // Second print: still shows the original <div> if the first print ran.
            Console.WriteLine(htmldoc.DocumentNode.OuterHtml);
        }
    }
}
If I comment out the first Console.WriteLine, the element is changed successfully.
Answer 1:
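It's a bug in the Html Agility Pack. The library caches the OuterHtml and InnerHtml values, and when a change happens it only invalidates the cache of the immediate parent. Because you are printing the root, you still get the old cached value. If you never print the root before the change, nothing has been cached yet, which is why commenting out the first Console.WriteLine makes the replacement show up.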
http://htmlagilitypack.codeplex.com/workitem/30053
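To make the caching behavior described above concrete, here is a small self-contained toy model. It is not the Html Agility Pack source: ToyNode, its members, and the invalidateAncestors flag are all hypothetical names invented for illustration. It only sketches how a cached serialization can go stale when just the immediate parent is invalidated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy model only: NOT HtmlAgilityPack code. Every name here is hypothetical.
class ToyNode
{
    public string Name;
    public ToyNode Parent;
    public List<ToyNode> Children = new List<ToyNode>();
    private string cachedOuterHtml; // null means "nothing cached yet"

    public ToyNode(string name) { Name = name; }

    public ToyNode Append(ToyNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }

    // Serializes the subtree on first access, then returns the cached string.
    public string OuterHtml
    {
        get
        {
            if (cachedOuterHtml == null)
            {
                var sb = new StringBuilder("<" + Name + ">");
                foreach (var c in Children) sb.Append(c.OuterHtml);
                sb.Append("</" + Name + ">");
                cachedOuterHtml = sb.ToString();
            }
            return cachedOuterHtml;
        }
    }

    // invalidateAncestors == false clears only this node's cache (the behavior
    // described in the answer); true also clears the cache of every ancestor.
    public void ReplaceChild(ToyNode newChild, ToyNode oldChild, bool invalidateAncestors)
    {
        int index = Children.IndexOf(oldChild);
        if (index < 0) throw new ArgumentException("oldChild is not a child of this node");
        Children[index] = newChild;
        newChild.Parent = this;
        oldChild.Parent = null;
        cachedOuterHtml = null;
        if (invalidateAncestors)
            for (var n = Parent; n != null; n = n.Parent) n.cachedOuterHtml = null;
    }
}

static class ToyCacheDemo
{
    static void Main()
    {
        var html = new ToyNode("html");
        var body = html.Append(new ToyNode("body"));
        var div = body.Append(new ToyNode("div"));

        Console.WriteLine(html.OuterHtml); // <html><body><div></div></body></html>
        body.ReplaceChild(new ToyNode("p"), div, invalidateAncestors: false);

        Console.WriteLine(html.OuterHtml); // stale: <html><body><div></div></body></html>
        Console.WriteLine(body.OuterHtml); // fresh: <body><p></p></body>
    }
}
```

Passing invalidateAncestors: true in this model makes the root print the updated tree as well, which is the behavior one would expect from a correct cache.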
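If you print the parent of the replaced node instead of the root, you should see that the change was actually made. Note that div in your code is a node collection, so print from an individual node, for example inside the foreach loop right after the ReplaceChild call:
Console.WriteLine(newTag.ParentNode.OuterHtml);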
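For context, here is a sketch of how the loop from the question could be adjusted to check each replacement through the parent rather than the root. It rests on two assumptions worth verifying against the Html Agility Pack version in use: that the bug behaves as described above (only the immediate parent's cache is invalidated), and that HtmlNode.WriteTo() re-serializes the tree rather than returning the cached OuterHtml string. The inline HTML string stands in for the HTMLPage1.html file from the question.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var htmldoc = new HtmlDocument();
        // Stand-in for the contents of HTMLPage1.html from the question.
        htmldoc.LoadHtml("<html><body><div id=\"swap\"></div></body></html>");

        // Printing the root first is what populates the cache in the question.
        Console.WriteLine(htmldoc.DocumentNode.OuterHtml);

        var divs = htmldoc.DocumentNode.SelectNodes("//div");
        if (divs != null) // SelectNodes returns null when nothing matches
        {
            foreach (var item in divs)
            {
                HtmlNode parent = item.ParentNode; // capture before replacing
                HtmlNode newTag = htmldoc.CreateElement("p");
                newTag.SetAttributeValue("id", "change");
                parent.ReplaceChild(newTag, item);

                // Print the immediate parent, whose cache is invalidated.
                Console.WriteLine(parent.OuterHtml);
            }
        }

        // Assumption: WriteTo() serializes the current tree instead of using the cache.
        Console.WriteLine(htmldoc.DocumentNode.WriteTo());
    }
}
```

If WriteTo() does not bypass the cache in a given version, reloading the document from the serialized parent, or upgrading to a release where the linked work item is fixed, would be the fallback.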
Source: https://stackoverflow.com/questions/15989191/ghosty-htmlagilitypack