I can easily remove the element just by calling node.Remove(), like this:
HtmlDocument html = new HtmlDocument();
html.Load(Server.MapPath(@"~\Site\themes\default
This is a C# version of pseudocoder's answer (posted Dec 3 '14 at 17:57).
The site did not allow me to comment on the original post, so I am adding it here. Maybe it will help someone.
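private void removeNode(HtmlAgilityPack.HtmlNode node, bool keepChildren)
{
    var parent = node.ParentNode;
    if (keepChildren)
    {
        // Iterate backwards: every child is inserted directly after node,
        // so reverse iteration keeps the children in their original order.
        for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
        {
            parent.InsertAfter(node.ChildNodes[i], node);
        }
    }
    node.Remove();
}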
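For anyone who wants to try the method above, here is a minimal usage sketch. The sample markup and the class name are made up for illustration, and the method is copied as static so it can be called from Main. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

static class RemoveNodeDemo
{
    // Same logic as removeNode above, made static for this demo.
    static void RemoveNode(HtmlNode node, bool keepChildren)
    {
        var parent = node.ParentNode;
        if (keepChildren)
        {
            for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
            {
                parent.InsertAfter(node.ChildNodes[i], node);
            }
        }
        node.Remove();
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical sample markup, not taken from the original post.
        doc.LoadHtml("<div><removeme><span>one</span><span>two</span></removeme></div>");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var matches = doc.DocumentNode.SelectNodes("//removeme");
        if (matches != null)
        {
            foreach (var node in matches)
            {
                RemoveNode(node, true);
            }
        }

        // Print the result to check that the wrapper is gone and the children remain.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The null check matters because iterating the result of SelectNodes directly throws when the XPath has no matches.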
This should work:
var items = doc.DocumentNode.SelectNodes("//removeMe");
if (items != null) // SelectNodes returns null when nothing matches
{
    foreach (var item in items)
    {
        // Move each child in front of the element, in document order.
        // (Rebuilding the parent's InnerHtml would duplicate the content,
        // because the parent's InnerHtml still contains the element itself.)
        foreach (HtmlNode node in item.ChildNodes)
        {
            item.ParentNode.InsertBefore(node, item);
        }
        item.Remove();
    }
}
There is a problem with the bool keepGrandChildren implementation (RemoveChild(node, true)) for people who have text directly within the element they are trying to remove. If the removeme tag has text in it, the text is removed as well. For example, <removeme>text<p>more text</p></removeme>
will become <p>more text</p>
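If you want to check this against the HtmlAgilityPack version you have installed, a small sketch like the following should do. The wrapping div and the class name are only for illustration, and the behaviour may differ between versions, so I am not stating the output here.

```csharp
using System;
using HtmlAgilityPack;

static class KeepGrandChildrenDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // The example markup from above, wrapped in a div (the div is an assumption).
        doc.LoadHtml("<div><removeme>text<p>more text</p></removeme></div>");

        var node = doc.DocumentNode.SelectSingleNode("//removeme");
        if (node != null)
        {
            // Built-in removal that is supposed to keep the grandchildren.
            node.ParentNode.RemoveChild(node, true);
        }

        // Inspect whether the bare text "text" survived the removal.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```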
Try this:
private static void RemoveElementKeepText(HtmlNode node)
{
    //node.ParentNode.RemoveChild(node, true);
    HtmlNode parent = node.ParentNode;
    HtmlNode prev = node.PreviousSibling;
    HtmlNode next = node.NextSibling;
    foreach (HtmlNode child in node.ChildNodes)
    {
        if (prev != null)
        {
            parent.InsertAfter(child, prev);
            prev = child; // advance, so the next child lands after this one and order is kept
        }
        else if (next != null)
            parent.InsertBefore(child, next);
        else
            parent.AppendChild(child);
    }
    node.Remove();
}
Normally the correct expression would be node.ParentNode.RemoveChild(node, true).
Due to an ordering bug in HtmlNode.RemoveChild()
(http://htmlagilitypack.codeplex.com/discussions/79587), I have created a similar method. Sorry it's in VB. If anyone wants a translation I'll write one.
'The HTML Agility Pack (1.4.9) includes the HtmlNode.RemoveChild() method but it has an ordering bug with preserving child nodes.
'The below implementation orders children correctly.
Private Shared Sub RemoveNode(node As HtmlAgilityPack.HtmlNode, keepChildren As Boolean)
    Dim parent = node.ParentNode
    If keepChildren Then
        For i = node.ChildNodes.Count - 1 To 0 Step -1
            parent.InsertAfter(node.ChildNodes(i), node)
        Next
    End If
    node.Remove()
End Sub
I have tested this code with the following test markup:
<removeme>
outertextbegin
<p>innertext1</p>
<p>innertext2</p>
outertextend
</removeme>
The output is:
outertextbegin
<p>innertext1</p>
<p>innertext2</p>
outertextend
You can do this with a regex, or do you need to do it with HtmlAgilityPack?
string html = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";
html = Regex.Replace(html, "<removeMe.*?>", "", RegexOptions.Compiled);
html = Regex.Replace(html, "</removeMe>", "", RegexOptions.Compiled);
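A slightly tighter variant of the same regex idea, as a sketch only: one pattern handles both the opening and the closing tag, and the \b stops it from also stripping longer tag names such as removeMeToo. The StripTag helper and the class name are hypothetical. Like any regex on HTML, it can be fooled by edge cases such as a ">" inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexStripDemo
{
    // Hypothetical helper: removes the opening and closing tags of tagName
    // but leaves everything between them in place.
    static string StripTag(string html, string tagName)
    {
        string pattern = "</?" + Regex.Escape(tagName) + @"\b[^>]*>";
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string html = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";
        Console.WriteLine(StripTag(html, "removeMe"));
        // prints: <ul><li><a href="#">Keep me</a></li></ul>
    }
}
```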
There is a simple way to keep the <br> tags while taking only the text of an element:
element.InnerHtml = element.InnerHtml.Replace("<br>", "{1}");
var innerTextWithBR = element.InnerText.Replace("{1}", "<br>");