Html Agility Pack - Remove element, but not innerHtml

予麋鹿 2021-01-06 11:46

I can easily remove the element just by calling node.Remove(), like this:

HtmlDocument html = new HtmlDocument();

html.Load(Server.MapPath(@"~\Site\themes\default


        
10 Answers
  • 2021-01-06 11:47

    This is a C# version of pseudocoder's answer (posted Dec 3 '14 at 17:57).

    The site did not allow me to comment and add to the original post. Maybe it will help someone.

    private void removeNode(HtmlAgilityPack.HtmlNode node, bool keepChildren)
    {
        var parent = node.ParentNode;
        if (keepChildren)
        {
            for ( int i = node.ChildNodes.Count - 1; i >= 0; i--)
            {
                parent.InsertAfter(node.ChildNodes[i], node);
            }            
        }
        node.Remove(); 
    }
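
    For anyone who wants to try the method end to end, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name RemoveNodeDemo and the sample markup are made up for illustration. It uses Descendants() rather than an XPath query so that element-name casing is not an issue.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class RemoveNodeDemo
{
    // Same logic as removeNode above, repeated so this sketch compiles on its own.
    private static void RemoveNode(HtmlNode node, bool keepChildren)
    {
        var parent = node.ParentNode;
        if (keepChildren)
        {
            // Walk the children backwards and insert each one right after the node,
            // so they end up in their original order.
            for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
            {
                parent.InsertAfter(node.ChildNodes[i], node);
            }
        }
        node.Remove();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Sample markup, assumed for illustration only.
        doc.LoadHtml("<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>");

        // Snapshot the matches with ToList() because the tree is modified inside the loop.
        foreach (var node in doc.DocumentNode.Descendants("removeme").ToList())
        {
            RemoveNode(node, keepChildren: true);
        }

        // Print the resulting markup to check that only the wrapper element is gone.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```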
    
  • 2021-01-06 11:49

    This should work:

    // SelectNodes returns null when nothing matches, so guard against that.
    // Html Agility Pack stores element names in lowercase; if this finds nothing, try "//removeme".
    var items = doc.DocumentNode.SelectNodes("//removeMe");
    if (items != null)
    {
        foreach (var item in items)
        {
            // Move each child in front of the element, which keeps the original order.
            foreach (HtmlNode node in item.ChildNodes)
            {
                item.ParentNode.InsertBefore(node, item);
            }
            item.Remove();
        }
    }
    
  • 2021-01-06 11:51

    There is a problem with the keepGrandChildren option (the bool argument of RemoveChild) for people who have text directly inside the element they are trying to remove: if the removeme tag contains text, that text is removed too. For example, <removeme>text<p>more text</p></removeme> becomes <p>more text</p>.

    Try this:

    private static void RemoveElementKeepText(HtmlNode node)
    {
        //node.ParentNode.RemoveChild(node, true);
        HtmlNode parent = node.ParentNode;
        HtmlNode prev = node.PreviousSibling;
        HtmlNode next = node.NextSibling;

        foreach (HtmlNode child in node.ChildNodes)
        {
            if (prev != null)
            {
                parent.InsertAfter(child, prev);
                // Advance the anchor so the next child lands after this one and the order is kept.
                prev = child;
            }
            else if (next != null)
                parent.InsertBefore(child, next);
            else
                parent.AppendChild(child);
        }
        node.Remove();
    }
    
  • 2021-01-06 11:55

    Normally the correct expression would be node.ParentNode.RemoveChild(node, true).

    Due to an ordering bug in HtmlNode.RemoveChild() (http://htmlagilitypack.codeplex.com/discussions/79587), I have created a similar method. Sorry it's in VB. If anyone wants a translation, I'll write one.

    'The HTML Agility Pack (1.4.9) includes the HtmlNode.RemoveChild() method but it has an ordering bug with preserving child nodes.  
    'The below implementation orders children correctly.
    Private Shared Sub RemoveNode(node As HtmlAgilityPack.HtmlNode, keepChildren As Boolean)
        Dim parent = node.ParentNode
        If keepChildren Then
            For i = node.ChildNodes.Count - 1 To 0 Step -1
                parent.InsertAfter(node.ChildNodes(i), node)
            Next
        End If
        node.Remove()
    End Sub
    

    I have tested this code with the following test markup:

    <removeme>
        outertextbegin
        <p>innertext1</p>
        <p>innertext2</p>
        outertextend
    </removeme>
    

    The output is:

    outertextbegin
    <p>innertext1</p>
    <p>innertext2</p>
    outertextend
    
  • 2021-01-06 12:00

    Do you have to use Html Agility Pack, or would a regex do?

    string html = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";
    
    html = Regex.Replace(html, "<removeMe.*?>", "", RegexOptions.Compiled);
    html = Regex.Replace(html, "</removeMe>", "", RegexOptions.Compiled);
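
    The two replacements can also be folded into one pattern. The following is only a sketch: UnwrapTag and UnwrapDemo are hypothetical names, not part of any library, and the pattern assumes no attribute value inside the tag contains a ">" character. The \b keeps "removeMe" from also matching a longer tag name such as "removeMeToo".

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnwrapDemo
{
    // Hypothetical helper: strips the opening and closing tags of one element name
    // and leaves everything between them untouched.
    public static string UnwrapTag(string html, string tagName)
    {
        // Matches <tag>, <tag attr="...">, and </tag>.
        string pattern = @"</?" + Regex.Escape(tagName) + @"\b[^>]*>";
        return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";
        // Prints: <ul><li><a href="#">Keep me</a></li></ul>
        Console.WriteLine(UnwrapTag(html, "removeMe"));
    }
}
```

    As with any regex over HTML, this works on the raw string and knows nothing about comments, scripts, or malformed markup, so it is best kept for input whose shape you control.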
    
  • 2021-01-06 12:01

    There is a simple way:

     element.InnerHtml = element.InnerHtml.Replace("<br>", "{1}"); 
     var innerTextWithBR = element.InnerText.Replace("{1}", "<br>");
    