Remove an HTML node from an HtmlDocument: HtmlAgilityPack

闹比i 2020-12-16 18:17

In my code, I want to remove the img tags that don't have a src value. I am using HtmlAgilityPack's HtmlDocument object. I am finding the img whic

4 answers
  • 2020-12-16 18:34

    What I have done is:

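        List<string> xpaths = new List<string>();
        foreach (HtmlNode node in doc.DocumentNode.DescendantNodes())
        {
            if (node.Name.ToLower() == "img")
            {
                // GetAttributeValue returns the default ("") when src is missing,
                // which avoids a NullReferenceException on Attributes["src"].Value.
                string src = node.GetAttributeValue("src", "");
                if (string.IsNullOrEmpty(src))
                {
                    xpaths.Add(node.XPath);
                }
            }
        }

        // The stored XPaths are positional (e.g. .../img[2]), so remove in reverse
        // document order; removing an earlier sibling first would shift later indices.
        for (int i = xpaths.Count - 1; i >= 0; i--)
        {
            doc.DocumentNode.SelectSingleNode(xpaths[i])?.Remove();
        }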
    
  • 2020-12-16 18:36

    It seems you're modifying the collection while enumerating it, by calling the HtmlNode.RemoveChild method inside the loop.

    To fix this, copy the nodes to a separate list or array first, e.g. by calling Enumerable.ToList<T>() or Enumerable.ToArray<T>().

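    // By default SelectNodes returns null (not an empty collection) when nothing matches.
    var nodesToRemove = doc.DocumentNode
        .SelectNodes("//img[not(string-length(normalize-space(@src)))]")
        ?.ToList() ?? new List<HtmlNode>();

    foreach (var node in nodesToRemove)
        node.Remove();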
    

    If I'm right, the problem will disappear.
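
    The pitfall described in this answer can be reproduced without HtmlAgilityPack. The sketch below is only an illustration under stated assumptions: it uses a plain List<string> as a stand-in for the node collection, and the class and method names (SnapshotRemovalDemo, RemoveEmptyEntries, LiveEnumerationThrows) are hypothetical, not part of any library.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical demo class: a List<string> stands in for the node collection.
    public static class SnapshotRemovalDemo
    {
        // Enumerates a snapshot (ToList) while removing from the original list.
        public static void RemoveEmptyEntries(List<string> sources)
        {
            // ToList() materializes the matches first, so removing from
            // `sources` does not invalidate the enumerator used by foreach.
            foreach (string s in sources.Where(string.IsNullOrWhiteSpace).ToList())
            {
                sources.Remove(s);
            }
        }

        // Removes from the list while enumerating the list itself and
        // reports whether List<T> rejected the modification.
        public static bool LiveEnumerationThrows(List<string> sources)
        {
            try
            {
                foreach (string s in sources)
                {
                    if (string.IsNullOrWhiteSpace(s)) sources.Remove(s);
                }
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
    }
    ```

    With a list such as { "a.png", "", "b.png", " " }, LiveEnumerationThrows should return true, while RemoveEmptyEntries should leave only the two non-empty entries. Whether a given HtmlAgilityPack collection fails in the same way depends on how it is enumerated, so treat this as an analogy rather than a description of the library's internals.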

  • 2020-12-16 18:41
    // Select <img> elements whose src attribute is missing or empty.
    var emptyElements = doc.DocumentNode
        .Descendants("img")
        .Where(x => x.Attributes["src"] == null || x.Attributes["src"].Value == String.Empty)
        .ToList();
    
    emptyElements.ForEach(node => {
        if (node != null){ node.Remove();}
    });
    
  • 2020-12-16 18:58
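    var emptyImages = doc.DocumentNode
        .Descendants("img")
        .Where(x => x.Attributes["src"] == null || x.Attributes["src"].Value == String.Empty)
        .Select(x => x.XPath)
        .ToList();

    // The XPaths are positional (e.g. .../img[2]), so process the last matches first;
    // removing an earlier sibling would shift the indices of the later ones.
    emptyImages.Reverse();

    emptyImages.ForEach(xpath => {
        var node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node != null) { node.Remove(); }
    });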
    