Extracting Inner text from HTML BODY node with Html Agility Pack

感情败类 2021-01-17 14:59

Need a bit of help with HTML Agility Pack!

Basically, I want to grab the plain text within the body node of the HTML. So far I have tried this in VB.NET

2 answers
  • 2021-01-17 15:43

    Jeff's solution is fine as long as the page has no tables; otherwise the text of adjacent table cells gets glued together, like cell1cell2cell3. To prevent this, select the individual text nodes and join them with spaces (C# example):

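    // Requires: using System.Linq; using HtmlAgilityPack;
    // SelectNodes returns null (not an empty collection) when nothing matches, hence the null checks.
    var words = doc.DocumentNode?.SelectNodes("//body//text()")?.Select(x => x.InnerText);
    return words != null ? string.Join(" ", words) : string.Empty;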
    
  • 2021-01-17 16:03

    How about:

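    ' Concatenates all text under <body> without separators; SelectSingleNode returns Nothing if there is no <body>.
    Return htmldoc.DocumentNode.SelectSingleNode("//body").InnerText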
    
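To see the difference between the two answers, here is a minimal sketch that runs both approaches on the same table. It assumes the HtmlAgilityPack NuGet package is referenced; the `BodyText` class and its `Concatenated`/`Joined` method names are hypothetical, chosen only for illustration. Unlike the VB.NET one-liner, `Concatenated` returns an empty string instead of throwing when there is no `<body>`.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class comparing the two answers' approaches.
public static class BodyText
{
    // Second answer's approach: InnerText of <body> concatenates every descendant
    // text node with no separator, so <td>a</td><td>b</td> comes out as "ab".
    public static string Concatenated(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectSingleNode returns null when the document has no <body>;
        // fall back to an empty string instead of throwing.
        return doc.DocumentNode.SelectSingleNode("//body")?.InnerText ?? string.Empty;
    }

    // First answer's approach: select each text node under <body>
    // and join the pieces with a single space.
    public static string Joined(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var words = doc.DocumentNode.SelectNodes("//body//text()")?.Select(x => x.InnerText);
        return words != null ? string.Join(" ", words) : string.Empty;
    }
}
```

With `<td>cell1</td><td>cell2</td><td>cell3</td>` inside the body, `Concatenated` should return `cell1cell2cell3` and `Joined` should return `cell1 cell2 cell3`. Note that `Joined` also keeps whitespace-only text nodes (the line breaks and indentation between tags in typical real-world HTML), so you may want to drop them with `string.IsNullOrWhiteSpace` before joining.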