XPath - select text of selected child nodes

Submitted by 為{幸葍}努か on 2019-12-23 05:28:21

Question


Given the following XML:

<div id="Main">
    <div class="quote">
        This is a quote and I don't want this text
    </div> 
    <p>
        This is content.
    </p>
    <p>  
        This is also content and I want both of them
    </p>
</div>

Is there an XPath expression that selects the inner text of div#Main as a single node while excluding the text of any div.quote?

I just want the text: "This is content.This is also content and I want both of them"

Thanks in advance

Here is the code to test the XPath. I'm using .NET with HtmlAgilityPack, but I believe the XPath should work in any language.

[Test]
public void TestSelectNode()
{
    // Arrange 
    var html = "<div id=\"Main\"><div class=\"quote\">This is a quote and I don't want this text</div><p>This is content.</p><p>This is also content and I want both of them</p></div>";
    var xPath = "//div/*[not(self::div and @class=\"quote\")]/text()";

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Action
    var node = doc.DocumentNode.SelectSingleNode(xPath);

    // Assert
    Assert.AreEqual("This is content.This is also content and I want both of them", node.InnerText);
}

The test fails, evidently because the XPath is still not correct.

Test 'XPathExperiments/TestSelectNode' failed:
    Expected values to be equal.

    Expected Value : "This is content.This is also content and I want both of them"
    Actual Value   : "This is content."

Answer 1:


I don't think there is an XPath that will give you this as a single node, because the values you're trying to obtain aren't a single node. Is there a reason you can't do this?

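// StringBuilder lives in System.Text.
StringBuilder sb = new StringBuilder();

// Action
// SelectNodes returns every match, not just the first one.
// Note that HtmlAgilityPack returns null (not an empty collection) when nothing matches.
var nodes = doc.DocumentNode.SelectNodes(xPath);
foreach (var node in nodes)
{
    sb.Append(node.InnerText);
}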

// Assert
Assert.AreEqual("This is content.This is also content and I want both of them", 
                sb.ToString());



Answer 2:


You want the text of every child of the div that is not a div with class quote:

div/*[not(self::div and @class="quote")]/text()
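
To see why this expression alone does not satisfy the test in the question, it helps to look at what it selects: two separate text nodes, and the test's actual value ("This is content.") is the first of them. The sketch below shows that in Python rather than C#, using only the standard library. ElementTree's XPath subset has no not() or self::, so the predicate is written as a plain Python condition; the function name and the XML constant are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Markup from the question's test, with no whitespace between the tags.
XML = (
    '<div id="Main">'
    '<div class="quote">This is a quote and I don\'t want this text</div>'
    '<p>This is content.</p>'
    '<p>This is also content and I want both of them</p>'
    '</div>'
)


def text_of_non_quote_children(xml: str) -> list[str]:
    """Return the leading text of every child of the root <div>
    except <div class="quote"> children (hypothetical helper)."""
    main = ET.fromstring(xml)  # the root element is <div id="Main">
    texts = []
    for child in main:
        # Python equivalent of the predicate not(self::div and @class="quote")
        if child.tag == "div" and child.get("class") == "quote":
            continue
        # child.text is the text before the child's first sub-element
        texts.append(child.text or "")
    return texts


print(text_of_non_quote_children(XML))
```

The result is a list with two entries, one per p element, so a call that returns a single node can only ever return the first entry.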



Answer 3:


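There's no XPath 1.0 expression that would give you a combined string value for an arbitrary number of nodes: a path expression selects a set of nodes, even if they're text nodes, and converting a node-set to a string keeps only the first node's value.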

Seeing as you have <p> nodes in the <div> in question, I'd use

div[@id='Main']/p/text()

which produces a list of text nodes in <p> elements in a <div id="Main">. Iterating through these and concatenating text contents should be simple.
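
The iterate-and-concatenate step can be sketched as follows. This is a minimal Python illustration using only the standard library, not the .NET code from the question. ElementTree has no text() step, so the sketch selects the p elements with div[@id='Main']/p and reads their text in Python. The wrapper element named "document" is a hypothetical stand-in for the document node, and the function name is made up as well.

```python
import xml.etree.ElementTree as ET

# Markup from the question's test, with no whitespace between the tags.
XML = (
    '<div id="Main">'
    '<div class="quote">This is a quote and I don\'t want this text</div>'
    '<p>This is content.</p>'
    '<p>This is also content and I want both of them</p>'
    '</div>'
)


def concat_paragraph_text(xml: str) -> str:
    """Concatenate the text of the <p> children of <div id="Main">."""
    # Hypothetical wrapper so the path can start at the <div>,
    # the way it would when evaluated from a document node.
    document = ET.Element("document")
    document.append(ET.fromstring(xml))

    # findall returns the matching elements in document order.
    paragraphs = document.findall("div[@id='Main']/p")

    # p.text is the text before the first sub-element of <p>;
    # for this markup that is the whole paragraph.
    return "".join(p.text or "" for p in paragraphs)


print(concat_paragraph_text(XML))
```

With the markup from the question this prints "This is content.This is also content and I want both of them". If the source contains whitespace between or inside the tags, as in the indented XML at the top of the question, that whitespace ends up in the result unless each piece is trimmed first.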



Source: https://stackoverflow.com/questions/14614318/xpath-select-text-of-selected-child-nodes
