How do I retrieve element text inside CDATA markup via XPath?

南旧 2020-11-27 19:55

Consider the following XML fragment:

<Obj>
    <Name><![CDATA[SomeText]]></Name>
</Obj>

How do I retrieve the

5 Answers
  • 2020-11-27 20:34

    CDATA sections are just part of what in XPath is known as a text node or in the XML Infoset as "chunks of character information items".
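
    As a minimal sketch of that point, the snippet below parses the same element once with CDATA and once with plain text and compares the results. It assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath and has no text() step, so findtext("Name") stands in for /*/Name/text(). The helper name name_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in how the text is written.
with_cdata = "<Obj><Name><![CDATA[SomeText]]></Name></Obj>"
without_cdata = "<Obj><Name>SomeText</Name></Obj>"


def name_text(xml_source: str) -> str:
    """Return the text content of the Name child of the root element."""
    root = ET.fromstring(xml_source)
    # ElementTree has no text() step; findtext("Name") is the closest
    # equivalent of /*/Name/text() in its XPath subset.
    return root.findtext("Name")


print(name_text(with_cdata))
print(name_text(without_cdata))
```

    Both calls return the same string, because the parser hands CDATA content to the application as ordinary character data.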

    Obviously, your tool is wrong. Other tools, such as the XPath Visualizer, correctly highlight the text of the Name element when evaluating this XPath expression:

    /*/Name/text()
    

    One can also write a simple XSLT transformation:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="text"/>
    <xsl:template match="/">
      "<xsl:value-of select="/*/Name"/>"
    </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied to the provided XML document:

    <Obj>
        <Name><![CDATA[SomeText]]></Name>
    </Obj>
    

    the correct result is produced:

      "SomeText"
    
  • 2020-11-27 20:41

    /Obj/Name/text() is the XPath to return the content of the CDATA markup.

    What threw me off was the behavior of the Value property. For an XmlNode (DOM world), the XmlNode.Value property of an Element (with CDATA or otherwise) returns null. The InnerText property gives you the CDATA/text content. If you use Xml.Linq, XElement.Value returns the CDATA content.

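    // Requires: using System; using System.Xml; using System.Xml.Linq;
    // WriteNodesToConsole is a helper (not shown) that prints each node's properties.
    string sXml = @"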
    <object>
        <name><![CDATA[SomeText]]></name>
        <name>OtherName</name>
    </object>";
    
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml( sXml );
    XmlNamespaceManager nsMgr = new XmlNamespaceManager(xmlDoc.NameTable);
    
    Console.WriteLine(@"XPath = /object/name" );
    WriteNodesToConsole(xmlDoc.SelectNodes("/object/name", nsMgr));
    
    Console.WriteLine(@"XPath = /object/name/text()" );
    WriteNodesToConsole( xmlDoc.SelectNodes("/object/name/text()", nsMgr) );
    
    Console.WriteLine(@"Xml.Linq = obRoot.Elements(""name"")");
    XElement obRoot = XElement.Parse( sXml );
    WriteNodesToConsole( obRoot.Elements("name") );
    

    Output:

    XPath = /object/name
            NodeType = Element
            Value = <null>
            OuterXml = <name><![CDATA[SomeText]]></name>
            InnerXml = <![CDATA[SomeText]]>
            InnerText = SomeText
    
            NodeType = Element
            Value = <null>
            OuterXml = <name>OtherName</name>
            InnerXml = OtherName
            InnerText = OtherName
    
    XPath = /object/name/text()
            NodeType = CDATA
            Value = SomeText
            OuterXml = <![CDATA[SomeText]]>
            InnerXml =
            InnerText = SomeText
    
            NodeType = Text
            Value = OtherName
            OuterXml = OtherName
            InnerXml =
            InnerText = OtherName
    
    Xml.Linq = obRoot.Elements("name")
            Value = SomeText
            Value = OtherName
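
    The same split between an element's own value and the value of its child node can be seen outside .NET. The sketch below is an analogue rather than a translation of the code above: it assumes Python's standard xml.dom.minidom module, which keeps CDATA sections as separate nodes, and the helper name describe is hypothetical.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    "<object><name><![CDATA[SomeText]]></name><name>OtherName</name></object>"
)


def describe(element):
    """Return (element nodeValue, first child's node type, first child's nodeValue)."""
    child = element.firstChild
    return element.nodeValue, child.nodeType, child.nodeValue


first, second = doc.getElementsByTagName("name")

# The element itself carries no value; the text lives on its child node.
print(describe(first))
print(describe(second))
```

    Here the element's nodeValue is None in both cases, while the child node (a CDATA section for the first element, a text node for the second) holds the string.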
    

    It turned out that the author of Visual XPath had left a TODO for the CDATA type of XmlNode. With a small code change, I have CDATA support now.

    MainForm.cs

    private void Xml2Tree( TreeNode tNode, XmlNode xNode)
    {
       ...
       case XmlNodeType.CDATA:
          //MessageBox.Show("TODO: XmlNodeType.CDATA");
          // Gishu                    
          TreeNode cdataNode = new TreeNode("![CDATA[" + xNode.Value + "]]");
          cdataNode.ForeColor = Color.Blue;
          cdataNode.NodeFont = new Font("Tahoma", 12);
          tNode.Nodes.Add(cdataNode);
          //Gishu
          break;
    
  • 2020-11-27 20:41

    One suggestion is to add another field that holds the MD5 hash of the CDATA content. You can then use XPath to query on the MD5 value without any issue.

    <sites>
      <site>
        <name>Google</name>
        <url><![CDATA[http://www.google.com]]></url>
        <urlMD5>ed646a3334ca891fd3467db131372140</urlMD5>
      </site>
    </sites>
    

    Then you can search:

    /sites/site[urlMD5='ed646a3334ca891fd3467db131372140']
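
    A rough sketch of this lookup, assuming Python's standard hashlib and xml.etree.ElementTree modules (the variable names are illustrative). The hash is computed in the code rather than copied from the sample, and the last query shows that a direct comparison on the url element also matches, since the parser treats CDATA content as ordinary text.

```python
import hashlib
import xml.etree.ElementTree as ET

url = "http://www.google.com"
# Compute the hash here instead of hard-coding it.
url_md5 = hashlib.md5(url.encode("utf-8")).hexdigest()

xml_source = f"""<sites>
  <site>
    <name>Google</name>
    <url><![CDATA[{url}]]></url>
    <urlMD5>{url_md5}</urlMD5>
  </site>
</sites>"""

root = ET.fromstring(xml_source)

# Relative to the root <sites> element, this mirrors /sites/site[urlMD5='...'].
# Note the quotes around the value inside the predicate.
by_hash = root.findall(f"site[urlMD5='{url_md5}']")

# The direct comparison on the CDATA-backed element also matches.
by_url = root.findall(f"site[url='{url}']")

print(by_hash[0].findtext("name"))
print(by_url[0].findtext("url"))
```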
    
  • 2020-11-27 20:45

    I think the thread you referenced says that the CDATA markup itself is ignored by XPath, not the text contained in the CDATA markup.

    My guess is that it's an issue with the tool. The source code is available for download, so maybe you can debug it.

  • 2020-11-27 20:53

    See if this helps - http://www.zrinity.com/xml/xpath/
    XPATH = /Obj/Name/text()
