How does XPath deal with XML namespaces?

梦谈多话 2020-11-21 04:26

How does XPath deal with XML namespaces?

If I use

/IntuitResponse/QueryResponse/Bill/Id

to parse the XML document

2 Answers
  • 2020-11-21 04:58

    I use /*[name()='...'] in a Google Sheet to fetch some counts from Wikidata. I have a table like this

     thes   WD prop   links   items
     NOM    P7749     3925    3789
     AAT    P1014     21157   20224
    

    and the formulas in the links and items columns are

    =IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
    =IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
    

    respectively. The SPARQL query happens not to have any spaces...

    I saw name() used instead of local-name() in "Xml Namespace breaking my xpath!". For some reason //*:literal doesn't work here, possibly because the *: wildcard is XPath 2.0 syntax.

  • 2020-11-21 05:08

    Defining namespaces in XPath (recommended)

    XPath itself has no way to bind a namespace prefix to a namespace URI. That facility is provided by the hosting library.

    It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
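
    To see why the unprefixed path from the question finds nothing, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The sample document is an assumption: its element names come from the question's path and its namespace URI from the examples in this answer, while the Id value 42 is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root declares a default namespace, so every
# element below is in that namespace even though no prefix is written.
XML = """\
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3">
  <QueryResponse>
    <Bill><Id>42</Id></Bill>
  </QueryResponse>
</IntuitResponse>"""

root = ET.fromstring(XML)  # root is the IntuitResponse element

# Unprefixed names mean "no namespace", so nothing matches.
unqualified = root.findall("QueryResponse/Bill/Id")

# Binding a prefix to the namespace URI is done by the host library,
# here through the mapping passed as the second argument.
ns = {"i": "http://schema.intuit.com/finance/v3"}
qualified = root.findall("i:QueryResponse/i:Bill/i:Id", ns)

print(len(unqualified))              # number of matches without a prefix
print([e.text for e in qualified])   # text of the Id elements found
```

    ElementTree implements only a subset of XPath, and paths are evaluated relative to the element they are called on, which is why the sketch starts below the root rather than at /.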


    Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs:

    XSLT:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:i="http://schema.intuit.com/finance/v3">
       ...
    

    Perl (LibXML):

    my $xc = XML::LibXML::XPathContext->new($doc);
    $xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
    my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
    

    Python (lxml):

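    from io import StringIO
    from lxml import etree
    # The document must declare the namespace for the prefixed query to match.
    f = StringIO('<IntuitResponse xmlns="http://schema.intuit.com/finance/v3">...</IntuitResponse>')
    doc = etree.parse(f)
    r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
                  namespaces={'i': 'http://schema.intuit.com/finance/v3'})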
    

    Python (ElementTree):

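    namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
    # root is the i:IntuitResponse element; ElementTree rejects absolute
    # paths on an element, so the path is written relative to it.
    root.findall('i:QueryResponse', namespaces)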
    

    Python (Scrapy):

    response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
    response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
    

    Java (SAX):

    NamespaceSupport support = new NamespaceSupport();
    support.pushContext();
    support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
    

    Java (XPath):

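    xpath.setNamespaceContext(new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            switch (prefix) {
                case "i": return "http://schema.intuit.com/finance/v3";
                // ...
                default: return javax.xml.XMLConstants.NULL_NS_URI;
            }
        }
        // The interface also requires the reverse lookups; stubs suffice here.
        public String getPrefix(String namespaceURI) { return null; }
        public java.util.Iterator<String> getPrefixes(String namespaceURI) { return null; }
    });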
    
    • Remember to call DocumentBuilderFactory.setNamespaceAware(true).
    • See also: Java XPath: Queries with default namespace xmlns

    xmlstarlet:

    -N i="http://schema.intuit.com/finance/v3"
    

    JavaScript:

    See Implementing a User Defined Namespace Resolver:

    function nsResolver(prefix) {
      var ns = {
        'i' : 'http://schema.intuit.com/finance/v3'
      };
      return ns[prefix] || null;
    }
    document.evaluate( '/i:IntuitResponse/i:QueryResponse', 
                       document, nsResolver, XPathResult.ANY_TYPE, 
                       null );
    

    Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a custom nsResolver().

    PHP:

    Adapted from @Tomalak's answer using DOMDocument:

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");

    $result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
    

    See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.

    C#:

    XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
    XmlNodeList nodes = el.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);
    

    VBA:

    xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
    doc.setProperty "SelectionNamespaces", xmlNS  
    Set queryResponseElement = doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
    

    VB.NET:

    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load("file.xml")
    ' XmlNameTable is abstract, so reuse the document's own name table.
    Dim nsmgr As New XmlNamespaceManager(xmlDoc.NameTable)
    nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3")
    Dim nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
                                                   nsmgr)
    

    Ruby (Nokogiri):

    puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
                    'i' => "http://schema.intuit.com/finance/v3")
    

    Note that Nokogiri supports removal of namespaces,

    doc.remove_namespaces!
    

    but see the warnings below, which discourage defeating XML namespaces.


    Once you've declared a namespace prefix, your XPath can be written to use it:

    /i:IntuitResponse/i:QueryResponse
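
    The prefix used in the XPath does not have to match any prefix that appears in the document; only the namespace URI it is bound to matters. A minimal sketch with Python's xml.etree.ElementTree, assuming a hypothetical document that itself uses the prefix fin (the startPosition value is made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical document that uses the prefix "fin" for the namespace.
XML = """\
<fin:IntuitResponse xmlns:fin="http://schema.intuit.com/finance/v3">
  <fin:QueryResponse startPosition="1"/>
</fin:IntuitResponse>"""

root = ET.fromstring(XML)
URI = "http://schema.intuit.com/finance/v3"

# Two different query-side prefixes bound to the same URI.
with_i = root.findall("i:QueryResponse", {"i": URI})
with_x = root.findall("x:QueryResponse", {"x": URI})

# Both queries select the same element object.
print(with_i[0] is with_x[0])
print(with_i[0].get("startPosition"))
```

    The same reasoning applies to documents that use a default namespace with no prefix at all: the query still needs some prefix of your choosing bound to that URI.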
    

    Defeating namespaces in XPath (not recommended)

    An alternative is to write predicates that test against local-name():

    /*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']/@startPosition
    

    Or, in XPath 2.0:

    /*:IntuitResponse/*:QueryResponse/@startPosition
    

    Skirting namespaces in this manner works but is not recommended because it

    • Under-specifies the full element/attribute name.
    • Fails to differentiate between element/attribute names in different namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly [1]:

      /*[    namespace-uri()='http://schema.intuit.com/finance/v3'
         and local-name()='IntuitResponse']
      /*[    namespace-uri()='http://schema.intuit.com/finance/v3'
         and local-name()='QueryResponse']
      /@startPosition


      [1] Thanks to Daniel Haley for the namespace-uri() note.

    • Is excessively verbose.
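
    The loss of differentiation can be made concrete. Python's xml.etree.ElementTree (3.8 and later) accepts {*}name as its counterpart to *:name. In the sketch below the document, the text values, and the second namespace URI http://example.com/other are all hypothetical, chosen only to show two elements that share a local name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two Id elements with the same local name
# but in different namespaces.
XML = """\
<i:Bill xmlns:i="http://schema.intuit.com/finance/v3"
        xmlns:o="http://example.com/other">
  <i:Id>intuit-id</i:Id>
  <o:Id>other-id</o:Id>
</i:Bill>"""

root = ET.fromstring(XML)

# Namespace wildcard (Python 3.8+): matches Id in any namespace.
any_ns = [e.text for e in root.findall("{*}Id")]

# Qualified name: matches only the Id in the Intuit namespace.
ns = {"i": "http://schema.intuit.com/finance/v3"}
intuit_only = [e.text for e in root.findall("i:Id", ns)]

print(any_ns)
print(intuit_only)
```

    The wildcard query returns both elements, so code that expects a single Intuit Id would silently pick up the unrelated one as well.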
