How to search for content in XPath in multiline text using Python?

感动是毒 2021-01-13 05:43

When I use contains() on an element's text() to check whether it contains some data, it works for plain text but not when the element contains carriage returns, newlines, or tags.

2 answers
  • 2021-01-13 06:23

    Use:

    //td[text()[contains(.,'Good bye')]]
    

    Explanation:

    The reason for the problem is not that a text node's string value is a multiline string; the real reason is that the td element has more than one text-node child.
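    To see those text-node children concretely, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (not the lxml-style .xpath() API from the question; ElementTree has no full XPath, so this only inspects the nodes). ElementTree stores an element's first text node in .text and the text node following each child element in that child's .tail. The helper name direct_text_nodes is hypothetical, and the XML is the sample table used later in this answer.

```python
import xml.etree.ElementTree as ET

XML = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""


def direct_text_nodes(elem):
    """Hypothetical helper: the text-node children of elem, in document order.

    ElementTree keeps the text before the first child in elem.text and the
    text after each child element in that child's .tail.
    """
    nodes = []
    if elem.text is not None:
        nodes.append(elem.text)
    for child in elem:
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(XML)
first_td = root.find("tr/td")  # the first td ("Hello world ...")
texts = direct_text_nodes(first_td)

print(len(texts))                        # 2
print(["Good bye" in t for t in texts])  # [False, True]
```

    "Good bye" lives only in the second text node, the one that follows the <i> element.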

    In the provided expression:

    //td[contains(text(),"Good bye")]
    

    the first argument passed to contains() is a node-set containing more than one text node.

    As per the XPath 1.0 specification, when a function that expects a string argument is passed a node-set instead, it uses only the string value of the first node (in document order) of that node-set. (In XPath 2.0, this simply raises a type error.)

    In this specific case, the first text node of the passed node-set has the string value:

     "
                     Hello world "
    

    so the comparison fails and the wanted td element isn't selected.
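    The first-node rule can be observed directly from Python. This sketch assumes the third-party lxml library (the .xpath() call in the question suggests it; lxml's XPath engine implements XPath 1.0) and uses the sample table shown later in this answer. string(text()) exposes the single string that contains() actually receives.

```python
from lxml import etree  # third-party: pip install lxml

XML = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(XML)
td = root.xpath("//td")[0]  # the first td ("Hello world ...")

# text() selects every text-node child of the td: two of them here.
print(len(td.xpath("text()")))  # 2

# A function expecting a string converts the node-set to the string value
# of its FIRST node only, so "Good bye" (in the second node) is never seen.
print(td.xpath("string(text())").strip())         # Hello world
print(td.xpath("contains(text(), 'Good bye')"))   # False

# Consequently the question's predicate selects no td at all.
print(len(root.xpath('//td[contains(text(),"Good bye")]')))  # 0
```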

    XSLT-based verification:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
    
     <xsl:template match="/">
      <xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
     </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied on the provided XML document:

    <table>
          <tr>
            <td>
              Hello world <i> how are you? </i>
              Have a wonderful day.
              Good bye!
            </td>
          </tr>
          <tr>
            <td>
              Hello NJ <i>, how are you?
              Have a wonderful day.</i>
            </td>
          </tr>
    </table>
    

    the XPath expression is evaluated and the selected nodes (in this case just one) are copied to the output:

    <td>
              Hello world <i> how are you? </i>
              Have a wonderful day.
              Good bye!
            </td>
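    The same check can be done from Python without XSLT. A sketch, again assuming lxml and the sample table above: the predicate text()[contains(., ...)] applies contains() to each text-node child in turn, so a td matches if any of its own text nodes contains the substring.

```python
from lxml import etree  # third-party: pip install lxml

XML = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(XML)

# Each text() node is tested separately inside the inner predicate.
matches = root.xpath("//td[text()[contains(.,'Good bye')]]")
print(len(matches))  # 1

# It is the first td, whose leading text node is "Hello world".
print(matches[0].text.strip())  # Hello world

# Serialize the selected td, roughly what xsl:copy-of outputs.
print(etree.tostring(matches[0], encoding="unicode", with_tail=False))
```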
    
  • 2021-01-13 06:31

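    Use . instead of text(), so that contains() receives the string value of the whole td (all of its descendant text concatenated) rather than only its first text node: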

    tdouthtml.xpath('//td[contains(.,"Good bye")]')
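    Note that . also includes text inside child elements such as <i>, which the text()[...] form from the other answer ignores. A sketch of the difference, assuming lxml and the sample table from the other answer (the question's tdouthtml object is not shown, so a parsed string stands in for it; count is a hypothetical helper):

```python
from lxml import etree  # third-party: pip install lxml

XML = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(XML)


def count(expr):
    """Hypothetical helper: number of nodes selected by expr."""
    return len(root.xpath(expr))


# "Good bye" is a direct text child of the first td: both forms find it.
print(count('//td[contains(.,"Good bye")]'))             # 1
print(count("//td[text()[contains(.,'Good bye')]]"))     # 1

# "how are you" appears only inside <i> elements: only "." sees it.
print(count('//td[contains(.,"how are you")]'))          # 2
print(count("//td[text()[contains(.,'how are you')]]"))  # 0
```

    Which form fits depends on whether text inside child elements should count as a match.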
    