How to get the full contents of a node using XPath & lxml?

借酒劲吻你 2021-01-13 01:46

I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get the contents of a tag, which includes HTML tags of its own. If I u

2 Answers
  • 2021-01-13 02:32

    I'm not sure I understand -- is this close to what you are looking for?

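    import io
    import lxml.etree as le

    content = '''\
    <font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
    '''
    # Python 3: io.StringIO replaces the Python 2 cStringIO module
    doc = le.parse(io.StringIO(content))

    # Select the element children of the matching <font> node
    xpath = '//font[@face="verdana" and @color="#ffffff" and @size="2"]/child::*'
    x = doc.xpath(xpath)
    # tostring() serializes each element together with its tail text
    print([le.tostring(el, encoding="unicode") for el in x])
    # ['<a href="url">inside</a> something']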
    
  • 2021-01-13 02:42

    Is there any way to use a pure XPath query to get the contents of the <font> nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

    Note that I'm returning a list of many nodes from the XPath query so the solution needs to support that.

    Just to clarify... I want to return <a href="url">inside</a> something from something like...

    <font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>

    Short answer: No.

    XPath doesn't operate on "tags"; it operates on nodes.

    The selected nodes are represented as instances of specific objects in the language that is hosting XPath.

    If you need the string representation of a particular node's markup, such objects typically support an outerXml property -- check the documentation of the hosting language (lxml in this case).

    As @Robert-Rossney pointed out in his comment: lxml's tostring() method is equivalent to other environments' outerXml property.
