How to get the full contents of a node using XPath and lxml?


I am using lxml's xpath function to retrieve parts of a webpage. I am trying to get the contents of a tag, which includes HTML tags of its own. If I u


2 Answers

再見小時候

2021-01-13 02:32

I'm not sure I understand -- is this close to what you are looking for?

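import io
import lxml.etree as le

content = '''\
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
'''
# io.StringIO replaces the Python 2-only cStringIO module
doc = le.parse(io.StringIO(content))

# Select the child elements of the matching <font> node
xpath = '//font[@face="verdana" and @color="#ffffff" and @size="2"]/child::*'
x = doc.xpath(xpath)
# tostring() includes each element's tail text by default;
# encoding='unicode' returns str rather than bytes
print([le.tostring(el, encoding='unicode') for el in x])
# ['<a href="url">inside</a> something']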


有刺的猬

2021-01-13 02:42
Is there any way to use a pure XPath query to get the contents of the <font> nodes, or even to force lxml to return a string of the contents from the .xpath() method, rather than an lxml object?

Note that I'm returning a list of many nodes from the XPath query so the solution needs to support that.

Just to clarify: I want to return <a href="url">inside</a> something from something like:
```
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
```

Short answer: No.

XPath doesn't work with "tags"; it works with nodes.

The selected nodes are represented as instances of specific objects in the language that is hosting XPath.
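
A minimal sketch of what this means in lxml, assuming a small inline document modeled on the question's <font> example (the `root`, `nodes`, and `text` names are illustrative only). A node-set query hands back element objects, and the XPath string() function returns only the text content with the markup dropped:

```python
import lxml.etree as le

# Sample markup modeled on the question's example (an assumption)
root = le.fromstring(
    '<font face="verdana" color="#ffffff" size="2">'
    '<a href="url">inside</a> something</font>'
)

# A node-set query returns a list of element objects, not markup strings
nodes = root.xpath('//font')
print(type(nodes[0]).__name__)

# The XPath string() function yields the text content only; tags are dropped
text = root.xpath('string(//font)')
print(text)
```

So a pure XPath expression can produce a string, but not one that still contains the child tags.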

In case you need the string representation of a particular node's markup, such objects typically support an outerXML property -- check the documentation of the hosting language (lxml in this case).

As @Robert-Rossney pointed out in his comment: lxml's tostring() method is equivalent to other environments' outerXml property.