This must be an absolute classic, but I can't find the answer here. I'm parsing the following tag with lxml cssselect:
-
The itertext method of an element returns an iterator over the node's text data. For your tag, ' Detroit' would be the second value returned by the iterator. If the structure of your document always conforms to a known specification, you can skip specific text elements to get what you need.
from lxml import html
doc = html.fromstring("""
- 3 Detroit
""")
stop_nodes = doc.cssselect('li a')
stop_names = []
for start in stop_nodes:
    node_text = start.itertext()
    next(node_text)  # Skip '3'
    stop_names.append(next(node_text).lstrip())
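If the number of text chunks is not guaranteed, skipping a fixed count is fragile. Here is a minimal sketch of a more tolerant variant that keeps the last non-blank chunk. The markup is an assumption (the original tag is not shown here, so the span wrapper around the number is a guess), and stop_name is a hypothetical helper name.

```python
from lxml import html

# Assumed markup: the <span> around the number is a guess, chosen so that
# itertext() yields '3' first and ' Detroit' second, as described above.
doc = html.fromstring('<ul><li><a href="#"><span>3</span> Detroit</a></li></ul>')


def stop_name(anchor):
    """Return the last non-blank text chunk inside the anchor, or None."""
    chunks = [t.strip() for t in anchor.itertext()]
    chunks = [t for t in chunks if t]
    return chunks[-1] if chunks else None


anchors = list(doc.iter('a'))

# Raw chunks, in document order, for the first anchor.
text_chunks = list(anchors[0].itertext())

stop_names = [stop_name(a) for a in anchors]
print(text_chunks, stop_names)
```

This version assumes the stop name is always the last piece of text inside the link; if that does not hold for your pages, pick the chunk by a different rule.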
You can combine a CSS selector with the XPath text() function mentioned in Zachary's answer (useful if you're more comfortable with CSS selectors than XPath). Note that xpath('text()') returns a list of text nodes, so join them before stripping:
stop_names = [''.join(a.xpath('text()')).strip() for a in doc.cssselect('li a')]