Parsing stray text with Scrapy

Question

Any idea how to extract 'TEXT TO GRAB' from this piece of markup:

<span class="navigation_page">
    <span>
        <a itemprop="url" href="http://www.example.com">
            <span itemprop="title">LINK</span>
        </a>
    </span>
    <span class="navigation-pipe">&gt;</span>
    TEXT TO GRAB
</span>

Answer 1:

It's not an ideal solution, but it should do the trick. Select the direct text nodes of the outer span and take the last one:

from scrapy import Selector

content="""
<span class="navigation_page">
    <span>
        <a itemprop="url" href="http://www.example.com">
            <span itemprop="title">LINK</span>
        </a>
    </span>
    <span class="navigation-pipe">&gt;</span>
    TEXT TO GRAB
</span>
"""
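sel = Selector(text=content)
# ::text selects only the direct text nodes of the outer span, not text inside its children
item = sel.css(".navigation_page::text")
# The wanted text is the last of those nodes; strip the surrounding whitespace
print(item.extract()[-1].strip())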

Or like this:

sel = Selector(text=content)
# Collapse whitespace inside each direct text node; whitespace-only nodes become empty strings
item = ''.join(' '.join(node.split()) for node in sel.css("span.navigation_page::text").extract())
print(item)

Output:

TEXT TO GRAB

Answer 2:

Not ideal, but you can select the first text node that follows the navigation-pipe span:

text_to_grab = response.xpath('//span[@class="navigation-pipe"]/following-sibling::text()[1]').extract_first()

Source: https://stackoverflow.com/questions/48919935/parsing-stray-text-with-scrapy

Tags

python, web-scraping, scrapy, scrapy-spider
