Is there a way to extract text along with text-links in Scrapy using CSS?

长发绾君心 2021-01-28 17:28

I'm brand new to Scrapy. I have learned how to use response.css() to read specific parts of a web page, and I am avoiding learning XPath. It seems

2 answers

孤街浪徒 (OP)

2021-01-28 18:13

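You can extract the text, including the link text, with this expression:

>>> # sample HTML (the href value is only illustrative)
>>> txt = """<p>My sentence has a <a href="https://www.google.com">link to google</a> in it.</p>"""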
>>> from scrapy import Selector
>>> sel = Selector(text=txt)
>>> sel.css('p ::text').extract()
[u'My sentence has a ', u'link to google', u' in it.']
>>> ' '.join(sel.css('p ::text').extract())
u'My sentence has a  link to google  in it.'

Alternatively, use the w3lib.html library to strip the HTML tags from the selected element, like this:

from w3lib.html import remove_tags
with_tags = response.css("p").get()
clean_text = remove_tags(with_tags)

But the first variant is shorter and more readable.
