Is there a way to extract text along with text-links in Scrapy using CSS?

长发绾君心 2021-01-28 17:28

I'm brand new to Scrapy. I have learned how to use response.css() to read specific parts of a web page, and am avoiding learning the XPath system. It seems

2 answers
  •  孤街浪徒
    2021-01-28 18:13

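    You can extract the text, including the text inside the link, with the ::text pseudo-element:

    >>> # Sample markup; the href value is illustrative.
    >>> txt = """<p>My sentence has a <a href="https://www.google.com">link to google</a> in it.</p>"""
    >>> from scrapy import Selector
    >>> sel = Selector(text=txt)
    >>> sel.css('p ::text').extract()
    ['My sentence has a ', 'link to google', ' in it.']
    >>> ''.join(sel.css('p ::text').extract())
    'My sentence has a link to google in it.'

    Note the space in 'p ::text': it selects the text nodes of every descendant of the paragraph, not only its direct text. The pieces already carry their own surrounding spaces, so join them with an empty string.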

    Alternatively, use the w3lib.html library to strip the HTML tags from your response, like this:

    from w3lib.html import remove_tags
    with_tags = response.css("p").get()
    clean_text = remove_tags(with_tags)
    

    But the first variant is shorter and more readable.
