scrapy: convert html string to HtmlResponse object

时光取名叫无心 2021-01-31 18:04

I have a raw HTML string that I want to convert to a Scrapy HTML response object so that I can use the css and xpath selectors, similar to Scrapy's

2 answers
  •  夕颜 (original poster)
    2021-01-31 18:44

    First of all, if it is for debugging or testing purposes, you can use the Scrapy shell:

    $ cat index.html
    <div id="test">
        Test text
    </div>

    $ scrapy shell ./index.html
    >>> response.xpath('//div[@id="test"]/text()').extract()[0].strip()
    u'Test text'

    There are different objects available in the shell during the session, like response and request.


    Or, you can instantiate an HtmlResponse class and provide the HTML string in body:

    >>> from scrapy.http import HtmlResponse
    >>> response = HtmlResponse(url="my HTML string", body='<div id="test">Test text</div>', encoding='utf-8')
    >>> response.xpath('//div[@id="test"]/text()').extract()[0].strip()
    u'Test text'
