scrapy: convert html string to HtmlResponse object

时光取名叫无心 2021-01-31 18:04

I have a raw HTML string that I want to convert to a Scrapy HtmlResponse object so that I can use the css and xpath selectors, similar to scrapy's

2 answers
  • 2021-01-31 18:44

    First of all, if it is for debugging or testing purposes, you can use the Scrapy shell:

    $ cat index.html
    <div id="test">
        Test text
    </div>
    
    $ scrapy shell ./index.html
    >>> response.xpath('//div[@id="test"]/text()').get().strip()
    'Test text'
    

    Several objects are available in the shell during the session, such as response and request.


    Or, you can instantiate HtmlResponse directly and pass the HTML string as body:

    >>> from scrapy.http import HtmlResponse
    >>> # url is only a label here; any placeholder string works
    >>> response = HtmlResponse(url="my HTML string", body='<div id="test">Test text</div>', encoding='utf-8')
    >>> response.xpath('//div[@id="test"]/text()').get().strip()
    'Test text'
    
  • 2021-01-31 18:51

    alecxe's answer is right, but if you only need the selectors, you can instantiate a Selector directly from the text in Scrapy:

    >>> from scrapy.selector import Selector
    >>> body = '<html><body><span>good</span></body></html>'
    >>> Selector(text=body).xpath('//span/text()').get()
    'good'
    