How to write a custom Downloader Middleware for Selenium and Scrapy?

栀梦 2021-01-14 16:17

I am having an issue communicating between Selenium and Scrapy objects.

I am using Selenium to log in to a site; once I get that response, I want to use scrape's funct

1 answer
  •  北海茫月
    2021-01-14 16:36

    It's pretty straightforward: create a middleware that holds a webdriver and use process_request to intercept the request, discard it, and pass its URL to your Selenium webdriver instead:

    from scrapy.http import HtmlResponse
    from selenium import webdriver
    
    
    class DownloaderMiddleware(object):
        def __init__(self):
            self.driver = webdriver.Chrome()  # your chosen driver
    
        def process_request(self, request, spider):
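            # only process tagged requests; delete this check to render every request
            if not request.meta.get('selenium'):
                return None  # let Scrapy download the request as usual
            self.driver.get(request.url)
            body = self.driver.page_source
            # page_source is a str, so HtmlResponse needs an explicit encoding
            response = HtmlResponse(
                url=self.driver.current_url,
                body=body,
                encoding='utf-8',
                request=request,
            )
            return response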
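
    For the middleware above to take effect, it has to be registered in the project settings, and each request that should go through Selenium needs the 'selenium' key in its meta. A minimal sketch follows, assuming the class lives in a module named myproject.middlewares (a hypothetical path) and using 543 as an arbitrary example priority. The selenium_meta helper is also hypothetical, added only for illustration:

```python
# settings.py (sketch): register the middleware.
# "myproject.middlewares" is a hypothetical module path; adjust it to your project.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.DownloaderMiddleware": 543,  # example priority
}


def selenium_meta(extra=None):
    """Return a meta dict carrying the 'selenium' tag that process_request checks."""
    meta = dict(extra or {})  # copy so the caller's dict is not mutated
    meta["selenium"] = True
    return meta


# In a spider callback (not executed here):
#     yield scrapy.Request(url, meta=selenium_meta(), callback=self.parse)
```

    Requests without the tag fall through to Scrapy's normal downloader, so only the pages that need a real browser pay the Selenium cost.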
    

    The downside of this is that you have to give up concurrency in your spider, since a Selenium webdriver can only handle one URL at a time. For that, see the Scrapy settings documentation page (for example, the CONCURRENT_REQUESTS setting).
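
    One way to limit concurrency might look like the sketch below. CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN are real Scrapy settings; the name SINGLE_DRIVER_SETTINGS is a hypothetical one chosen for this example, and whether a value of 1 is strictly required depends on your setup:

```python
# Sketch: settings that keep Scrapy from scheduling requests in parallel,
# so a single shared webdriver only ever has one URL in flight.
SINGLE_DRIVER_SETTINGS = {
    "CONCURRENT_REQUESTS": 1,
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
}

# Apply them either globally in settings.py, or per spider (not executed here):
#     class LoginSpider(scrapy.Spider):
#         custom_settings = SINGLE_DRIVER_SETTINGS
```

    Using custom_settings keeps the restriction local to the spider that relies on Selenium, so other spiders in the same project can stay concurrent.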
