I am having an issue communicating between Selenium and a Scrapy object.
I am using Selenium to log in to a site; once I get that response I want to use Scrapy's functions to continue scraping.
It's pretty straightforward: create a downloader middleware with a webdriver and use process_request
to intercept the request, discard it, and pass its URL to your selenium webdriver:
from scrapy.http import HtmlResponse
from selenium import webdriver


class DownloaderMiddleware(object):
    def __init__(self):
        self.driver = webdriver.Chrome()  # your chosen driver

    def process_request(self, request, spider):
        # Only handle requests explicitly tagged for selenium;
        # delete this check if you want all requests to go through the driver.
        if not request.meta.get('selenium'):
            return

        self.driver.get(request.url)
        body = self.driver.page_source
        # HtmlResponse needs an explicit encoding when the body is a str,
        # and passing the original request keeps meta and callbacks intact.
        return HtmlResponse(
            url=self.driver.current_url,
            body=body,
            encoding='utf-8',
            request=request,
        )
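In your spider you then tag the requests you want rendered by selenium. A minimal sketch (the spider name and URL are placeholders):

import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'

    def start_requests(self):
        # The 'selenium' meta key is what the middleware above checks for.
        yield scrapy.Request(
            'https://example.com/login',
            meta={'selenium': True},
            callback=self.parse,
        )

    def parse(self, response):
        # response here is the HtmlResponse built from driver.page_source
        self.logger.info('Got %s', response.url)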
The downside of this is that you have to give up the concurrency in your spider, since a selenium webdriver can only handle one URL at a time. For that, see the settings documentation page.
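For example, in settings.py you could enable the middleware and serialize requests like this (the module path myproject.middlewares is an assumption; use wherever your middleware actually lives):

# settings.py
# The module path 'myproject.middlewares' is hypothetical; adjust to your project.
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.DownloaderMiddleware': 543,
}

# The webdriver handles one page at a time, so keep Scrapy from
# firing parallel requests at it.
CONCURRENT_REQUESTS = 1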