I came to know at one point that you need to use web toolkits like Selenium to automate the scraping. How will I be able to click the next button on the Google Play store in …
I'd do something like this:
from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
from selenium import webdriver
import time

class FooSpider(CrawlSpider):
    name = 'foo'
    allowed_domains = ['foo.com']
    start_urls = ['http://foo.com']

    def __init__(self, *args, **kwargs):
        super(FooSpider, self).__init__(*args, **kwargs)
        self.download_delay = 0.25
        self.browser = webdriver.Firefox()
        self.browser.implicitly_wait(60)

    def parse_foo(self, response):
        self.browser.get(response.url)  # load the response URL in the browser
        button = self.browser.find_element_by_xpath("path")  # find the element to click
        button.click()  # click it
        time.sleep(1)  # wait until the page is fully loaded
        source = self.browser.page_source  # get the source of the loaded page
        sel = Selector(text=source)  # create a Selector object
        data = sel.xpath('path/to/the/data')  # select the data
        ...
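One thing the snippet above leaves out is cleanup: the Firefox window started in __init__ keeps running after the crawl finishes. A minimal sketch of one way to handle that (not part of the original snippet), using Scrapy's closed() hook, which is called when the spider closes, and the same self.browser attribute as above:

    def closed(self, reason):
        # Quit the Selenium browser when the spider finishes,
        # so no stray Firefox process is left behind.
        self.browser.quit()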
It's better not to wait for a fixed amount of time, though. Instead of time.sleep(1), you can use one of the approaches described here: http://www.obeythetestinggoat.com/how-to-get-selenium-to-wait-for-page-load-after-a-click.html.
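For example, the linked article's idea is to hold a reference to the old page and wait until it goes stale after the click. A minimal sketch, assuming the same self.browser and button objects as in the spider above; the 10-second timeout and the XPath for the new content are just placeholders:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

old_page = self.browser.find_element_by_tag_name('html')  # reference to the current page
button.click()
# wait (up to 10 seconds) until the old page reference goes stale,
# i.e. the browser has navigated away from it
WebDriverWait(self.browser, 10).until(EC.staleness_of(old_page))
# or, alternatively, wait until the element you actually need is present
WebDriverWait(self.browser, 10).until(
    EC.presence_of_element_located((By.XPATH, 'path/to/the/data'))
)

Either condition replaces the fixed time.sleep(1) and returns as soon as the page is ready instead of always pausing for a full second.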