How to use selenium along with scrapy to automate the process?

日久生厌 2021-01-14 20:08

I've come to understand that at some point you need to use a web toolkit like Selenium to automate the scraping.

How will I be able to click the next button on the Google Play store?

1 Answer

    野趣味 (OP)
    2021-01-14 21:06

    I'd do something like that:

    from scrapy.spiders import CrawlSpider
    from scrapy.selector import Selector
    from selenium import webdriver
    import time

    class FooSpider(CrawlSpider):
        name = 'foo'
        allowed_domains = ['foo.com']
        start_urls = ['http://foo.com']

        def __init__(self, *args, **kwargs):
            super(FooSpider, self).__init__(*args, **kwargs)
            self.download_delay = 0.25
            self.browser = webdriver.Firefox()
            self.browser.implicitly_wait(60)

        def parse_foo(self, response):
            self.browser.get(response.url)  # load the response URL in the browser
            button = self.browser.find_element_by_xpath("path")  # find the element to click
            button.click()  # click it
            time.sleep(1)  # wait until the page is fully loaded
            source = self.browser.page_source  # get source of the loaded page
            sel = Selector(text=source)  # wrap it in a Scrapy Selector
            data = sel.xpath('path/to/the/data')  # select the data
            ...
    

    It's better not to wait for a fixed amount of time, though. Instead of time.sleep(1), you can use one of the approaches described here: http://www.obeythetestinggoat.com/how-to-get-selenium-to-wait-for-page-load-after-a-click.html.
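
    As a sketch of that explicit-wait approach (using the same Selenium 3 `find_element_by_xpath` style as the spider above; the XPath strings are placeholders, not real Google Play selectors), you could replace the `time.sleep(1)` with a `WebDriverWait` that polls until the element you actually need is present:

    ```python
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    def click_and_wait(browser, click_xpath, wait_xpath, timeout=10):
        """Click an element, then block until another element appears.

        Polls the DOM until wait_xpath matches (up to `timeout` seconds),
        instead of sleeping for a fixed amount of time; raises
        TimeoutException if it never shows up.
        """
        browser.find_element_by_xpath(click_xpath).click()
        return WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.XPATH, wait_xpath))
        )
    ```

    With this helper, `self.browser.page_source` is only read after the data you care about has actually loaded, which is both faster and more reliable than a fixed sleep.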
