I came to know at one point that you need to use web toolkits like Selenium to automate the scraping. How will I be able to click the next button on the Google Play store in …
I'd do something like this:
from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
from selenium import webdriver
import time

class FooSpider(CrawlSpider):
    name = 'foo'
    allowed_domains = ['foo.com']
    start_urls = ['http://foo.com']

    def __init__(self, *args, **kwargs):
        super(FooSpider, self).__init__(*args, **kwargs)
        self.download_delay = 0.25
        self.browser = webdriver.Firefox()
        self.browser.implicitly_wait(60)

    def parse_foo(self, response):
        self.browser.get(response.url)  # load the response URL in the browser
        button = self.browser.find_element_by_xpath("path")  # find the element to click
        button.click()  # click it
        time.sleep(1)  # wait until the page is fully loaded
        source = self.browser.page_source  # get the source of the loaded page
        sel = Selector(text=source)  # create a Selector object
        data = sel.xpath('path/to/the/data')  # select the data
        ...
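One thing the snippet above leaves out is cleanup: the Firefox window started in __init__ keeps running after the crawl finishes. A minimal sketch of one way to handle that (not part of the original snippet), using Scrapy's closed() hook, which is called when the spider closes, and the same self.browser attribute as above:

    def closed(self, reason):
        # Quit the Selenium browser when the spider finishes,
        # so no stray Firefox process is left behind.
        self.browser.quit()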
It's better not to wait for a fixed amount of time, though. Instead of time.sleep(1), you can use one of the approaches described here: http://www.obeythetestinggoat.com/how-to-get-selenium-to-wait-for-page-load-after-a-click.html.
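For example, the linked article's idea is to hold a reference to the old page and wait until it goes stale after the click. A minimal sketch, assuming the same self.browser and button objects as in the spider above; the 10-second timeout and the XPath for the new content are just placeholders:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

old_page = self.browser.find_element_by_tag_name('html')  # reference to the current page
button.click()
# wait (up to 10 seconds) until the old page reference goes stale,
# i.e. the browser has navigated away from it
WebDriverWait(self.browser, 10).until(EC.staleness_of(old_page))
# or, alternatively, wait until the element you actually need is present
WebDriverWait(self.browser, 10).until(
    EC.presence_of_element_located((By.XPATH, 'path/to/the/data'))
)

Either condition replaces the fixed time.sleep(1) and returns as soon as the page is ready instead of always pausing for a full second.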