Some of the search queries entered under https://www.comparis.ch/carfinder/default would yield more than 1'000 results (shown dynamically on the search page). The results howev
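It looks like the site loads its results with JavaScript while the client is browsing, so a plain HTTP request never sees them. There are probably a number of ways to handle this. One option is to use scrapy-splash, which renders each page through a Splash server before Scrapy parses it.
Assuming you use Scrapy, you can do the following: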
In settings.py, add SPLASH_URL = <splash-server-ip-address> (the URL of your Splash server, as a string).
Also in settings.py, add this code to the downloader middlewares:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
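For reference, here is a minimal sketch of how the Splash-related part of settings.py might look with both additions in place. The address http://localhost:8050 is only an assumption (a Splash instance running locally on its default port). The spider middleware and dupefilter lines are not part of the steps in this answer; they are what the scrapy-splash README suggests adding as well.

```python
# settings.py (sketch): Splash-related settings only.

# Assumption: Splash runs locally on its default port; replace with your server.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Suggested by the scrapy-splash README in addition to the settings above.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```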
In your spider.py, import SplashRequest:
from scrapy_splash import SplashRequest
Then set start_urls in your spider.py to iterate over the pages, e.g. like this:
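# One URL per result page. The base URL is written inline because a list
# comprehension in a class body cannot see other class attributes.
start_urls = [
    'https://www.comparis.ch/carfinder/marktplatz/occasion?page=%d' % page
    for page in range(0, 100)
]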
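If you would rather not hard-code the query string, the page URLs can also be built with the standard library. This is only a sketch: the helper name build_page_urls is made up for illustration, and it assumes the site really accepts a page query parameter and that pages 0 to 99 exist.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.comparis.ch/carfinder/marktplatz/occasion'


def build_page_urls(base_url, pages):
    """Return one URL per page number, passed as the 'page' query parameter."""
    return ['{}?{}'.format(base_url, urlencode({'page': page})) for page in pages]


# Pages 0 to 99 (an assumption about how many result pages there are).
start_urls = build_page_urls(BASE_URL, range(0, 100))
print(start_urls[0])
print(len(start_urls))
```

In a spider you could assign the result to start_urls, or call the helper from start_requests and yield one request per URL.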
Finally, send the requests through Splash by overriding start_requests(self), e.g. like this:
def start_requests(self):
    for url in self.start_urls:
        # Render each page through Splash, waiting 0.5 s for the JavaScript to run.
        yield SplashRequest(url, self.parse,
            endpoint='render.html',
            args={'wait': 0.5},
        )
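To check what Splash returns for a page before running the whole spider, it can help to call the render.html endpoint by hand. The sketch below only builds the URL for such a GET request. The helper name splash_render_url and the address http://localhost:8050 are assumptions for illustration; url and wait are arguments of Splash's render.html endpoint.

```python
from urllib.parse import urlencode


def splash_render_url(splash_url, target_url, wait=0.5):
    """Build a GET URL for Splash's render.html endpoint."""
    query = urlencode({'url': target_url, 'wait': wait})
    return '{}/render.html?{}'.format(splash_url.rstrip('/'), query)


# Assumption: Splash runs locally on its default port.
print(splash_render_url(
    'http://localhost:8050',
    'https://www.comparis.ch/carfinder/marktplatz/occasion?page=0',
))
```

Opening the printed URL in a browser (or fetching it with curl) should show the HTML after the JavaScript has run. If the listings are missing there, a larger wait value may help.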
Let me know how that works out for you.