I am scraping a website using Scrapy that requires cookies and JavaScript to be enabled. I don't think I will have to actually process JavaScript. All I need is to pretend as
You should try the Splash JS rendering engine together with scrapyjs (since renamed scrapy-splash). Here is an example of how to set it up in your project's settings.py:
SPLASH_URL = 'http://192.168.59.103:8050'  # address of your running Splash instance

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashMiddleware': 725,  # matches the scrapy_splash import used below
}
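Note that the scrapy-splash README recommends a few more settings than the minimal pair above, so that cookies survive the round trip through Splash and duplicate filtering understands Splash requests. The middleware priorities below follow that README; treat this as a sketch to adapt, and point SPLASH_URL at your own Splash instance:

```python
# settings.py -- fuller scrapy-splash configuration (priorities taken from
# the scrapy-splash README; adjust SPLASH_URL to your own instance).
SPLASH_URL = 'http://192.168.59.103:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,  # forwards cookies through Splash
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filtering and HTTP caching
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```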
Scrapinghub, the company behind Scrapy, has special instances to run your spiders with Splash enabled.
Then yield SplashRequest instead of Request in your spider, like this:
import scrapy
from scrapy_splash import SplashRequest

class MySpider(scrapy.Spider):
    name = "myspider"  # every Scrapy spider needs a name
    start_urls = ["http://example.com", "http://example.com/foo"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse,
                endpoint='render.html',
                args={'wait': 0.5},  # give the page 0.5 s to render
            )

    def parse(self, response):
        # response.body is a result of the render.html call; it
        # contains HTML processed by a browser.
        # …
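To see what SplashRequest is doing for you: with endpoint='render.html' it boils down to an HTTP call against Splash's /render.html endpoint, with your args passed as request parameters (parameter names here are from the Splash HTTP API; the helper function is just an illustration). A stdlib-only sketch of the equivalent URL:

```python
from urllib.parse import urlencode

SPLASH_URL = "http://192.168.59.103:8050"  # your Splash instance

def splash_render_url(target_url, wait=0.5):
    """Build the Splash render.html URL roughly equivalent to
    SplashRequest(target_url, endpoint='render.html', args={'wait': wait})."""
    params = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_URL}/render.html?{params}"

print(splash_render_url("http://example.com"))
# -> http://192.168.59.103:8050/render.html?url=http%3A%2F%2Fexample.com&wait=0.5
```

Fetching that URL in a browser is also a handy way to check that your Splash instance is up and rendering the target page before wiring it into the spider.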