How does scrapy-splash handle infinite scrolling?

栀梦 2020-12-09 13:25

I want to reverse engineer the content that is generated when scrolling down a webpage. The page in question is https://www.crowdfunder.com/user/following_page/80159?

2 answers
  • 2020-12-09 13:38

    Thanks Mikhail, I tried your scroll script and it worked. However, I noticed that it scrolls too far in a single step, so some JavaScript has no time to render and its content is skipped. I made a small change, as follows:

    function main(splash)
        local num_scrolls = 10   -- number of full passes over the page
        local scroll_delay = 1   -- seconds spent on each pass

        local scroll_to = splash:jsfunc("window.scrollTo")
        local get_body_height = splash:jsfunc(
            "function() {return document.body.scrollHeight;}"
        )
        assert(splash:go(splash.args.url))
        splash:wait(splash.args.wait)

        for _ = 1, num_scrolls do
            local height = get_body_height()
            -- Scroll in ten steps so lazily loaded content has time to render.
            for i = 1, 10 do
                scroll_to(0, height * i/10)
                splash:wait(scroll_delay/10)
            end
        end
        return splash:html()
    end
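
To make the difference from the single-jump version concrete, here is a small Python sketch of the arithmetic in the inner loop above. It only mirrors the `height * i/10` and `scroll_delay/10` expressions from the Lua script; the function names `scroll_offsets` and `step_wait` and the example height of 5000 pixels are made up for illustration and are not part of Splash.

```python
def scroll_offsets(page_height, steps=10):
    """Return the y offsets visited in one stepwise pass over the page.

    Mirrors `scroll_to(0, height * i/10)` for i = 1..steps in the Lua script.
    """
    return [page_height * i / steps for i in range(1, steps + 1)]


def step_wait(scroll_delay, steps=10):
    """Return the pause after each step, mirroring `scroll_delay/10`."""
    return scroll_delay / steps


# Hypothetical page that is 5000 px tall at the start of a pass.
offsets = scroll_offsets(5000)
pause = step_wait(1)
```

With these assumed numbers, each pass stops at ten evenly spaced offsets and pauses a tenth of a second at each one, rather than jumping straight to the bottom and waiting a full second there.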
    
  • 2020-12-09 13:56

    To scroll a page you can write a custom rendering script (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html), something like this:

    function main(splash)
        local num_scrolls = 10
        local scroll_delay = 1.0
    
        local scroll_to = splash:jsfunc("window.scrollTo")
        local get_body_height = splash:jsfunc(
            "function() {return document.body.scrollHeight;}"
        )
        assert(splash:go(splash.args.url))
        splash:wait(splash.args.wait)
    
        for _ = 1, num_scrolls do
            scroll_to(0, get_body_height())
            splash:wait(scroll_delay)
        end        
        return splash:html()
    end
    

    To run this script, use the 'execute' endpoint instead of the render.html endpoint:

    script = """<Lua script> """
    scrapy_splash.SplashRequest(url, self.parse,
                                endpoint='execute', 
                                args={'wait':2, 'lua_source': script}, ...)
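
As a possible variation on the snippet above, the scroll settings can be passed through `args` instead of being hard-coded, since values sent in `args` are available to the script as `splash.args`. This is a minimal sketch under that assumption: the names `LUA_SCROLL` and `build_splash_args`, and the argument keys `num_scrolls` and `scroll_delay`, are chosen here for illustration and do not come from the original answer.

```python
# Lua script that reads its settings from splash.args, falling back to the
# defaults used in the answer above when a value is not supplied.
LUA_SCROLL = """
function main(splash)
    local num_scrolls = splash.args.num_scrolls or 10
    local scroll_delay = splash.args.scroll_delay or 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
"""


def build_splash_args(wait=2.0, num_scrolls=10, scroll_delay=1.0):
    """Build the `args` dict for a request to the 'execute' endpoint."""
    return {
        "lua_source": LUA_SCROLL,
        "wait": wait,
        "num_scrolls": num_scrolls,
        "scroll_delay": scroll_delay,
    }


# Intended use inside a spider (requires scrapy_splash, not imported here):
#   scrapy_splash.SplashRequest(url, self.parse, endpoint='execute',
#                               args=build_splash_args(num_scrolls=20))
```

Keeping the numbers in `args` means the same script can be reused for pages that need more or fewer scrolls without editing the Lua source.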
    