Scrapy Shell and Scrapy Splash

烂漫一生 submitted on 2019-11-28 03:43:31

Just wrap the URL you want to open in the shell in a call to the Splash HTTP API.

So you would want something like:

scrapy shell 'http://localhost:8050/render.html?url=http://domain.com/page-with-javascript.html&timeout=10&wait=0.5'

where:
localhost:port is the host and port where your Splash service is running (8050 in the example above)
url is the URL you want to crawl; don't forget to URL-quote it!
render.html is one of the available HTTP API endpoints; in this case it returns the rendered HTML page
timeout is the timeout in seconds
wait is the time in seconds to wait for JavaScript to execute before reading/saving the HTML
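
For the URL-quoting part, here is a minimal sketch of one way to build the Splash URL in Python with only the standard library. The helper name splash_render_url and its default values are made up for illustration; the parameter names (url, timeout, wait) are the ones described above.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050",
                      timeout=10, wait=0.5):
    """Return a render.html URL with the target URL percent-encoded.

    splash_base is an assumption: point it at wherever Splash is running.
    """
    # urlencode percent-encodes each value, so characters such as ':', '/',
    # '?' and '&' in the target URL cannot be mistaken for Splash parameters.
    query = urlencode({"url": target_url, "timeout": timeout, "wait": wait})
    return f"{splash_base}/render.html?{query}"


print(splash_render_url("http://domain.com/page-with-javascript.html"))
```

The printed string can then be passed, in quotes, to scrapy shell. Quoting matters most when the target URL has its own query string, since an unquoted & would otherwise end the url parameter early.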

You can run scrapy shell without arguments inside a configured Scrapy project, then create req = scrapy_splash.SplashRequest(url, ...) and call fetch(req).
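
A rough sketch of what that session might look like, assuming scrapy-splash is installed and its middlewares and SPLASH_URL are already set in the project's settings.py. The target URL and the args values are just the ones from the example earlier; the lines are meant to be typed at the scrapy shell prompt, not run as a standalone script.

```python
# Inside `scrapy shell` (started with no arguments from the project directory).
from scrapy_splash import SplashRequest

req = SplashRequest(
    "http://domain.com/page-with-javascript.html",
    endpoint="render.html",            # same endpoint as the HTTP API example
    args={"wait": 0.5, "timeout": 10}, # forwarded to Splash as parameters
)

# fetch() is a helper provided by the Scrapy shell; it downloads the request
# and rebinds the `response` variable to the result.
fetch(req)
response.css("title::text").get()
```

With this approach the request goes through the project's Splash middleware, so there is no need to build or quote the Splash URL by hand.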

For Windows users who use Docker Toolbox:

  1. Replace the single quotes with double quotes to prevent the invalid hostname:http error.

  2. Change localhost to the Docker IP address shown below the whale logo. For me it was 192.168.99.100.

Finally, I got this:

scrapy shell "http://192.168.99.100:8050/render.html?url=https://samplewebsite.com/category/banking-insurance-financial-services/"
