Python Urllib UrlOpen Read


There are a few things you can do. If the URLs are on different domains, then you might just fan out the work to threads, each downloading a page from a different domain.
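For instance, a minimal sketch of that per-domain fan-out, using only the standard library (the URL list and the printed output are placeholders for whatever you actually need to do with each page):

```python
import threading
import urllib.request
from collections import defaultdict
from urllib.parse import urlparse

# Placeholder URLs; substitute your own list.
urls = [
    "https://example.com/a",
    "https://example.org/b",
]

def fetch_all(url_list):
    """Download every URL in the list sequentially (runs inside one thread)."""
    for url in url_list:
        with urllib.request.urlopen(url) as response:
            data = response.read()
        print(url, len(data), "bytes")

# Group URLs by domain so each thread talks to a different server.
by_domain = defaultdict(list)
for url in urls:
    by_domain[urlparse(url).netloc].append(url)

threads = [threading.Thread(target=fetch_all, args=(group,))
           for group in by_domain.values()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```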

If your URLs all point to the same server and you do not want to stress it, you can simply retrieve the URLs sequentially. If the server is happy with a couple of parallel requests, then you can look into pools of workers. You could start, say, a pool of four workers and add all your URLs to a queue, from which the workers pull new URLs.
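A minimal sketch of that worker-pool idea with `queue` and `threading` (the URL list and the worker count of four are just the figures from above):

```python
import queue
import threading
import urllib.request

NUM_WORKERS = 4  # the "pool of four workers" mentioned above

# Placeholder URLs, all pointing at the same server.
urls = ["https://example.com/page%d" % i for i in range(10)]

url_queue = queue.Queue()
for url in urls:
    url_queue.put(url)

def worker():
    """Pull URLs from the shared queue until it is empty."""
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        with urllib.request.urlopen(url) as response:
            body = response.read()
        print(url, len(body), "bytes")

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```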

Since you also tagged the question with "screen-scraping": Scrapy is a dedicated scraping framework that fetches pages in parallel out of the box.
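A bare-bones Scrapy spider, just to show the shape of it (the URLs and the extracted fields are placeholders):

```python
import scrapy

class PageSpider(scrapy.Spider):
    name = "pages"
    # Placeholder URLs; Scrapy downloads them concurrently by default.
    start_urls = [
        "https://example.com/a",
        "https://example.com/b",
    ]

    def parse(self, response):
        # Yield whatever data you want to keep from each page.
        yield {"url": response.url, "title": response.css("title::text").get()}
```

You can run a single-file spider like this with `scrapy runspider spider.py -o items.json`.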

Python 3 also comes with a set of built-in concurrency primitives under concurrent.futures.
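A short sketch with `ThreadPoolExecutor` (the URL list, the worker count, and the timeout are placeholder choices):

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://example.com/a", "https://example.org/b"]  # placeholders

def fetch(url):
    """Download one URL and return its body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return url, response.read()

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(fetch, url): url for url in urls}
    for future in as_completed(futures):
        try:
            url, body = future.result()
            print(url, len(body), "bytes")
        except Exception as exc:
            print(futures[future], "failed:", exc)
```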

One caveat: I have encountered a number of servers running somewhat "elderly" releases of IIS that often will not service a request unless there is a one-second delay between requests.
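If you run into such a server, a simple throttle along these lines (the URLs are placeholders; the delay is just the one-second figure mentioned above) keeps the requests spaced out:

```python
import time
import urllib.request

urls = ["https://example.com/a", "https://example.com/b"]  # placeholders

for url in urls:
    with urllib.request.urlopen(url) as response:
        data = response.read()
    print(url, len(data), "bytes")
    time.sleep(1)  # pause so the server sees at most one request per second
```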
