Python threads stack_size and segfaults

自古美人都是妖i submitted on 2020-01-13 20:21:12

Question


I have a web crawler script that spawns at most 500 threads. Each thread requests data from a remote server, and each server's reply differs from the others in content and size.

I'm setting the stack size for the threads to 756 KB:

threading.stack_size(756*1024)

This lets me run the required number of threads and complete most of the jobs and requests. But some servers' responses are bigger than others, and when a thread receives one of those responses, the script dies with SIGSEGV.

A stack size larger than 756 KB makes it impossible to run the required number of threads at the same time.

Any suggestions on how I can keep the given stack size without crashes? And how can I get the stack size currently used by a given thread?
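For reference, here is a minimal sketch of how the setting behaves, assuming CPython's `threading.stack_size`: the value only applies to threads created after the call, and the call returns the previous setting so it can be restored. The helper name `run_with_stack_size` is hypothetical and not part of the original script.

```python
import threading

STACK_BYTES = 756 * 1024  # value from the question


def run_with_stack_size(target, stack_bytes=STACK_BYTES):
    """Run target() in one thread created with the given stack size.

    Hypothetical helper for illustration only.
    """
    # stack_size() affects threads created *after* this call and
    # returns the previously configured size (0 means platform default).
    previous = threading.stack_size(stack_bytes)
    try:
        result = []
        worker = threading.Thread(target=lambda: result.append(target()))
        worker.start()
        worker.join()
        return result[0]
    finally:
        # Restore the earlier setting for threads created later.
        threading.stack_size(previous)
```

A call such as `run_with_stack_size(lambda: len("response"))` runs the function on a 756 KB stack. This does not report how much of the stack a thread has actually used.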


Answer 1:


Why on earth are you spawning 500 threads? That seems like a terrible idea!

Remove threading completely and use an event loop to do the crawling. Your program will be faster, simpler, and easier to maintain.

Lots of threads waiting for the network won't make your program wait faster. Instead, collect all open sockets in a list and run a loop where you check whether any of them has data available.
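The loop described above could look roughly like this. It is a sketch using the standard library's `selectors` module rather than Twisted, and the function name `read_all` is made up for illustration. It assumes the sockets are already connected and that each peer closes its end when the response is complete.

```python
import selectors
import socket


def read_all(socks):
    """Read every socket to EOF from a single thread.

    Returns a dict mapping each socket to the bytes it received.
    """
    sel = selectors.DefaultSelector()
    buffers = {}
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
        buffers[s] = bytearray()

    # Keep looping until every socket has been unregistered.
    while sel.get_map():
        for key, _events in sel.select():
            s = key.fileobj
            try:
                chunk = s.recv(65536)
            except BlockingIOError:
                continue  # spurious wakeup, try again on the next pass
            if chunk:
                buffers[s] += chunk  # data lives on the heap, not a thread stack
            else:
                sel.unregister(s)  # peer closed: this response is complete

    sel.close()
    return {s: bytes(b) for s, b in buffers.items()}
```

One thread services all connections here, so the number of simultaneous requests is no longer tied to a per-thread stack size.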

I recommend using Twisted, an event-driven networking engine. It is very flexible, secure, scalable and very stable (no segfaults).

You could also take a look at Scrapy, a web crawling and screen scraping framework written in Python on top of Twisted. It is still under heavy development, but maybe you can take some ideas from it.



Source: https://stackoverflow.com/questions/394895/python-threads-stack-size-and-segfaults
