I'm trying to build a system to run a few dozen Scrapy spiders, save the results to S3, and let me know when it finishes. There are several similar questions on StackOverflow, but my current approach eventually locks up and fails because it attempts to open too many file descriptors on the system that runs it.
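A minimal sketch of the kind of single-process setup I mean, with placeholder bucket, settings and notification (the real spiders and AWS configuration are omitted):

```python
# Sketch only: run every spider in the project from one process and push the
# results to S3 via Scrapy's built-in S3 feed storage (needs botocore or boto3).
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.set("FEEDS", {
    # bucket name is a placeholder; %(name)s / %(time)s are expanded per feed
    "s3://my-results-bucket/%(name)s/%(time)s.json": {"format": "json"},
})
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY must be set here or in settings.py.

process = CrawlerProcess(settings)
for spider_name in process.spider_loader.list():
    process.crawl(spider_name)          # every spider shares this one process
process.start()                         # blocks until all crawls finish
print("All spiders finished")           # the "let me know" step would go here
```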
That's probably a sign that you need multiple machines to execute your spiders; it's a scalability issue. You could also scale vertically and make your single machine more powerful, but you would hit a limit much sooner that way (the file-descriptor ceiling in the sketch below is one example).
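As a stopgap on a single Unix machine, you can at least check how much headroom the process has and raise its soft file-descriptor limit up to the hard limit; beyond that you need root or sysctl changes, which is exactly the kind of wall vertical scaling runs into:

```python
# Stopgap, not a fix: inspect and raise this process's file-descriptor limit.
# Unix-only (the resource module is not available on Windows).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"file descriptors: soft={soft}, hard={hard}")
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))  # soft can't exceed hard
```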
Check out the Distributed Crawling documentation and the scrapyd project.
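To make the scrapyd suggestion concrete, here is a rough sketch that schedules spiders through scrapyd's HTTP JSON API and polls until every job has finished; the host, project name and spider names are placeholders, and it assumes the project has already been deployed to a running scrapyd daemon:

```python
# Sketch: schedule spiders on a scrapyd daemon and wait for them to finish.
# Assumes scrapyd is running on localhost:6800 and the project is deployed
# as "myproject". Uses the third-party requests library.
import time
import requests

SCRAPYD = "http://localhost:6800"
PROJECT = "myproject"
SPIDERS = ["spider_a", "spider_b"]  # placeholder spider names

for name in SPIDERS:
    resp = requests.post(f"{SCRAPYD}/schedule.json",
                         data={"project": PROJECT, "spider": name})
    resp.raise_for_status()

while True:
    jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                        params={"project": PROJECT}).json()
    if not jobs["pending"] and not jobs["running"]:
        break
    time.sleep(30)

print("All scrapyd jobs finished")  # hook your notification (email, SNS, ...) here
```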
There is also a cloud-based distributed crawling service called ScrapingHub, which would take the scalability problem off your hands altogether (note that I am not advertising them; I have no affiliation with the company).