Question
I am trying to run a function retrieved from Redis (RQ) that creates a CrawlerProcess, but I'm getting:
Work-horse process was terminated unexpectedly (waitpid returned 11)
Console log:
Moving job to 'failed' queue (work-horse terminated unexpectedly; waitpid returned 11)
It happens on the line I marked with the comment
THIS LINE KILL THE PROGRAM
What am I doing wrong? How can I fix it?
The function itself is retrieved from RQ without problems:
def custom_executor(url):
    process = CrawlerProcess({
        'USER_AGENT': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36",
        'DOWNLOAD_TIMEOUT': 20000,  # 100
        'ROBOTSTXT_OBEY': False,
        'HTTPCACHE_ENABLED': False,
        'REDIRECT_ENABLED': False,
        'SPLASH_URL': 'http://localhost:8050/',
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': True,
            'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': True,
            'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': True,
            'scrapy.extensions.closespider.CloseSpider': True,
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        }
    })

    ### THIS LINE KILL THE PROGRAM
    process.crawl(ExtractorSpider,
                  start_urls=[url, ], es_client=es_get_connection(),
                  redis_conn=redis_get_connection())
    process.start()
And this is my ExtractorSpider:
class ExtractorSpider(Spider):
    name = "Extractor Spider"
    handle_httpstatus_list = [301, 302, 303]

    def parse(self, response):
        yield SplashRequest(url=url, callback=process_screenshot,
                            endpoint='execute', args=SPLASH_ARGS)
Thank you
Answer 1:
The process crashed because it was running heavy calculations without enough memory. Increasing the available memory fixed the issue.
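If it helps with diagnosis, the number in "waitpid returned 11" can be decoded with the standard library. This is only a sketch: it assumes RQ prints the unmodified status from os.waitpid, and describe_wait_status is a hypothetical helper name, not part of RQ.

```python
import os
import signal


def describe_wait_status(status):
    """Turn a raw os.waitpid() status into a readable description (POSIX only)."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; extract its number.
        sig = os.WTERMSIG(status)
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {sig} ({name})"
    if os.WIFEXITED(status):
        # The child exited normally; extract its exit code.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognised wait status {status}"


# The status reported in the question.
print(describe_wait_status(11))
```

Knowing which signal ended the work-horse can help tell a crash inside the process apart from the process being killed from outside.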
Answer 2:
For me, the process was timing out; I had to change the default timeout.
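The answer does not say which timeout was changed. Assuming it was RQ's per-job timeout, a minimal sketch of raising it when enqueueing could look like this. enqueue_with_timeout and the 3600-second value are hypothetical; job_timeout is the keyword in recent RQ releases, while older releases used timeout.

```python
def enqueue_with_timeout(queue, func, url, timeout_seconds=3600):
    """Enqueue func(url) on an RQ queue with an explicit per-job timeout.

    `queue` is expected to be an rq.Queue. Assumes a recent RQ release,
    where the keyword is `job_timeout`; older releases used `timeout`.
    """
    return queue.enqueue(func, url, job_timeout=timeout_seconds)


# Usage sketch (needs the rq and redis packages and a running Redis server):
#
#   from redis import Redis
#   from rq import Queue
#
#   queue = Queue(connection=Redis())
#   enqueue_with_timeout(queue, custom_executor, "http://example.com")
```

A queue-wide default can also be set when creating the queue, for example Queue(connection=Redis(), default_timeout=3600), so that every job on it gets the longer limit.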
Source: https://stackoverflow.com/questions/47154856/work-horse-process-was-terminated-unexpectedly-rq-and-scrapy