Work-horse process was terminated unexpectedly RQ and Scrapy

Submitted by 对着背影说爱祢 on 2020-12-30 03:14:43

Question


I am running a function through Redis Queue (RQ) that creates a Scrapy CrawlerProcess, but I'm getting

Work-horse process was terminated unexpectedly (waitpid returned 11)

console log:

Moving job to 'failed' queue (work-horse terminated unexpectedly; waitpid returned 11)

on the line I marked with a comment below.

What am I doing wrong? How can I fix it?

This is the function I enqueue through RQ:

from scrapy.crawler import CrawlerProcess

def custom_executor(url):
    process = CrawlerProcess({
        'USER_AGENT': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36",
        'DOWNLOAD_TIMEOUT': 20000,  # 100
        'ROBOTSTXT_OBEY': False,
        'HTTPCACHE_ENABLED': False,
        'REDIRECT_ENABLED': False,

        'SPLASH_URL': 'http://localhost:8050/',
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
        'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',

        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },

        'SPIDER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware': True,
            'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': True,
            'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': True,
            'scrapy.extensions.closespider.CloseSpider': True,

            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        }
    })

    ### THIS LINE KILLS THE PROGRAM
    process.crawl(ExtractorSpider, 
                   start_urls=[url, ], es_client=es_get_connection(),
                   redis_conn=redis_get_connection()) 

    process.start()

and this is my ExtractorSpider:

from scrapy import Spider
from scrapy_splash import SplashRequest

class ExtractorSpider(Spider):
    name = "Extractor Spider"
    handle_httpstatus_list = [301, 302, 303]

    def parse(self, response):
        # process_screenshot and SPLASH_ARGS are defined elsewhere
        yield SplashRequest(url=response.url, callback=process_screenshot,
                            endpoint='execute', args=SPLASH_ARGS)

Thank you


Answer 1:


The process crashed because the heavy computation ran out of memory. Increasing the available memory fixed the issue.
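A detail that supports this diagnosis: the number RQ reports in "waitpid returned 11" is the signal that killed the work-horse, and on POSIX systems signal 11 is SIGSEGV, the kind of hard crash an out-of-memory condition in native code can produce. It can be decoded with the standard library:

```python
import signal

# Map the number from RQ's "waitpid returned 11" message
# to its POSIX signal name.
print(signal.Signals(11).name)  # SIGSEGV
```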




Answer 2:


For me the process was timing out; I had to change RQ's default job timeout.
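RQ's default job timeout is 180 seconds, which a long crawl can easily exceed. A minimal sketch of enqueueing with a larger per-job timeout, assuming a recent RQ version where the keyword is `job_timeout` (the `'tasks.custom_executor'` import path is hypothetical):

```python
from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())
# job_timeout raises the limit for this job only (RQ's default is 180 s);
# 'tasks.custom_executor' is a hypothetical import path to the crawl function
job = queue.enqueue('tasks.custom_executor', 'https://example.com',
                    job_timeout=1800)
```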



Source: https://stackoverflow.com/questions/47154856/work-horse-process-was-terminated-unexpectedly-rq-and-scrapy
