Suppress the Scrapy item printed in the logs after the pipeline

无人及你 2020-12-25 12:43

I have a Scrapy project where the item that ultimately enters my pipeline is relatively large and stores lots of metadata and content. Everything is working properly in my s…

8 Answers
  •  醉梦人生
    2020-12-25 13:07

    I think the cleanest way to do this is to add a filter to the scrapy.core.scraper logger that rewrites the message in question. This keeps your Item's __repr__ intact and avoids changing Scrapy's logging level:

    import logging
    import re
    
    
    class ItemMessageFilter(logging.Filter):
        def filter(self, record):
            # The message that logs the item still contains raw %-placeholders;
            # the logging module fills them in from record.args when the record
            # is formatted. Scrapy may join the two parts with os.linesep, so
            # accept "\r\n" (Windows) as well as "\n".
            if isinstance(record.msg, str):
                match = re.search(r'(Scraped from %\(src\)s)\r?\n%\(item\)s', record.msg)
                if match:
                    # Make the message everything but the item itself
                    record.msg = match.group(1)
            # Don't actually filter out this record, so always return True
            return True
    
    
    logging.getLogger('scrapy.core.scraper').addFilter(ItemMessageFilter())
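    To check the idea without running a crawl, here is a minimal sketch that is not Scrapy code. It attaches the same kind of filter to a throwaway logger, hypothetically named demo.scraper, and emits a hand-built record shaped like Scrapy's "Scraped from" message: a %-template with a line break before %(item)s and a single dict of arguments. The helper capture_scraped_log is hypothetical. It returns what a StreamHandler would print, with or without the filter.

```python
import io
import logging
import os
import re


class ItemMessageFilter(logging.Filter):
    """Drop the item body from Scrapy-style 'Scraped from' messages."""

    _pattern = re.compile(r'(Scraped from %\(src\)s)\r?\n%\(item\)s')

    def filter(self, record):
        if isinstance(record.msg, str):
            match = self._pattern.search(record.msg)
            if match:
                # Keep only the "Scraped from %(src)s" part of the template
                record.msg = match.group(1)
        return True  # never discard the record, only shorten it


def capture_scraped_log(use_filter):
    """Emit one fake 'Scraped from' record and return what a handler prints.

    The template and args below imitate what Scrapy logs for each item;
    they are assumptions for this demo, not taken from Scrapy itself.
    """
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger('demo.scraper')  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    item_filter = ItemMessageFilter()
    if use_filter:
        logger.addFilter(item_filter)
    try:
        big_item = {'title': 'example', 'body': 'x' * 50}
        # A single dict argument becomes record.args, as with %-style mappings
        logger.debug(
            'Scraped from %(src)s' + os.linesep + '%(item)s',
            {'src': '<200 https://example.com>', 'item': big_item},
        )
    finally:
        logger.removeHandler(handler)
        logger.removeFilter(item_filter)
    return stream.getvalue()


if __name__ == '__main__':
    print(capture_scraped_log(use_filter=False))
    print(capture_scraped_log(use_filter=True))
```

    With the filter attached, only the "Scraped from <200 https://example.com>" line is printed. The item dict disappears from the output, but the record itself still reaches the handler. Extra keys in record.args are harmless because %-formatting with a mapping ignores keys the template does not use.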
    
