suppress Scrapy Item printed in logs after pipeline

无人及你 2020-12-25 12:43

I have a scrapy project where the item that ultimately enters my pipeline is relatively large and stores lots of metadata and content. Everything is working properly in my spider and pipelines, but the logs print the entire Item as it leaves the pipeline, and I would like to suppress that output.

8 Answers
  • 2020-12-25 13:09

    If you want to exclude only some attributes from the output, you can extend the answer given by @dino:

    from scrapy.item import Item, Field
    import json
    
    class MyItem(Item):
        attr1 = Field()
        attr2 = Field()
        attr1ToExclude = Field()
        attr2ToExclude = Field()
        # ...
        attrN = Field()
    
        def __repr__(self):
            """Serialize only the fields we want to see in the logs."""
            r = {}
            # _values holds the item's field data; use items() (Python 3)
            for attr, value in self.__dict__['_values'].items():
                if attr not in ['attr1ToExclude', 'attr2ToExclude']:
                    r[attr] = value
            return json.dumps(r, sort_keys=True, indent=4, separators=(',', ': '))
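
    For illustration, a minimal usage sketch (the item and values below are just the placeholders from the snippet above): once the custom __repr__ is in place, only the non-excluded fields show up when the item is rendered in the logs.

    item = MyItem(attr1='kept', attr1ToExclude='large blob hidden from logs')
    print(repr(item))  # JSON output contains attr1 only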
    
  • 2020-12-25 13:09

    We use the following sample in production:

    import logging

    # Drop the 'Scraped from ...' log records that dump the entire item
    logging.getLogger('scrapy.core.scraper').addFilter(
        lambda x: not x.getMessage().startswith('Scraped from'))
    

    This is simple, working code. We put it in the __init__.py of the package that contains our spiders, so it runs automatically for every spider started with scrapy crawl <spider_name>.
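
    If you prefer something more explicit than a lambda, here is a sketch of the same idea as a named logging.Filter subclass (the class name is ours; the logger name and message prefix are the same as above):

    import logging

    class SuppressScrapedFilter(logging.Filter):
        # Drop the 'Scraped from <response>' records that dump the whole item
        def filter(self, record):
            return not record.getMessage().startswith('Scraped from')

    logging.getLogger('scrapy.core.scraper').addFilter(SuppressScrapedFilter())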
