I have a scrapy project where the item that ultimately enters my pipeline is relatively large and stores lots of metadata and content. Everything is working properly in my s
I think the cleanest way to do this is to add a filter to the scrapy.core.scraper
logger that rewrites the message in question. This keeps your Item's __repr__
intact and avoids changing Scrapy's logging level:
import logging
import re

class ItemMessageFilter(logging.Filter):
    def filter(self, record):
        # The message that logs the item still has raw % placeholders in it;
        # they are only filled in later, when the record is formatted
        match = re.search(r'(Scraped from %\(src\)s)\n%\(item\)s', record.msg)
        if match:
            # Make the message everything but the item itself
            record.msg = match.group(1)
        # Don't actually want to filter out this record, so always return True
        return True

logging.getLogger('scrapy.core.scraper').addFilter(ItemMessageFilter())
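If you want to check the filter's behaviour without running a crawl, here is a minimal sketch that feeds it a record shaped like the one the regex above targets. The logger name "demo.scraper", the run_demo helper and the sample values are made up for illustration, and the filter is repeated only so the snippet runs on its own. It assumes the message template is exactly "Scraped from %(src)s\n%(item)s" with the values passed as a dict.

```python
import io
import logging
import re

# Assumed shape of the "scraped" log message (taken from the regex above).
SCRAPED_MSG = "Scraped from %(src)s\n%(item)s"


class ItemMessageFilter(logging.Filter):
    def filter(self, record):
        match = re.search(r'(Scraped from %\(src\)s)\n%\(item\)s', record.msg)
        if match:
            # Keep only the "Scraped from ..." part of the template
            record.msg = match.group(1)
        return True


def run_demo():
    # Capture output in memory so it can be inspected
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))

    # Hypothetical stand-in for the 'scrapy.core.scraper' logger
    logger = logging.getLogger("demo.scraper")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    logger.addFilter(ItemMessageFilter())

    # A large fake item; it should not show up in the output
    args = {"src": "<200 http://example.com>", "item": {"body": "x" * 1000}}
    logger.debug(SCRAPED_MSG, args)

    # Messages that do not match the pattern pass through unchanged
    logger.debug("Unrelated message %(n)s", {"n": 1})
    return stream.getvalue()


if __name__ == "__main__":
    print(run_demo())
```

Because the filter only rewrites the template and leaves record.args alone, the unused item key in the args dict is simply ignored when the record is formatted.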