scrapy: Populate nested items with itemLoader


I have this object I'm trying to populate with an ItemLoader:

{
  "domains": "string",
  "date_insert": "2016-12-23T11:25:00.213Z",
  "title": "


        
1 Answer
  • 2021-01-15 17:45

    Thanks to @eLRuLL, I managed to find a decent solution:

    items.py :

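    import scrapy
    # On older Scrapy versions the processors live in scrapy.loader.processors
    from itemloaders.processors import Identity, MapCompose, TakeFirst
    from scrapy.loader import ItemLoader
    from w3lib.html import remove_tags  # assumed source of remove_tags

    class StatsItem(scrapy.Item):
        views_count = scrapy.Field()
        comments_count = scrapy.Field()

    class ArticleItem(scrapy.Item):
        [...]
        # Identity() passes the nested stats value through unchanged
        stats = scrapy.Field(
            input_processor=Identity())


    class StatsItemLoader(ItemLoader):
        default_input_processor = MapCompose(remove_tags)
        default_output_processor = TakeFirst()
        default_item_class = StatsItem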
    

    spider.py:

    def parse(self, response):
        [...]
        loader.add_value('stats', self.getStats(response))
        [...]
    
    def getStats(self, response):
        # Build the nested item with its own loader
        statsLoader = StatsItemLoader(response=response)
        statsLoader.add_xpath('comments_count', '//div[@class="btn-count"]//a/text()')
        statsLoader.add_value('views_count', '42')
        # Return a plain dict so the nested item can be serialized
        return dict(statsLoader.load_item())
    

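    Originally it was not working because the input_processor for the stats field was MapCompose(remove_tags), and remove_tags expects a string rather than a dict. Setting the field's input_processor to Identity() passes the nested value through unchanged. To serialize the nested object, return dict(loader.load_item()) rather than loader.load_item() itself.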

    Thanks!
