Get xpath() to return empty values

遥遥无期 2021-02-06 10:01

I have a situation where I have a lot of <b> tags:

<b>12</b>
<b>13</b>
<b>14</b>
<b></b>
<b>121</b>
        
1 answer
  •  -上瘾入骨i
    2021-02-06 10:46

    This is a case where it is fine to strip the tags manually and keep the text. You can use the remove_tags() function provided by w3lib:

    >>> from w3lib.html import remove_tags
    >>> # A list comprehension works on Python 3, where map() returns a lazy iterator
    >>> [remove_tags(b) for b in sel.xpath('//b').extract()]
    ['12', '13', '14', '', '121']
    

    Note that w3lib is a Scrapy dependency and is used internally. No need to install it separately.

    Also, it would be better to use Scrapy's input and output processors here. Continue using sel.xpath('b') and define an input processor that strips the tags. For example, you can define it on a specific Field of the Item class:

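    # Recent Scrapy releases ship the processors in the itemloaders package;
    # the scrapy.contrib.loader.processor path is from old versions.
    from itemloaders.processors import MapCompose
    from scrapy.item import Item, Field
    from w3lib.html import remove_tags

    class MyItem(Item):
        # Strip the tags from every extracted value before it is stored.
        my_field = Field(input_processor=MapCompose(remove_tags))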
    
