Scrapy - Activating an Item Pipeline component - ITEM_PIPELINES setting

こ雲淡風輕ζ submitted on 2019-12-11 03:25:36

Question


The Scrapy documentation contains this information:

Activating an Item Pipeline component

To activate an Item Pipeline component you must add its class to the ITEM_PIPELINES setting, like in the following example:

ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}

The integer values you assign to classes in this setting determine the order they run in: items go through pipelines from order number low to high. It’s customary to define these numbers in the 0-1000 range.

I do not understand the last paragraph, mainly "determine the order they run in: items go through pipelines from order number low to high". Can you explain it in other words? On what basis are these numbers chosen? And how should I choose values within the 0-1000 range?


Answer 1:


Since ITEM_PIPELINES has to be a dictionary (like a lot of other settings, for example SPIDER_MIDDLEWARES), and a dictionary in Python was an unordered collection before version 3.7, you need some other way to define the order in which pipelines are applied. This is why you assign a number, customarily from 0 to 1000, to each pipeline you define: pipelines with lower numbers run first.

FYI, if you look into the Scrapy source, you'll find the build_component_list() function, which is called for each setting like ITEM_PIPELINES. It makes a list (an ordered collection) out of the dictionary you define in ITEM_PIPELINES, using the dictionary values for sorting:

from operator import itemgetter

import six


def build_component_list(base, custom):
    """Compose a component list based on a custom and base dict of components
    (typically middlewares or extensions), unless custom is already a list, in
    which case it's returned.
    """
    if isinstance(custom, (list, tuple)):
        return custom
    # Start from the base (default) dict, then let the custom dict override it.
    compdict = base.copy()
    compdict.update(custom)
    # Entries whose value is None are dropped; the rest are sorted by value.
    items = (x for x in six.iteritems(compdict) if x[1] is not None)
    return [x[0] for x in sorted(items, key=itemgetter(1))]
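To see the effect without installing Scrapy or six, here is a minimal Python 3 sketch of the same idea. It is a re-creation for illustration, not the current Scrapy implementation, and the "myproject.pipelines.OldPipeline" name is hypothetical. The pipelines are deliberately written in "wrong" order in the dict to show that only the numbers matter:

```python
from operator import itemgetter


def build_component_list(base, custom):
    """Python 3 sketch of the helper quoted above (illustrative only)."""
    if isinstance(custom, (list, tuple)):
        return list(custom)
    # Merge the defaults with the user's dict; user values win.
    compdict = {**base, **custom}
    # Drop entries set to None, then order the rest by their integer value.
    enabled = ((path, order) for path, order in compdict.items() if order is not None)
    return [path for path, _ in sorted(enabled, key=itemgetter(1))]


# Same pipelines as in the question, listed in reverse order on purpose.
ITEM_PIPELINES = {
    "myproject.pipelines.JsonWriterPipeline": 800,
    "myproject.pipelines.PricePipeline": 300,
}

ordered = build_component_list({}, ITEM_PIPELINES)
print(ordered)
# ['myproject.pipelines.PricePipeline', 'myproject.pipelines.JsonWriterPipeline']

# Hypothetical default component switched off by assigning None.
disabled = build_component_list(
    {"myproject.pipelines.OldPipeline": 500},
    {"myproject.pipelines.OldPipeline": None},
)
print(disabled)  # []
```

PricePipeline (300) comes out before JsonWriterPipeline (800) regardless of where each entry sits in the dict.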



Answer 2:


From the docs:

ITEM_PIPELINES

Default: {}

A dict containing the item pipelines to use, and their orders. The dict is empty by default. Order values are arbitrary, but it’s customary to define them in the 0-1000 range.
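Because the values are arbitrary, what matters in practice is only which pipeline has the smaller number. The following is a rough simulation of that, not Scrapy itself: the two classes are hypothetical stand-ins with a process_item(item, spider) method, the 15% surcharge is made up, and the dict is keyed by the classes directly (rather than by import path) only to keep the sketch self-contained.

```python
import json


class PricePipeline:
    """Hypothetical pipeline: adds a 15% surcharge to an integer price."""

    def process_item(self, item, spider):
        item["price"] = item["price"] + item["price"] * 15 // 100
        return item


class JsonWriterPipeline:
    """Hypothetical pipeline: records each item as a JSON line in memory."""

    def __init__(self):
        self.lines = []

    def process_item(self, item, spider):
        self.lines.append(json.dumps(item, sort_keys=True))
        return item


def run_pipelines(item, setting):
    """Pass one item through the pipelines, lowest order value first."""
    ordered = sorted(setting.items(), key=lambda pair: pair[1])
    instances = [cls() for cls, _ in ordered]
    for pipeline in instances:
        item = pipeline.process_item(item, spider=None)
    # Return whatever the writer pipeline recorded.
    return [
        line
        for p in instances
        if isinstance(p, JsonWriterPipeline)
        for line in p.lines
    ]


# Price adjustment (300) runs before the writer (800).
written_a = run_pipelines(
    {"name": "book", "price": 100},
    {PricePipeline: 300, JsonWriterPipeline: 800},
)
print(written_a)  # ['{"name": "book", "price": 115}']

# Price adjustment (900) now runs after the writer (800).
written_b = run_pipelines(
    {"name": "book", "price": 100},
    {PricePipeline: 900, JsonWriterPipeline: 800},
)
print(written_b)  # ['{"name": "book", "price": 100}']
```

So a reasonable way to pick values is to give earlier steps (cleaning, validation) lower numbers than later steps (export, storage), and to leave gaps such as 100, 200, 300 so that another pipeline can be slotted in between later without renumbering.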



Source: https://stackoverflow.com/questions/29892547/scrapy-activating-an-item-pipeline-component-item-pipelines-setting
