Understanding how renaming images in Scrapy works

借酒劲吻你 2021-01-07 12:37

I have read all the related questions here, but I still don't understand.

Actually, with the code below I can do what I need, except rename the image, so I tried changing the name in items.py.

1 answer
  •  -上瘾入骨i
    2021-01-07 12:56

    My code is based on the question "Scrapy Image Pipeline: How to rename images?". I tested it a week ago and it works in my own spiders.

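    import scrapy
    from scrapy.pipelines.images import ImagesPipeline


    # This pipeline is designed for an item with multiple images
    class ImagesWithNamesPipeline(ImagesPipeline):
        def get_media_requests(self, item, info):
            # Each value in the "image_names" field must end with ".jpg".
            # To use your own field for the names, change only "image_names" here;
            # the field must still hold a list, parallel to the list of image URLs.
            for image_url, image_name in zip(item[self.IMAGES_URLS_FIELD], item["image_names"]):
                # Carry the desired file name along with the request via meta
                yield scrapy.Request(url=image_url, meta={"image_name": image_name})
    
        def file_path(self, request, response=None, info=None, *, item=None):
            # item is only passed by newer Scrapy versions; it is unused here.
            # The returned value is the image path relative to the image store.
            image_name = request.meta["image_name"]
            return image_name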
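
To show how the pipeline above might be wired up, here is a minimal sketch of the settings and of an item it can consume. The module path "myproject.pipelines", the storage directory, and the example URLs and names are all hypothetical; only ITEM_PIPELINES, IMAGES_STORE, and the default "image_urls" field come from Scrapy itself.

```python
# settings.py (sketch): "myproject.pipelines" and the store path are hypothetical
ITEM_PIPELINES = {
    "myproject.pipelines.ImagesWithNamesPipeline": 1,
}
IMAGES_STORE = "/path/to/images"  # file_path results are relative to this directory

# An item the spider could yield: two parallel lists, one name per URL
example_item = {
    "image_urls": [
        "https://example.com/a.png",
        "https://example.com/b.png",
    ],
    "image_names": ["first.jpg", "second.jpg"],
}

# zip() pairs each URL with its name, exactly as get_media_requests does
pairs = list(zip(example_item["image_urls"], example_item["image_names"]))
```

Because zip() stops at the shorter list, a missing name silently drops the corresponding URL, so the two lists should have the same length.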
    

    Here is how ImagesPipeline works:

    The pipeline executes image_downloaded -> get_images -> file_path, in that order ("->" means "invokes").

    • image_downloaded: saves the images that get_images returns by invoking persist_file
    • get_images: converts the images to JPEG
    • file_path: returns the relative path of the image
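
The call chain described above can be modelled with a toy class. This is a simplified stand-in written for illustration, not Scrapy's real code: the class names, the dict-based "request", and the in-memory store are all assumptions. It only shows why replacing file_path is enough to change where an image ends up.

```python
class ToyPipeline:
    """Simplified stand-in for the call chain; not Scrapy's real code."""

    def __init__(self):
        self.saved = {}  # relative path -> image bytes

    def file_path(self, request):
        # Default naming: a fixed folder plus some generated name
        return "full/generated-name.jpg"

    def get_images(self, request, body):
        # The real pipeline converts the image to JPEG here;
        # the toy version passes the bytes through unchanged.
        yield self.file_path(request), body

    def persist_file(self, path, data):
        # Stores the data under the relative path
        self.saved[path] = data

    def image_downloaded(self, request, body):
        # Saves every image that get_images yields
        for path, data in self.get_images(request, body):
            self.persist_file(path, data)


class RenamingToyPipeline(ToyPipeline):
    def file_path(self, request):
        # Only this method is overridden; the rest of the chain is untouched
        return request["meta"]["image_name"]


pipeline = RenamingToyPipeline()
pipeline.image_downloaded({"meta": {"image_name": "cat.jpg"}}, b"fake-bytes")
print(pipeline.saved)  # {'cat.jpg': b'fake-bytes'}
```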

    I scanned through the source code of ImagesPipeline and found no special field for renaming an image. By default, Scrapy names each image like this:

    import hashlib
    from scrapy.utils.python import to_bytes

    def file_path(self, request, response=None, info=None):
        image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()  # SHA-1 hash of the request URL
        return 'full/%s.jpg' % (image_guid)
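
To see what such a default name looks like, the same hashing step can be reproduced with the standard library alone. This is a sketch: default_image_path is a hypothetical helper name, the URL is made up, and it assumes the URL is encoded as UTF-8 before hashing.

```python
import hashlib


def default_image_path(url):
    """Mimic the default naming: 'full/' + SHA-1 hex digest of the URL + '.jpg'."""
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


path = default_image_path("https://example.com/cat.png")
print(path)  # 'full/' followed by 40 hex characters and '.jpg'
```

The name depends only on the URL, which is why the downloaded files carry no human-readable name unless file_path is overridden.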
    

    Therefore we should override the file_path method. According to the source code of FilesPipeline, which ImagesPipeline inherits from, we only need to return a relative path and persist_file will do the rest.
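
Since file_path only has to return a relative path, the name can be normalized before it is returned. Below is a minimal sketch of one way to do that; build_image_path and its "full" default folder are hypothetical and not part of Scrapy. It keeps only the final path component and appends ".jpg" when the suffix is missing.

```python
import posixpath


def build_image_path(image_name, folder="full"):
    """Return a relative path such as 'full/cat.jpg' for the given image name."""
    # Keep only the last component so the name cannot point outside the folder
    base = posixpath.basename(image_name.replace("\\", "/"))
    if not base:
        raise ValueError("image name must not be empty")
    # The images are stored as JPEG, so make sure the suffix matches
    if not base.lower().endswith((".jpg", ".jpeg")):
        base += ".jpg"
    return posixpath.join(folder, base)


print(build_image_path("cat"))      # full/cat.jpg
print(build_image_path("dog.jpg"))  # full/dog.jpg
```

An overridden file_path could return build_image_path(request.meta["image_name"]) instead of the raw name.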
