Fastest way to create multiple thumbnails from a single large image in Python

挽巷 2021-01-06 19:07

I have a library of large images (8000x6000px ~13mb) for which I would like to generate multiple thumbnails of smaller sizes with widths of 3000px, 2000px, 1000px, 500px, 25

1 Answer
  • 2021-01-06 19:49

    I made some images and did some tests so you can see the effect on the performance of various techniques.

    I filled the images with random, hard-to-compress data so that their dimensions and file sizes match yours, i.e.

    convert -size 8000x6000 xc:gray +noise random -quality 35 image.jpg
    

    Then ls reports a 13 MB file, like this:

    -rw-r--r--  1 mark  staff    13M 23 Aug 17:55 image.jpg
    

    I made 128 such random images because that is nicely divisible by the 8 CPU cores on my machine - see parallel tests later.

    Now for the methods...

    Method 1

    This is the naive method: you just create all the files you asked for, one after the other. Each 8000x6000 input is read and decoded six times, once per output size.

    #!/bin/bash
    # Naive: one convert run (and one full decode of the input) per output size
    for f in image*jpg; do
       for w in 3000 2000 1000 500 250 100; do
          convert "$f" -resize "${w}x" "res_${f}_${w}.jpg"
       done
    done
    

    Time: 26 mins 46 secs

    Method 2

    Here we read each image only once and generate all the output sizes from that one read, working from the largest size down. It is considerably faster.

    #!/bin/bash
    # Read each input once; -write saves an intermediate size and carries on
    for f in image*jpg; do
       convert "$f" -resize 3000x -write "res_${f}_3000.jpg" \
                    -resize 2000x -write "res_${f}_2000.jpg" \
                    -resize 1000x -write "res_${f}_1000.jpg" \
                    -resize 500x  -write "res_${f}_500.jpg"  \
                    -resize 250x  -write "res_${f}_250.jpg"  \
                    -resize 100x  "res_${f}_100.jpg"
    done
    

    Time: 6 mins 17 secs

    Method 3

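    Here we tell ImageMagick up front that the largest image we are going to need is only 3000x2250 pixels, so it can use less memory, decode fewer DCT coefficients and do less I/O. This is called "shrink-on-load". Note that the -define must come before the input filename so that it is in effect when the JPEG is read.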

    #!/bin/bash
    # jpeg:size is a hint to the JPEG reader, so it goes before the input file
    for f in image*jpg; do
       convert -define jpeg:size=3000x2250 "$f"            \
                  -resize 3000x -write "res_${f}_3000.jpg" \
                  -resize 2000x -write "res_${f}_2000.jpg" \
                  -resize 1000x -write "res_${f}_1000.jpg" \
                  -resize 500x  -write "res_${f}_500.jpg"  \
                  -resize 250x  -write "res_${f}_250.jpg"  \
                  -resize 100x  "res_${f}_100.jpg"
    done
    

    Time: 3 mins 37 secs
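
    Since the question asks about Python, here is a minimal sketch of how the single-read command from Methods 2 and 3 could be assembled in a script. The function name thumbnail_command and its size_hint parameter are hypothetical, made up for this illustration. It assumes the ImageMagick convert binary is on your PATH and reuses the res_<file>_<width>.jpg naming from the loops above.

```python
import subprocess


def thumbnail_command(src, widths, size_hint=None):
    """Build one `convert` argv that reads `src` once and writes one file per width.

    `size_hint` (e.g. "3000x2250") is optional and maps to -define jpeg:size=...
    """
    if not widths:
        raise ValueError("at least one width is required")
    widths = sorted(widths, reverse=True)  # largest first, so every step only shrinks
    cmd = ["convert"]
    if size_hint:
        # The define has to precede the input file to affect how it is read.
        cmd += ["-define", f"jpeg:size={size_hint}"]
    cmd.append(src)
    for w in widths[:-1]:
        # -write saves this size and keeps the image in memory for the next resize.
        cmd += ["-resize", f"{w}x", "-write", f"res_{src}_{w}.jpg"]
    # The last output is the normal trailing output filename.
    cmd += ["-resize", f"{widths[-1]}x", f"res_{src}_{widths[-1]}.jpg"]
    return cmd


def make_thumbnails(src, widths, size_hint=None):
    """Run the command; raises CalledProcessError if convert fails."""
    subprocess.run(thumbnail_command(src, widths, size_hint), check=True)
```

    Calling make_thumbnails("image.jpg", [3000, 2000, 1000, 500, 250, 100], "3000x2250") should then be equivalent to one iteration of the Method 3 loop. Passing the arguments as a list avoids any shell quoting problems with unusual filenames.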

    Just as an aside, to demonstrate the reduced time, I/O and memory needed when you tell ImageMagick up front how big an image you are going to need, compare these two commands. Both read one of your 8000x6000, 13 MB images and both generate the same thumbnail. (The -l flag to /usr/bin/time is the macOS/BSD form; GNU time uses -v instead.)

    /usr/bin/time -l convert image.jpg -resize 500x result.jpg 2>&1 | egrep "resident|real"        
    1.92 real         1.77 user         0.14 sys
    415727616  maximum resident set size
    

    i.e. 415 MB and 2 seconds

    /usr/bin/time -l convert -define jpeg:size=500x500 image.jpg -resize 500x result.jpg 2>&1 | egrep "resident|real"
    
    0.24 real         0.23 user         0.01 sys
    23592960  maximum resident set size
    

    i.e. 23 MB and 0.2 seconds - and the output image has the same contents and quality.
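
    To get a feel for why the hint helps so much, here is a rough back-of-the-envelope model in Python. It assumes the JPEG decoder can only scale by 1/2, 1/4 or 1/8 while decoding (as classic libjpeg does) and that the largest reduction that still covers the requested size is chosen. Both function names are hypothetical, and the real behaviour depends on your ImageMagick and libjpeg builds, so treat the numbers as an estimate only.

```python
def decoded_size(src_w, src_h, d):
    """Approximate decoded dimensions at scale 1/d, rounding up."""
    return (-(-src_w // d), -(-src_h // d))


def shrink_on_load_denominator(src_w, src_h, want_w, want_h):
    """Largest of 8, 4, 2 whose decoded image still covers want_w x want_h, else 1."""
    for d in (8, 4, 2):  # try the most aggressive reduction first
        w, h = decoded_size(src_w, src_h, d)
        if w >= want_w and h >= want_h:
            return d
    return 1  # no reduction possible, decode at full size


for hint_w, hint_h in [(3000, 2250), (500, 500)]:
    d = shrink_on_load_denominator(8000, 6000, hint_w, hint_h)
    w, h = decoded_size(8000, 6000, d)
    fewer = (8000 * 6000) // (w * h)
    print(f"hint {hint_w}x{hint_h}: decode at 1/{d} -> {w}x{h} ({fewer}x fewer pixels)")
```

    Under these assumptions the 500x500 hint lets the decoder work at 1/8 scale, i.e. 1000x750 rather than 8000x6000, and the 3000x2250 hint used in Method 3 gives 1/2 scale, i.e. 4000x3000.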

    Method 4

    Here we go all-out and use GNU Parallel, as well as all the foregoing techniques, to send your CPUs, fans and power consumption crazy!!!

    #!/bin/bash
    # Print one complete convert command per image; GNU Parallel runs them concurrently
    for f in image*jpg; do
       cat<<EOF
    convert -define jpeg:size=3000x2250 "$f"          \
                  -resize 3000x -write "res_${f}_3000.jpg" \
                  -resize 2000x -write "res_${f}_2000.jpg" \
                  -resize 1000x -write "res_${f}_1000.jpg" \
                  -resize 500x  -write "res_${f}_500.jpg"  \
                  -resize 250x  -write "res_${f}_250.jpg"  \
                  -resize 100x  "res_${f}_100.jpg"
    EOF
    done | parallel
    

    Time: 56 seconds
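
    If you would rather stay in Python than install GNU Parallel, the same "one job per image, all cores busy" idea can be sketched with the standard library. The helper name run_all and its runner parameter are hypothetical, invented for this example. Threads are enough here because the heavy lifting happens in the child processes, not in Python itself.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, workers=None, runner=subprocess.run):
    """Run each argv list as a child process, `workers` at a time.

    Returns the exit codes in the same order as `commands`.
    `runner` defaults to subprocess.run and is only a parameter so that
    the function can be tried out without launching real processes.
    """
    workers = workers or os.cpu_count() or 1  # default: one job per CPU core
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order; each call blocks one worker thread
        # until its child process has finished.
        return [result.returncode for result in pool.map(runner, commands)]
```

    Each entry in commands would be one argv list per image, for example ["convert", "-define", "jpeg:size=3000x2250", "image.jpg", "-resize", "3000x", "res_image.jpg_3000.jpg"], and any non-zero value in the returned list points at an image that failed.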

    In summary, we can cut the processing time from 27 minutes to 56 seconds by reading each image only once and producing as many outputs per input as possible, by telling ImageMagick up front how much of the input image it needs to read, and by using GNU Parallel to keep all your lovely CPU cores busy. HTH.
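
    For reference, the timings quoted above can be turned into speedup factors relative to Method 1 with a few lines of Python. This is just arithmetic on the figures from this answer, which were measured on one 8-core machine with 128 test images, so expect different numbers on other hardware.

```python
# Wall-clock times reported above, converted to seconds
timings = {
    "Method 1": 26 * 60 + 46,  # naive loop
    "Method 2": 6 * 60 + 17,   # one read, many outputs
    "Method 3": 3 * 60 + 37,   # plus the jpeg:size hint
    "Method 4": 56,            # plus GNU Parallel
}

baseline = timings["Method 1"]
speedups = {name: round(baseline / secs, 1) for name, secs in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {timings[name]:>5} s, {factor}x faster than Method 1")
```

    That works out to roughly 4.3x for Method 2, 7.4x for Method 3 and 28.7x for Method 4.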
