Replacing vector images in a PDF with raster images

佛祖请我去吃肉 2021-01-30 22:46

Is there any easy (scriptable) way to convert a PDF with vector images into a PDF with raster images? In other words, I want to generate a PDF with the exact same (un-rasterized

8 answers
  •  野趣味 (OP)
    2021-01-30 23:18

    Inkscape is the best solution. I quickly put together this rather unoptimized bash script that does exactly that; you can play with it and change the options. ImageMagick convert, gs, and pdftoimages don't work as well as Inkscape: they either don't export the layers, or they export them with poor quality:

    #!/bin/bash
    #set -xev
    ORIGINAL_FOLDER=$(pwd)
    JPEGS=$(mktemp -d)
    # $1 is a zip archive that is expected to contain combined_to_do.pdf
    unzip "$1" -d "$JPEGS"
    cd "$JPEGS" || exit 1
    # expand the pdf into single-page pdfs
    pdftk combined_to_do.pdf burst output pg_%04d.pdf
    #1) render the pdfs to pngs as they are seen, with alpha, layers, transparency etc.; this cannot be done by ImageMagick convert or pdftoimages
    #   (these are Inkscape 0.9x-style options; Inkscape 1.x replaced --export-png with --export-filename)
    for f in ./pg*.pdf; do inkscape "$f" -z --export-dpi=300 --export-area-drawing --export-png="$f.png"; done
    #2) second, convert the pngs to jpgs (the page pdfs are no longer needed)
    rm ./*.pdf
    for f in ./p*.png; do convert "$f" -quality 100 -density 300 "$f.jpg"; done
    #3) make a pdf file out of every jpg image without loss of either resolution or quality:
    for f in ./*jpg; do img2pdf "$f" -o "$f.pdf"; done
    #4) concatenate the pdf pages into one:
    pdftk ./*.jpg.pdf cat output combined.pdf
    #5) and last, add an OCRed text layer that doesn't change the quality of the scan, so the pdf is searchable:
    pypdfocr combined.pdf
    cp "$JPEGS/combined_ocr.pdf" "$ORIGINAL_FOLDER/${1}_ocr.pdf"
    cp "$JPEGS/combined.pdf" "$ORIGINAL_FOLDER/${1}.pdf"
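
The rasterizing step (1) depends on which Inkscape is installed: the script uses the 0.9x options, while Inkscape 1.x exports through `--export-type` and `--export-filename`. Below is a rough sketch of a wrapper that picks the flag style from the major version. The function name `rasterize_page`, the `INKSCAPE` override variable, and the version parsing in the closing comment are all assumptions made for illustration; none of them come from the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): rasterize one PDF
# page to PNG, choosing the Inkscape flag style by major version.
#   $1 = Inkscape major version (0 or 1), $2 = input pdf, $3 = output png,
#   $4 = DPI (optional, defaults to 300)
# The INKSCAPE variable is an assumed override for the binary to run.
rasterize_page() {
    local major="$1" input="$2" output="$3" dpi="${4:-300}"
    if [ "$major" -ge 1 ]; then
        # Inkscape 1.x: export type and file name are separate options
        "${INKSCAPE:-inkscape}" --export-type=png --export-dpi="$dpi" \
            --export-area-drawing --export-filename="$output" "$input"
    else
        # Inkscape 0.9x: same options as step 1 of the script
        "${INKSCAPE:-inkscape}" "$input" -z --export-dpi="$dpi" \
            --export-area-drawing --export-png="$output"
    fi
}

# Possible use in place of step 1 (the sed pattern is an assumption about
# what `inkscape --version` prints, e.g. "Inkscape 1.2.2 (...)"):
#   major=$(inkscape --version 2>/dev/null | sed -n 's/^Inkscape \([0-9][0-9]*\)\..*/\1/p' | head -n 1)
#   for f in ./pg_*.pdf; do rasterize_page "${major:-0}" "$f" "$f.png"; done
```

Setting `INKSCAPE=echo` prints the command line instead of running it, which is a cheap way to check the flags before rasterizing a long document.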
    
