Given a set of PDF files among which some pages are color and the remaining are black & white, is there any program to find out among the given pages which are color and whi
It is possible to use the Image Magick tool identify
. If used on PDF pages it converts the page first to a raster image. If the page contained color can be tested using the -format "%[colorspace]"
option, which for my PDF printed either Gray
or RGB
. IMHO identify
(or what ever tool it uses in the background; Ghostscript?) does choose the colorspace depending on the presents of color.
An example is:
identify -format "%[colorspace]" $FILE.pdf[$PAGE]
where PAGE is the page starting from 0, not 1. If the page selection is not used all pages will be collapsed to one, which is not what you want.
I wrote the following BASH script which uses pdfinfo
to get the number of pages and then loops over them. Outputting the pages which are in color. I also added a feature for double sided document where you might need a non-colored backside page as well.
Using the outputted space separated list the colored PDF pages can be extracted using pdftk
:
pdftk $FILE cat $PAGELIST output color_${FILE}.pdf
#!/bin/bash
FILE=$1
PAGES=$(pdfinfo ${FILE} | grep 'Pages:' | sed 's/Pages:\s*//')
GRAYPAGES=""
COLORPAGES=""
DOUBLECOLORPAGES=""
echo "Pages: $PAGES"
N=1
while (test "$N" -le "$PAGES")
do
COLORSPACE=$( identify -format "%[colorspace]" "$FILE[$((N-1))]" )
echo "$N: $COLORSPACE"
if [[ $COLORSPACE == "Gray" ]]
then
GRAYPAGES="$GRAYPAGES $N"
else
COLORPAGES="$COLORPAGES $N"
# For double sided documents also list the page on the other side of the sheet:
if [[ $((N%2)) -eq 1 ]]
then
DOUBLECOLORPAGES="$DOUBLECOLORPAGES $N $((N+1))"
#N=$((N+1))
else
DOUBLECOLORPAGES="$DOUBLECOLORPAGES $((N-1)) $N"
fi
fi
N=$((N+1))
done
echo $DOUBLECOLORPAGES
echo $COLORPAGES
echo $GRAYPAGES
#pdftk $FILE cat $COLORPAGES output color_${FILE}.pdf