How do I know if PDF pages are color or black-and-white?

太阳男子 2021-01-30 02:02

Given a set of PDF files in which some pages are color and the rest are black & white, is there a program that can tell which pages are color and which are black & white?

7 answers
  • 2021-01-30 02:40

    Newer versions of Ghostscript (version 9.05 and later) include a "device" called inkcov. It calculates the ink coverage of each page (not for each image) in Cyan (C), Magenta (M), Yellow (Y) and Black (K) values, where 0.00000 means 0%, and 1.00000 means 100% (see Detecting all pages which contain color).

    For example:

    $ gs -q -o - -sDEVICE=inkcov file.pdf 
    0.11264  0.11605  0.11605  0.09364 CMYK OK
    0.11260  0.11601  0.11601  0.09360 CMYK OK
    

    If any of the C, M or Y values is nonzero, the page contains color.

    To print only the pages that contain color, use this one-liner:

    $ gs -o - -sDEVICE=inkcov file.pdf |tail -n +4 |sed '/^Page*/N;s/\n//'|sed -E '/Page [0-9]+ 0.00000  0.00000  0.00000  / d'
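
The same check can be sketched with awk on the quiet (-q) output, which has one line per page. This is a minimal sketch, not part of the original answer: the function name color_pages is made up for illustration, and it assumes every page line has the form shown above (four numbers followed by "CMYK OK").

```shell
#!/bin/bash
# color_pages (hypothetical helper): read `gs -q -o - -sDEVICE=inkcov` output
# on stdin and print the 1-based number of every page whose C, M or Y
# coverage is nonzero. Lines without the CMYK marker are ignored.
color_pages() {
    awk '$5 == "CMYK" {
             page++
             if ($1 + $2 + $3 > 0) print page
         }'
}

# Usage (needs Ghostscript 9.05 or later):
#   gs -q -o - -sDEVICE=inkcov file.pdf | color_pages
```

Counting only the lines that carry the CMYK marker keeps the page numbers right even if Ghostscript prints a warning in between.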
    
  • 2021-01-30 02:42

    The script from Martin Scharrer is great. It contains a minor bug: when two consecutive pages both contain color, it counts them twice. I fixed that. In addition, the script now counts the pages and lists the grayscale pages for double-sided printing. It also prints the pages comma-separated, so the output can be used directly for printing from a PDF viewer. I've added the code, but you can download it here, too.

    Cheers, timeshift

    #!/bin/bash
    
    if [ $# -ne 1 ]
    then
        echo "USAGE: This script needs exactly one parameter: the path to the PDF"
        exit 1
    fi
    
    FILE=$1
    PAGES=$(pdfinfo "${FILE}" | grep 'Pages:' | sed 's/Pages:\s*//')
    
    GRAYPAGES=""
    COLORPAGES=""
    DOUBLECOLORPAGES=""
    DOUBLEGRAYPAGES=""
    OLDGP=""
    DOUBLEPAGE=0
    DPGC=0
    DPCC=0
    SPGC=0
    SPCC=0
    
    echo "Pages: $PAGES"
    N=1
    while (test "$N" -le "$PAGES")
    do
        COLORSPACE=$( identify -format "%[colorspace]" "$FILE[$((N-1))]" )
        echo "$N: $COLORSPACE"
        if [[ $DOUBLEPAGE -eq -1 ]]
        then
            DOUBLEGRAYPAGES="$OLDGP"
            DPGC=$((DPGC-1))
            DOUBLEPAGE=0
        fi
        if [[ $COLORSPACE == "Gray" ]]
        then
            GRAYPAGES="$GRAYPAGES,$N"
            SPGC=$((SPGC+1))
            if [[ $DOUBLEPAGE -eq 0 ]]
            then
                OLDGP="$DOUBLEGRAYPAGES"
                DOUBLEGRAYPAGES="$DOUBLEGRAYPAGES,$N"
                DPGC=$((DPGC+1))
            else
                DOUBLEPAGE=0
            fi
        else
            COLORPAGES="$COLORPAGES,$N"
            SPCC=$((SPCC+1))
            # For double sided documents also list the page on the other side of the sheet:
            if [[ $((N%2)) -eq 1 ]]
            then
                DOUBLECOLORPAGES="$DOUBLECOLORPAGES,$N,$((N+1))"
                DOUBLEPAGE=$((N+1))
                DPCC=$((DPCC+2))
                #N=$((N+1))
            else
                if [[ $DOUBLEPAGE -eq 0 ]]
                then
                    DOUBLECOLORPAGES="$DOUBLECOLORPAGES,$((N-1)),$N"
                    DPCC=$((DPCC+2))
                    DOUBLEPAGE=-1
                elif [[ $DOUBLEPAGE -gt 0 ]]
                then
                    DOUBLEPAGE=0
                fi
            fi
        fi
        N=$((N+1))
    done
    
    echo " "
    echo "Double-paged printing:"
    echo "  Color($DPCC): ${DOUBLECOLORPAGES:1:${#DOUBLECOLORPAGES}-1}"
    echo "  Gray($DPGC): ${DOUBLEGRAYPAGES:1:${#DOUBLEGRAYPAGES}-1}"
    echo " "
    echo "Single-paged printing:"
    echo "  Color($SPCC): ${COLORPAGES:1:${#COLORPAGES}-1}"
    echo "  Gray($SPGC): ${GRAYPAGES:1:${#GRAYPAGES}-1}"
    #pdftk $FILE cat $COLORPAGES output color_${FILE}.pdf
    
  • 2021-01-30 02:45

    It is possible to use the ImageMagick tool identify. When run on a PDF page, it first converts the page to a raster image. Whether the page contains color can be tested with the -format "%[colorspace]" option, which for my PDF printed either Gray or RGB. IMHO identify (or whatever tool it uses in the background; Ghostscript?) chooses the colorspace depending on the presence of color.

    An example is:

    identify -format "%[colorspace]" "$FILE.pdf[$PAGE]"
    

    where PAGE is the page starting from 0, not 1. If the page selection is not used all pages will be collapsed to one, which is not what you want.

    I wrote the following Bash script, which uses pdfinfo to get the number of pages, loops over them and outputs the pages that are in color. I also added a feature for double-sided documents, where you might need the non-colored page on the back of the sheet as well.

    Using the space-separated list it outputs, the color pages can be extracted with pdftk:

    pdftk $FILE cat $PAGELIST output color_${FILE}.pdf
    

    #!/bin/bash
    
    FILE=$1
    PAGES=$(pdfinfo "${FILE}" | grep 'Pages:' | sed 's/Pages:\s*//')
    
    GRAYPAGES=""
    COLORPAGES=""
    DOUBLECOLORPAGES=""
    
    echo "Pages: $PAGES"
    N=1
    while (test "$N" -le "$PAGES")
    do
        COLORSPACE=$( identify -format "%[colorspace]" "$FILE[$((N-1))]" )
        echo "$N: $COLORSPACE"
        if [[ $COLORSPACE == "Gray" ]]
        then
            GRAYPAGES="$GRAYPAGES $N"
        else
            COLORPAGES="$COLORPAGES $N"
            # For double sided documents also list the page on the other side of the sheet:
            if [[ $((N%2)) -eq 1 ]]
            then
                DOUBLECOLORPAGES="$DOUBLECOLORPAGES $N $((N+1))"
                #N=$((N+1))
            else
                DOUBLECOLORPAGES="$DOUBLECOLORPAGES $((N-1)) $N"
            fi
        fi
        N=$((N+1))
    done
    
    echo $DOUBLECOLORPAGES
    echo $COLORPAGES
    echo $GRAYPAGES
    #pdftk $FILE cat $COLORPAGES output color_${FILE}.pdf
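
In the script above, two color pages on the same sheet (for example pages 1 and 2) each add the pair "1 2" to DOUBLECOLORPAGES, so the sheet is listed twice. A minimal sketch of the pairing step that lists each sheet once follows. The function name sheet_pages is made up for illustration, and the sketch assumes the page numbers are passed in ascending order.

```shell
#!/bin/bash
# sheet_pages (hypothetical helper): given ascending 1-based color page
# numbers as arguments, print both pages of every duplex sheet that holds
# at least one of them, listing each sheet only once.
sheet_pages() {
    local n odd last=0 out=""
    for n in "$@"; do
        odd=$(( n % 2 == 1 ? n : n - 1 ))   # front side of the sheet holding page n
        if [ "$odd" -ne "$last" ]; then     # skip a sheet that is already listed
            out="$out $odd $((odd + 1))"
            last=$odd
        fi
    done
    echo "${out# }"
}

# Usage, with COLORPAGES as produced by the script above:
#   pdftk "$FILE" cat $(sheet_pages $COLORPAGES) output "color_${FILE}.pdf"
```

Like the original script, this can name a back side that does not exist when the last page of an odd-length document is in color, so the final number may need trimming.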
    
  • 2021-01-30 02:46

    Here is the Ghostscript solution for Windows, which requires grep from GnuWin32 (http://gnuwin32.sourceforge.net/packages/grep.htm):

    Monochrome (Black and White) pages:

    gswin64c -q -o - -sDEVICE=inkcov DOCUMENT.pdf | grep "^ 0.00000  0.00000  0.00000" | find /c /v ""

    Color pages:

    gswin64c -q -o - -sDEVICE=inkcov DOCUMENT.pdf | grep -v "^ 0.00000  0.00000  0.00000" | find /c /v ""

    Total pages (this one is easier to get from any PDF reader):

    gswin64c -q -o - -sDEVICE=inkcov DOCUMENT.pdf | find /c /v ""
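
Where a Unix-style shell with awk is available (an assumption; the answer itself only uses grep and find), the three counts can be taken in a single Ghostscript run. The function name count_inkcov is hypothetical, and the sketch relies on the inkcov line format of four numbers followed by "CMYK OK".

```shell
#!/bin/bash
# count_inkcov (hypothetical helper): read `gs -q -o - -sDEVICE=inkcov`
# output on stdin and print the monochrome, color and total page counts.
count_inkcov() {
    awk '$5 == "CMYK" {
             total++
             if ($1 + $2 + $3 > 0) color++
         }
         END { printf "mono=%d color=%d total=%d\n", total - color, color, total }'
}

# Usage:
#   gs -q -o - -sDEVICE=inkcov DOCUMENT.pdf | count_inkcov
```

Summing the numeric fields avoids depending on the exact spacing between columns, which a fixed grep pattern has to match.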

  • 2021-01-30 02:59

    ImageMagick has some built-in methods for image comparison.

    http://www.imagemagick.org/Usage/compare/#type_general

    There are some Perl APIs for ImageMagick, so if you combine them with a PDF-to-image converter you may be able to build your black & white test.

  • 2021-01-30 03:00

    This is one of the most interesting questions I've seen! I agree with some of the other posts that rendering to a bitmap and then analyzing the bitmap will be the most reliable solution. For simple PDFs, here's a faster but less complete approach.

    1. Parse each PDF page
    2. Look for color directives (g, rg, k, sc, scn, etc)
    3. Look for embedded images, analyze for color

    My solution below does #1 and half of #2. The other half of #2 would be to follow up with user-defined color, which involves looking up the /ColorSpace entries in the page and decoding them -- contact me offline if this is interesting to you, as it's very doable but not in 5 minutes.

    First the main program:

    use CAM::PDF;
    
    my $infile = shift;
    my $pdf = CAM::PDF->new($infile);
    PAGE:
    for my $p (1 .. $pdf->numPages) {
       my $tree = $pdf->getPageContentTree($p);
       if (!$tree) {
          print "Failed to parse page $p\n";
          next PAGE;
       }
       my $colors = $tree->traverse('My::Renderer::FindColors')->{colors};
       my $uncertain = 0;
       for my $color (@{$colors}) {
          my ($name, @rest) = @{$color};
          if ($name eq 'g') {
             # plain gray: nothing to check
          } elsif ($name eq 'rgb') {
             my ($r, $g, $b) = @rest;
             if ($r != $g || $r != $b) {
                print "Page $p is color\n";
                next PAGE;
             }
          } elsif ($name eq 'cmyk') {
             my ($c, $m, $y, $k) = @rest;
             if ($c != 0 || $m != 0 || $y != 0) {
                print "Page $p is color\n";
                next PAGE;
             }
          } else {
             $uncertain = $name;
          }
       }
       if ($uncertain) {
          print "Page $p has user-defined color ($uncertain), needs more investigation\n";
       } else {
          print "Page $p is grayscale\n";
       }
    }
    

    And then here's the helper renderer that handles color directives on each page:

    package My::Renderer::FindColors;
    
    sub new {
       my $pkg = shift;
       return bless { colors => [] }, $pkg;
    }
    sub clone {
       my $self = shift;
       my $pkg = ref $self;
       return bless { colors => $self->{colors}, cs => $self->{cs}, CS => $self->{CS} }, $pkg;
    }
    sub rg {
       my ($self, $r, $g, $b) = @_;
       push @{$self->{colors}}, ['rgb', $r, $g, $b];
    }
    sub g {
       my ($self, $gray) = @_;
       push @{$self->{colors}}, ['rgb', $gray, $gray, $gray];
    }
    sub k {
       my ($self, $c, $m, $y, $k) = @_;
       push @{$self->{colors}}, ['cmyk', $c, $m, $y, $k];
    }
    sub cs {
       my ($self, $name) = @_;
       $self->{cs} = $name;
    }
    sub CS {
       my ($self, $name) = @_;
       $self->{CS} = $name;
    }
    sub _sc {
       my ($self, $cs, @rest) = @_;
       return if !$cs; # syntax error
       if ($cs eq 'DeviceRGB') { $self->rg(@rest); }
       elsif ($cs eq 'DeviceGray') { $self->g(@rest); }
       elsif ($cs eq 'DeviceCMYK') { $self->k(@rest); }
       else { push @{$self->{colors}}, [$cs, @rest]; }
    }
    sub sc {
       my ($self, @rest) = @_;
       $self->_sc($self->{cs}, @rest);
    }
    sub SC {
       my ($self, @rest) = @_;
       $self->_sc($self->{CS}, @rest);
    }
    sub scn { sc(@_); }
    sub SCN { SC(@_); }
    sub RG { rg(@_); }
    sub G { g(@_); }
    sub K { k(@_); }
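
For comparison, the device-color half of step 2 can be roughed out in shell on a content stream that has already been decoded to plain text. This is only a sketch under strong assumptions: the function name has_color_ops is made up, the operands are expected on the same line as their operator, and sc/scn, user-defined color spaces and images are not handled at all. Getting the decoded stream out of the PDF is outside the sketch.

```shell
#!/bin/bash
# has_color_ops (hypothetical helper): read a decoded PDF content stream on
# stdin. Exit status is 0 if an rg/RG operator with unequal components or a
# k/K operator with nonzero C, M or Y is found, and 1 otherwise.
has_color_ops() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # "r g b rg": color unless all three components are equal
            if (($i == "rg" || $i == "RG") && i > 3 &&
                ($(i-3) != $(i-2) || $(i-3) != $(i-1))) found = 1
            # "c m y k k": color if any of C, M or Y is nonzero
            if (($i == "k" || $i == "K") && i > 4 &&
                ($(i-4) + $(i-3) + $(i-2) > 0)) found = 1
        }
    }
    END { exit !found }'
}

# Usage, with page.txt holding one decoded content stream:
#   if has_color_ops < page.txt; then echo "color"; else echo "no device color found"; fi
```

A "no device color found" result is not proof of a grayscale page, for the same reason the Perl program reports user-defined color as needing more investigation.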
    