How do I know if PDF pages are color or black-and-white?

太阳男子 2021-01-30 02:02

Given a set of PDF files among which some pages are color and the remaining are black & white, is there any program to find out among the given pages which are color and whi

7 answers
  •  时光说笑
    2021-01-30 03:00

    This is one of the most interesting questions I've seen! I agree with some of the other posts that rendering to a bitmap and then analyzing the bitmap will be the most reliable solution. For simple PDFs, here's a faster but less complete approach.

    1. Parse each PDF page
    2. Look for color directives (g, rg, k, sc, scn, etc.)
    3. Look for embedded images, analyze for color

    My solution below does #1 and half of #2. The other half of #2 would be to follow up with user-defined color, which involves looking up the /ColorSpace entries in the page and decoding them -- contact me offline if this is interesting to you, as it's very doable but not in 5 minutes.
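
    For the "other half of #2", here is a minimal sketch of the decision logic once a /ColorSpace entry has been looked up and decoded. It assumes a hypothetical representation (a plain name, or an array ref such as ['ICCBased', { N => 3 }]); that is not what CAM::PDF returns, and the function name classify_colorspace is made up for illustration. The family names and the meaning of /N follow the PDF specification as far as I know.

```perl
use strict;
use warnings;

# Classify a decoded /ColorSpace entry.
#   'gray'        - the space cannot express color
#   'maybe_color' - the space can express color; operands still need checking
#   'unknown'     - not handled by this sketch (Pattern, DeviceN, ...)
# ASSUMPTION: $cs is a plain family name or an array ref [family, args...].
# This representation is hypothetical, not a CAM::PDF data structure.
sub classify_colorspace {
    my ($cs) = @_;
    my ($family, @args) = ref $cs eq 'ARRAY' ? @{$cs} : ($cs);
    return 'unknown' if !defined $family;

    return 'gray' if $family eq 'DeviceGray' || $family eq 'CalGray';
    return 'maybe_color'
        if $family =~ /^(?:DeviceRGB|DeviceCMYK|CalRGB|Lab)$/;

    if ($family eq 'ICCBased') {
        # /N in the ICC stream dictionary is the number of components.
        my $n = ref $args[0] eq 'HASH' ? $args[0]{N} : undef;
        return 'unknown' if !defined $n;
        return $n == 1 ? 'gray' : 'maybe_color';
    }
    if ($family eq 'Indexed') {
        # [Indexed base hival lookup]: the palette lives in the base space.
        return classify_colorspace($args[0]);
    }
    if ($family eq 'Separation') {
        # [Separation name alternate tintTransform]
        my $colorant = defined $args[0] ? $args[0] : '';
        return $colorant =~ /^(?:Black|None)$/ ? 'gray' : 'maybe_color';
    }
    return 'unknown';
}

print classify_colorspace(['ICCBased', { N => 1 }]), "\n";
```

    A 'maybe_color' result only says the space is capable of color; the operand values (or the palette of an Indexed space) still have to be compared the same way the main program compares rg and k operands.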

    First the main program:

    use CAM::PDF;
    
    my $infile = shift;
    my $pdf = CAM::PDF->new($infile) || die "$CAM::PDF::errstr\n";
    PAGE:
    for my $p (1 .. $pdf->numPages) {
       my $tree = $pdf->getPageContentTree($p);
       if (!$tree) {
          print "Failed to parse page $p\n";
          next PAGE;
       }
       my $colors = $tree->traverse('My::Renderer::FindColors')->{colors};
       my $uncertain = 0;
       for my $color (@{$colors}) {
          my ($name, @rest) = @{$color};
          if ($name eq 'g') {
             # Gray is never color; g() records it as equal RGB values, so this branch is a no-op
          } elsif ($name eq 'rgb') {
             my ($r, $g, $b) = @rest;
             if ($r != $g || $r != $b) {
                print "Page $p is color\n";
                next PAGE;
             }
          } elsif ($name eq 'cmyk') {
             my ($c, $m, $y, $k) = @rest;
             if ($c != 0 || $m != 0 || $y != 0) {
                print "Page $p is color\n";
                next PAGE;
             }
          } else {
             $uncertain = $name;
          }
       }
       if ($uncertain) {
          print "Page $p has user-defined color ($uncertain), needs more investigation\n";
       } else {
          print "Page $p is grayscale\n";
       }
    }
    

    And then here's the helper renderer that handles color directives on each page:

    package My::Renderer::FindColors;
    
    sub new {
       my $pkg = shift;
       return bless { colors => [] }, $pkg;
    }
    sub clone {
       my $self = shift;
       my $pkg = ref $self;
       return bless { colors => $self->{colors}, cs => $self->{cs}, CS => $self->{CS} }, $pkg;
    }
    sub rg {
       my ($self, $r, $g, $b) = @_;
       push @{$self->{colors}}, ['rgb', $r, $g, $b];
    }
    sub g {
       my ($self, $gray) = @_;
       push @{$self->{colors}}, ['rgb', $gray, $gray, $gray];
    }
    sub k {
       my ($self, $c, $m, $y, $k) = @_;
       push @{$self->{colors}}, ['cmyk', $c, $m, $y, $k];
    }
    # cs selects the non-stroking (fill) color space
    sub cs {
       my ($self, $name) = @_;
       $self->{cs} = $name;
    }
    # CS selects the stroking color space
    sub CS {
       my ($self, $name) = @_;
       $self->{CS} = $name;
    }
    sub _sc {
       my ($self, $cs, @rest) = @_;
       return if !$cs; # syntax error
       if ($cs eq 'DeviceRGB') { $self->rg(@rest); }
       elsif ($cs eq 'DeviceGray') { $self->g(@rest); }
       elsif ($cs eq 'DeviceCMYK') { $self->k(@rest); }
       else { push @{$self->{colors}}, [$cs, @rest]; }
    }
    sub sc {
       my ($self, @rest) = @_;
       $self->_sc($self->{cs}, @rest);
    }
    sub SC {
       my ($self, @rest) = @_;
       $self->_sc($self->{CS}, @rest);
    }
    sub scn { sc(@_); }
    sub SCN { SC(@_); }
    sub RG { rg(@_); }
    sub G { g(@_); }
    sub K { k(@_); }
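
    For the more reliable bitmap approach mentioned at the top, here is a rough sketch of the analysis half only. It assumes each page has already been rendered to a binary PPM (P6) file, for example with Ghostscript's ppmraw device (something like gs -sDEVICE=ppmraw -r72 -o page-%03d.ppm input.pdf, where the file names are placeholders). The function name ppm_has_color and the tolerance parameter are my own inventions, and only 8-bit samples are handled.

```perl
use strict;
use warnings;

# Return 1 if a binary PPM (P6) image contains a pixel whose R, G and B
# values differ by more than $tolerance, otherwise 0.
# ASSUMPTIONS: $data holds the whole file as bytes, samples are 8-bit.
sub ppm_has_color {
    my ($data, $tolerance) = @_;
    $tolerance = 0 if !defined $tolerance;

    # Header fields are separated by whitespace and optional "# ..." comments.
    my $sep = qr/(?:\s+|#[^\n]*\n)+/;
    my ($w, $h, $max) = $data =~ /\AP6$sep(\d+)$sep(\d+)$sep(\d+)\s/
        or die "not a binary PPM (P6) image\n";
    my $offset = $+[0];    # raster starts right after the header
    die "only 8-bit samples are handled in this sketch\n" if $max > 255;
    die "truncated image data\n" if length($data) - $offset < 3 * $w * $h;

    for my $i (0 .. $w * $h - 1) {
        my ($red, $green, $blue) =
            unpack 'C3', substr($data, $offset + 3 * $i, 3);
        return 1
            if abs($red - $green) > $tolerance
            || abs($green - $blue) > $tolerance
            || abs($red - $blue) > $tolerance;
    }
    return 0;
}

# Usage (file name is a placeholder):
#   open my $fh, '<:raw', 'page-001.ppm' or die "open: $!\n";
#   my $data = do { local $/; <$fh> };
#   print ppm_has_color($data, 2) ? "color\n" : "grayscale\n";
```

    A small nonzero tolerance may help with pages whose gray scans carry slight color noise, at the cost of missing very faint tints.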
    
