Export PDF page labels on command line

社会主义新天地 提交于 2019-12-03 06:13:52
Kurt Pfeifle

Short answer:
I am not aware of any (free) tool that can 'simply print' the page label for each page.

Also, you'll not be able to evade the expansion compressed objects and object streams, using a tool like qpdf or one with equivalent capabilities.

Long answer:
There's no such tool because these are the only a few things you can safely rely on when it comes to page labels. These are the following:

  1. Each PDF document must contain a root object.
  2. That root object must be of /Type /Catalog.
  3. The document's trailer will show where to find the object using the key /Root followed by the indirect object number reference.
  4. IF a PDF document uses non-standard page labels, then the document root object must have an entry named /PageLabels.

Here is where it stops to be relatively easy. Because the object the /PageLabels key refers to may be contained in a compressed object stream. This means that you'd have to expand that object stream.

If you really succeeded to get the description of the page labels as ASCII, you'll discover that it's not an easily parseable flat list (like a dictionary is): it is a number tree.

I'll not go into the details of these complexities, because it would take a very long article to describe all possible variations. You better read it up directly in the official ISO PDF-1.7 specification.

But instead I'll give you an example in ASCII PDF code:

213 0 obj
  << /Type /Catalog
     /PageLabels 
        << 
           /Nums 
                 [ 
                   0 <<           % start labeling from page no. 1
                       /S /r      % label with lowercase roman numbers
                     >> 
                   7 <<           % start new labeling from page no. 8
                       /S /D      % label with standard decimal numbers
                     >> 
                   11 <<          % start labeling page no. 12
                       /S /D      % label with decimal numbers...
                       /P (ABCD-) %   ...but using label prefix 'ABCD-'...
                       /St 3      %   ...followed by '3' as the start decimal.
                     >>
                  ]
        >>
     %%...........................
     %%...more root object keys...
     %%........................... 
  >>
endobj

The above example will label the pages number 1, 2, 3, ... (last) like this:

i
ii
iii
iv
v
vi
1
2
3
4
ABCD-3
ABCD-4
ABCD-5
ABCD-6
...and so on until last page...

As you can see, the PDF method of labeling pages (mapping page numbers to page names) is completely non-intuitive. You can only understand it by studying the PDF specification.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!