How to open PDF raw?


I've been wanting to see the insides of a PDF for a while: the raw source code of it, so I can look at it. Is there any way of doing that?


4 answers

感情败类

2020-12-23 17:27

If the purpose is just to look into the file, then any simple text editor will do, e.g. Notepad. PDF is largely a text-based format, with embedded content stored as binary byte streams. Raw PDF looks like this:

>>
/Border [0 0 0]
/Rect [121.02 332.48 363.24 343.64]
/StructParent 1321
/Subtype /Link
/Type /Annot
>>
endobj
64579 0 obj
<<
/Filter /FlateDecode
/Length 5771
>>
stream
Ũn0x/�+�}�ǹ����\֛ bYO�5[��X��W��L��(�������V�A3�C���������u큋_�a��ךm2N�6�    ��A��8
�d���NQ⺢GI��G�[��)�̉Y��R�y{R����&�&�;��g�k1���ҋeTC�(W��`���*��(;�AEc<=  mnZ+��|T��v
�.��зe�aޞ��V4�b���L����k�Oj.ֿ�y�����kc|I��  ��C�0��Hf�7d�/�z���m��o��A��B��IJ�%�. 
!�%f�б���&�ޒ�4Ύ7�l�3���3`�
endstream
endobj
64580 0 obj
<<
/Border [0 0 0]
/Dest <E4AE7DD2769553EF1668>
/Rect [219 648.5 256.8 659.66]
/StructParent 1323
/Subtype /Link
/Type /Annot
>>

What you see are basic COS objects such as names, dictionaries, streams and so on. All object types are described in the PDF standard (ISO 32000-1), see section 7.3, Objects.
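
To make the object structure in the excerpt above more concrete, here is a minimal Python sketch that scans raw bytes for `N G obj ... endobj` pairs and reports whether each object carries a stream. The function name `list_objects` and the `SAMPLE` bytes are made up for illustration, and the regex scan is naive (it could be fooled by binary stream data that happens to look like an object header), so treat it as a way to explore, not as a PDF parser.

```python
import re

# Matches the "N G obj" header that opens every indirect object.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def list_objects(data: bytes):
    """Return (object number, generation, has_stream) for each object header found."""
    found = []
    for match in OBJ_HEADER.finditer(data):
        end = data.find(b"endobj", match.end())
        body = data[match.end(): end if end != -1 else len(data)]
        found.append((int(match.group(1)), int(match.group(2)), b"stream" in body))
    return found


# Hypothetical sample modeled on the excerpt above; it is not a complete PDF.
SAMPLE = (
    b"64579 0 obj\n<<\n/Length 5\n>>\nstream\nhello\nendstream\nendobj\n"
    b"64580 0 obj\n<<\n/Subtype /Link\n/Type /Annot\n>>\nendobj\n"
)

if __name__ == "__main__":
    for number, generation, has_stream in list_objects(SAMPLE):
        print(number, generation, "stream" if has_stream else "no stream")
```

For a real file you would pass the result of `open(path, "rb").read()` instead of `SAMPLE`.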


遥遥无期

2020-12-23 17:34
In addition to the qpdf tool, conversion into PostScript might be helpful. PDF's page description operators are closely related to PostScript, so in the converted file it's usually quite easy to figure out, e.g., where the labels of a graph are. You can either use pdf2ps or invoke Ghostscript directly:
```
gs -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=some.ps some.pdf
```
When you generate your PDFs using pdflatex, you can disable compression by setting `\pdfcompresslevel=0` (and `\pdfobjcompresslevel=0` for object streams) in the preamble. This makes the PDF more readable.
感情败类

2020-12-23 17:39

Use a hex editor. Of course, unless you know the PDF specification (PDF, 8.6 MB), you won't recognize much.
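
If no hex editor is at hand, a few lines of Python give a comparable view. This is only a sketch: the `hexdump` helper and the `HEADER` sample bytes are invented for illustration, and the layout merely imitates what typical hex editors show (offset, hex bytes, printable ASCII).

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    pad = width * 3 - 1  # width of a full row of "xx " groups, minus the trailing space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Replace non-printable bytes with "." as hex editors usually do.
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: the kind of header line a PDF file starts with.
HEADER = b"%PDF-1.7\n"

if __name__ == "__main__":
    print(hexdump(HEADER))
```

To dump the start of a real file, read it in binary mode, e.g. `hexdump(open(path, "rb").read(256))`.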

感情败类

2020-12-23 17:45
Looking at the raw code of PDFs will not serve you much unless you also have an idea about their internal structure. You should get yourself a copy of the official PDF reference, and you should read an introductory article to begin with.

Even after such preparation, you won't discover much that is useful by staring at the raw code, because PDFs usually contain parts which are "filtered" (that means: compressed).
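
To illustrate what "filtered" means in the most common case (`/Filter /FlateDecode`, i.e. zlib/deflate), here is a minimal Python sketch that finds `stream ... endstream` sections and tries to inflate them. The names `inflate_streams`, `PAYLOAD` and `SAMPLE` are made up for this example; it ignores `/Length`, predictors, filter chains and encryption, so it will not handle every stream in a real PDF.

```python
import re
import zlib

# Captures everything between the "stream" keyword (plus its end-of-line) and "endstream".
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes):
    """Yield the decompressed bytes of every stream that zlib can inflate."""
    for match in STREAM.finditer(data):
        try:
            # decompressobj stops at the end of the zlib data and leaves the
            # trailing end-of-line bytes before "endstream" in unused_data.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate data (an image filter, for example): skip it.
            continue


# Hypothetical object built in memory: a tiny page content stream, compressed.
PAYLOAD = b"BT /F1 12 Tf (Hello) Tj ET"
_compressed = zlib.compress(PAYLOAD)
SAMPLE = (
    b"1 0 obj\n<< /Filter /FlateDecode /Length "
    + str(len(_compressed)).encode()
    + b" >>\nstream\n" + _compressed + b"\nendstream\nendobj\n"
)

if __name__ == "__main__":
    for content in inflate_streams(SAMPLE):
        print(content.decode("latin-1"))
```

On a real file this prints page content operators, font programs and so on for whichever streams happen to be plain Flate data; a dedicated tool handles the remaining cases far more reliably.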

How to look at the real PDF source behind the 'raw' binary parts

Jay Berkenbilt's qpdf is a very useful command-line tool (available for Linux, macOS and as source code, under the open source Artistic License), which can unpack most filtered content and reorganize the internal structure in a way that gives you much more insight into it (all objects are numerically ordered, etc.). The command line to achieve this is:
```
qpdf --qdf original.pdf unpacked.pdf
```
Another useful and free tool (GPL licensed, but Linux-only AFAIK) to look into PDFs is of course PDFEdit. This one even comes with a GUI (if you prefer that), while still allowing you access to the internal structure and "raw" PDF code.
