How are PDF files able to be partially displayed while downloading?

Posted by 你离开我真会死。 on 2019-11-30 23:28:31

The type of PDF files the OP describes is also known as "web optimized" (marketing term) or "linearized" (technical term in PDF parlance).

Note that this only works if two further conditions are met, in addition to the files being linearized:

  1. The PDF viewer needs to be able to handle these types of PDF and take advantage of the linearization feature.
  2. The (remote) host serving the linearized PDFs needs to support "byte streaming" (more commonly called byte serving: answering HTTP range requests for parts of a file).

If byte-streaming is not supported by the server, or if the PDF file is not linearized, the entire file still needs to be downloaded before the viewer can display any page.
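Whether a server advertises byte-range support can usually be read from its response headers. The following is a minimal sketch, not a definitive check: the function name `supports_byte_ranges` is made up for illustration, and it only inspects a header mapping you have already obtained (for example from a HEAD request). Some servers honour range requests without advertising them, so a negative result is only a hint.

```python
def supports_byte_ranges(headers: dict) -> bool:
    """Return True if the headers advertise byte-range support.

    Hypothetical helper: `headers` is a plain mapping of HTTP response
    header names to values, as obtained from any HTTP client.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    units = normalized.get("accept-ranges", "")
    # The value is a comma-separated list of range units.
    # "Accept-Ranges: none" means ranges are explicitly not supported.
    return "bytes" in [unit.strip().lower() for unit in units.split(",")]


# Example with hand-written header mappings (no network access needed).
print(supports_byte_ranges({"Accept-Ranges": "bytes"}))
print(supports_byte_ranges({"Accept-Ranges": "none"}))
```

A server that actually honours a range request answers with status 206 (Partial Content) instead of 200.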

The description about the PDF file structure quoted by the OP does not apply to linearized PDF files. These are organized in a slightly different way:

  1. Special rules apply to the ordering of PDF objects ("standard" PDFs can have their objects in any arbitrary order).
  2. The PDF document needs to contain some additional structures called "hint tables", which enable efficient navigation within it (even if it is not yet completely downloaded).

Regarding the ordering rules, a linearized PDF contains its objects in two groups:

  1. The first group contains the document catalogue, all document-level objects, and all objects belonging to the first-to-be-displayed page (not necessarily "page 0"!). These objects shall be numbered sequentially.

  2. The second group holds all the other objects.

These groups shall be indexed by two xref table sections.

  1. The first group's xref section appears immediately after the first indirect object, very close to the beginning of the file.
  2. The second group's xref section is positioned at the end of the file (just as in standard, non-linearized PDFs).

The first object immediately after the %PDF-1.x header line shall contain a dictionary key indicating the /Linearized property of the file.
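As a rough illustration of how a viewer might spot this dictionary, here is a sketch that scans the head of a file. It is a heuristic under stated assumptions, not a real PDF parser: the function name `linearization_params` is hypothetical, the regular expressions only handle a simple, flat dictionary, and it relies on the linearization dictionary sitting within the first 1024 bytes of the file.

```python
import re
from typing import Dict, Optional

# Assumption: the linearization parameter dictionary lies entirely within
# the first 1024 bytes, so only the head of the file has to be inspected.
HEAD_SIZE = 1024


def linearization_params(head: bytes) -> Optional[Dict[str, int]]:
    """Return the integer linearization parameters, or None if not linearized."""
    window = head[:HEAD_SIZE]
    if not window.startswith(b"%PDF-"):
        return None
    # Find the first indirect object ("N G obj << ... >>") after the header.
    match = re.search(rb"\d+\s+\d+\s+obj\s*<<(.*?)>>", window, re.DOTALL)
    if match is None or b"/Linearized" not in match.group(1):
        return None
    body = match.group(1)
    params = {}
    # Integer keys of the dictionary: /L file length, /O object number of the
    # first page, /E end offset of the first page, /N page count, /T offset
    # related to the main xref table. (/H is an array and is skipped here.)
    for key in (b"L", b"O", b"E", b"N", b"T"):
        found = re.search(rb"/" + key + rb"\s+(\d+)", body)
        if found:
            params[key.decode("ascii")] = int(found.group(1))
    return params


# Hand-written sample head of a file; the numbers are made up.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n"
    b"<< /Linearized 1 /L 54321 /H [ 500 120 ] /O 4 /E 9000 /N 12 /T 54000 >>\n"
    b"endobj\n"
)
print(linearization_params(sample))
```

In practice a dedicated tool or PDF library is a more reliable way to check this, because real files may format the dictionary differently than this sketch expects.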

This overall structure allows a conforming reader to learn the complete list of object addresses very quickly, without needing to download the complete file from beginning to end:

  • The viewer can display the first page(s) very fast, before the complete file is downloaded.

  • The user can click on a thumbnail page preview (or a link in the ToC of the file) to jump to, say, page 445 immediately after the first page(s) have been displayed. The viewer can then use byte range requests to ask the remote server to deliver all the objects required for page 445 "out of order", so that it can display this page sooner. (While the user reads pages out of order, the download of the complete document still continues in the background.)
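The "out of order" request for a single page can be sketched as follows. This assumes the viewer has already used the hint tables to work out which byte spans hold the page's objects; the `(offset, length)` pairs below are invented for illustration, and `build_range_header` is a hypothetical helper, not part of any PDF library. It merges adjacent or nearby spans and formats them as an HTTP `Range` header value.

```python
from typing import List, Tuple


def build_range_header(spans: List[Tuple[int, int]], max_gap: int = 0) -> str:
    """Turn (offset, length) spans into a Range header value.

    Spans that touch, overlap, or lie within `max_gap` bytes of each other
    are merged, since one slightly larger range is often cheaper than two.
    """
    # HTTP byte ranges are inclusive: "first-last".
    intervals = sorted(
        (offset, offset + length - 1) for offset, length in spans if length > 0
    )
    if not intervals:
        raise ValueError("no non-empty byte spans given")
    merged = [list(intervals[0])]
    for first, last in intervals[1:]:
        if first <= merged[-1][1] + 1 + max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], last)
        else:
            merged.append([first, last])
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in merged)


# Made-up spans for the objects of one page.
print(build_range_header([(500, 100), (100, 50), (150, 50)]))
# prints "bytes=100-199,500-599"
```

The resulting string would be sent as the value of the `Range` request header. Not every server accepts several ranges in one request, so a viewer may need to fall back to one request per range.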

The technical details of PDF "linearization" can be found in the 'normative' Appendix F of Adobe's original PDF 1.7 Specification (ca. 11 MByte, which is itself an example of such a linearized PDF file!).
