How can I tell the resolution of scanned PDF from within a shell script?

猫巷女王i 2021-02-03 11:25

I have a large collection of documents scanned into PDF format, and I wish to write a shell script that will convert each document to DjVu format. Some documents were scanned a

7 Answers
  •  一生所求
    2021-02-03 11:51

    PDF is a resolution-independent format, so the question, as asked, is nonsensical. You may have scanned some bitmaps at a particular resolution, and each of those bitmaps is embedded individually inside the PDF, but a single PDF can contain images at several different resolutions as well as resolution-independent vector graphics. There's no way to know without cracking open the PDF and examining every object inside it.
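
    To make "resolution" concrete: an embedded bitmap only has a size in pixels, and the page draws it at some size in PDF units (1/72 inch by default). The effective resolution is the ratio of the two, so the same bitmap drawn at half the size has twice the DPI. Below is a minimal sketch, assuming the image is drawn unrotated and the page uses the default unit. The function `effective_dpi` and the example numbers are hypothetical, not taken from any tool.

```python
# Effective resolution of a bitmap placed on a PDF page.
# Assumption: default PDF user space, where 1 unit = 1/72 inch, and the image
# is drawn without rotation or skew.
POINTS_PER_INCH = 72.0


def effective_dpi(pixel_width, pixel_height, drawn_width_pt, drawn_height_pt):
    """Return (x_dpi, y_dpi) for an image of the given pixel size that is
    drawn on the page at the given size in PDF points."""
    x_dpi = pixel_width / (drawn_width_pt / POINTS_PER_INCH)
    y_dpi = pixel_height / (drawn_height_pt / POINTS_PER_INCH)
    return x_dpi, y_dpi


# Hypothetical example: a 2480 x 3508 px scan filling a roughly A4 page (595 x 842 pt).
print([round(v) for v in effective_dpi(2480, 3508, 595.0, 842.0)])  # [300, 300]
```

    Because a PDF can place several images, each at its own size, you would have to compute this for every image on every page, which is exactly why there is no single answer for the document as a whole.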

    Edit, to expand on the problem:

    You may have gotten lucky, and the software you used to scan the documents embedded some metadata about this, but don't bet on it: such metadata is unlikely to be standard. For parsing the PDF, you'd want a prewritten library such as Ghostscript. The problem is that PDF isn't a simple image container. It is derived from the PostScript page-description language, and a file is a collection of objects (pages, content streams, fonts, images), often compressed with various filters, where each page is drawn by a sequence of PostScript-like operators. Reading a PDF is therefore far more complicated than reading an ordinary image format: finding out how each image is placed means interpreting those page descriptions, which is not straightforward.

    The best approach is either to throw up your hands and give up, or to look hard at Ghostscript and see whether you can get it to tell you the answer.
