Using GhostScript to get page size

不知归路 2020-12-01 12:48

Is it possible to get the page size (e.g. of a page in a PDF document) using Ghostscript? I have seen the "bbox" device, but it returns the bounding box, which differs from page to page.

3 Answers
  • 2020-12-01 13:23

    Unfortunately, it does not seem to be easy to get the (possibly different) page sizes (or the *Boxes, for that matter) of a PDF with the help of Ghostscript.

    But since you asked for other possibilities as well: a fairly reliable way to determine the media size of each page (and each of the embedded {Trim,Media,Crop,Bleed}Boxes) is the command-line tool pdfinfo.exe. This utility is part of the XPDF tools from http://www.foolabs.com/xpdf/download.html . Run it with the "-box" parameter to print the boxes; "-f 3" tells it to start at page 3 and "-l 8" tells it to stop at page 8.

    Example output:

    C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
    Creator:        FrameMaker 6.0
    Producer:       Acrobat Distiller 5.0.5 (Windows)
    CreationDate:   08/17/06 16:43:06
    ModDate:        08/22/06 12:20:24
    Tagged:         no
    Pages:          146
    Encrypted:      no
    Page    1 size: 419.535 x 297.644 pts
    Page    2 size: 297.646 x 419.524 pts
    Page    3 size: 297.646 x 419.524 pts
    Page    1 MediaBox:     0.00     0.00   595.00   842.00
    Page    1 CropBox:     87.25   430.36   506.79   728.00
    Page    1 BleedBox:    87.25   430.36   506.79   728.00
    Page    1 TrimBox:     87.25   430.36   506.79   728.00
    Page    1 ArtBox:      87.25   430.36   506.79   728.00
    Page    2 MediaBox:     0.00     0.00   595.00   842.00
    Page    2 CropBox:    148.17   210.76   445.81   630.28
    Page    2 BleedBox:   148.17   210.76   445.81   630.28
    Page    2 TrimBox:    148.17   210.76   445.81   630.28
    Page    2 ArtBox:     148.17   210.76   445.81   630.28
    Page    3 MediaBox:     0.00     0.00   595.00   842.00
    Page    3 CropBox:    148.17   210.76   445.81   630.28
    Page    3 BleedBox:   148.17   210.76   445.81   630.28
    Page    3 TrimBox:    148.17   210.76   445.81   630.28
    Page    3 ArtBox:     148.17   210.76   445.81   630.28
    File size:      6888764 bytes
    Optimized:      yes
    PDF version:    1.4
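
For scripting, the "Page N ...Box:" lines of this output can be turned into numbers. The following is only a sketch, assuming the text layout shown above; the names `parse_pdfinfo_boxes` and `box_size` are made up for illustration and are not part of the XPDF tools.

```python
import re

# Matches lines such as
# "Page    1 MediaBox:     0.00     0.00   595.00   842.00"
BOX_LINE = re.compile(
    r"^\s*Page\s+(\d+)\s+(\w+Box):\s+"
    r"(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s*$"
)


def parse_pdfinfo_boxes(text):
    """Return {page: {box_name: (llx, lly, urx, ury)}} from `pdfinfo -box` text."""
    pages = {}
    for line in text.splitlines():
        match = BOX_LINE.match(line)
        if match is None:
            continue  # metadata lines and "Page N size:" lines are skipped
        page = int(match.group(1))
        box = tuple(float(value) for value in match.groups()[2:])
        pages.setdefault(page, {})[match.group(2)] = box
    return pages


def box_size(box):
    """Width and height, in points, of a box given as (llx, lly, urx, ury)."""
    llx, lly, urx, ury = box
    return (urx - llx, ury - lly)


# A few lines copied from the example output above.
sample = """\
Page    1 size: 419.535 x 297.644 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
"""

boxes = parse_pdfinfo_boxes(sample)
print(box_size(boxes[1]["MediaBox"]))
print(box_size(boxes[1]["CropBox"]))
```

In this sample the CropBox width and height agree with the "Page 1 size" line to within rounding, while the MediaBox is the full 595 x 842 pt sheet.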
    
  • 2020-12-01 13:26

    Meanwhile, I found a different method. This one uses Ghostscript only (just as you required), with no need for additional third-party utilities.

    This method uses a little helper program, written in PostScript, shipping with the source code of Ghostscript. Look in the toolbin subdir for the pdf_info.ps file.

    According to the comments in the file, you run it like this to list the fonts used and the media sizes used:

    gswin32c -dNODISPLAY ^
       -q ^
       -sFile=____.pdf ^
       [-dDumpMediaSizes] ^
       [-dDumpFontsUsed [-dShowEmbeddedFonts]] ^
       toolbin/pdf_info.ps
    

    I ran it on a local example file, with command-line parameters that ask for the media sizes only (not the fonts used). Here is the result:

    C:\> gswin32c ^
          -dNODISPLAY ^
          -q ^
          -sFile=c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf ^
          -dDumpMediaSizes ^
          C:/gs8.71/lib/pdf_info.ps
    
    
      c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf has 146 pages.
      Creator: FrameMaker 6.0
      Producer: Acrobat Distiller 5.0.5 (Windows)
      CreationDate: D:20060817164306Z
      ModDate: D:20060822122024+02'00'
    
      Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
      Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
      Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
      Page 4 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
      [....]
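
If the sizes are needed in a script, the "Page N ..." lines can be parsed. This is a rough sketch that assumes the output format shown above (a two-number width/height pair per box); `parse_media_sizes` is a hypothetical helper name, not something shipped with Ghostscript.

```python
import re

# A page line looks like:
# "Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]"
PAGE_LINE = re.compile(r"^\s*Page\s+(\d+)\s+(.*)$")
BOX_ENTRY = re.compile(r"(\w+Box):\s*\[\s*([^\]]*?)\s*\]")


def parse_media_sizes(text):
    """Return {page: {box_name: (width, height)}} from pdf_info.ps output."""
    pages = {}
    for line in text.splitlines():
        match = PAGE_LINE.match(line)
        if match is None:
            continue  # header lines, blank lines, elisions
        entries = {
            name: tuple(float(value) for value in values.split())
            for name, values in BOX_ENTRY.findall(match.group(2))
        }
        if entries:
            pages[int(match.group(1))] = entries
    return pages


# Lines copied from the result above.
sample = """\
  Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
  Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  [....]
"""

sizes = parse_media_sizes(sample)
for page, entries in sorted(sizes.items()):
    print(page, entries)
```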
    
  • 2020-12-01 13:34

    A solution in pure PostScript, run by Ghostscript itself; no additional scripts are necessary:

    gs -dQUIET -sFileName=path/to/file.pdf -c "FileName (r) file runpdfbegin 1 1 pdfpagecount {pdfgetpage /MediaBox get {=print ( ) print} forall (\n) print} for quit"

    The command prints the MediaBox of each page in the PDF as four numbers per line. An example from a 3-page PDF:

    0 0 595 841
    0 0 595 841
    0 0 595 841
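
Each line holds the box as lower-left x, lower-left y, upper-right x, upper-right y, so the page size is the difference of the corners. A small sketch of that arithmetic, assuming the default PDF unit of 1/72 inch (a page with a custom UserUnit would need extra handling); the function names are illustrative only.

```python
POINTS_PER_INCH = 72.0  # default PDF user space unit (assumption: no UserUnit)
MM_PER_INCH = 25.4


def page_size_from_box(line):
    """Turn a line such as '0 0 595 841' into (width, height) in points."""
    llx, lly, urx, ury = (float(value) for value in line.split())
    return (urx - llx, ury - lly)


def points_to_mm(points):
    """Convert a length in points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


width, height = page_size_from_box("0 0 595 841")
print(f"{width:g} x {height:g} pt")
print(f"{points_to_mm(width):.1f} x {points_to_mm(height):.1f} mm")
```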
    

    Here's a breakdown of the command:

    FileName (r) file  % open file given by -sFileName
    runpdfbegin        % open file as pdf
    1 1 pdfpagecount { % for each page index
      pdfgetpage       % get pdf page properties (pushes a dict)
      /MediaBox get    % get MediaBox value from dict (pushes an array of numbers)
      {                % for every array element
        =print         % print element value
        ( ) print      % print single space
      } forall
      (\n) print       % print new line
    } for
    quit               % quit interpreter. Not necessary if you pass -dBATCH to gs
    

    Replace /MediaBox with /CropBox to get the crop box.
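
Swapping the box name can be scripted. The sketch below builds the same command line as above for a chosen box and reads the result; it assumes a `gs` executable on the PATH and that nothing but the number lines reaches stdout. The helper names are hypothetical. Note that a page without its own entry for the requested box (CropBox is optional in PDF) would presumably make the PostScript `get` fail, which is not handled here.

```python
import subprocess

# The five page boxes that appear in the pdfinfo output earlier in this thread.
KNOWN_BOXES = ("MediaBox", "CropBox", "BleedBox", "TrimBox", "ArtBox")


def build_gs_command(pdf_path, box="MediaBox", gs="gs"):
    """Build the argument list for the Ghostscript one-liner shown above."""
    if box not in KNOWN_BOXES:
        raise ValueError("unknown box name: %r" % (box,))
    program = (
        "FileName (r) file runpdfbegin "
        "1 1 pdfpagecount {pdfgetpage /%s get "
        "{=print ( ) print} forall (\\n) print} for quit" % box
    )
    return [gs, "-dQUIET", "-sFileName=" + pdf_path, "-c", program]


def read_boxes(pdf_path, box="MediaBox", gs="gs"):
    """Run Ghostscript and return one (llx, lly, urx, ury) tuple per page."""
    result = subprocess.run(
        build_gs_command(pdf_path, box, gs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gs exits with an error
    )
    return [
        tuple(float(value) for value in line.split())
        for line in result.stdout.splitlines()
        if line.strip()
    ]


# Only print the command here; read_boxes() needs Ghostscript and a real file.
print(build_gs_command("path/to/file.pdf", "CropBox"))
```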
