ReportLab and pdfrw: Importing Scanned PDF

Question

Using the code below, I am trying to import a PDF page into an existing canvas object and save the result as a PDF. This usually works just fine, but when the source PDF was generated from a scanned document, the output is a blank page. Any takers?

from reportlab.lib.units import inch
from reportlab.pdfgen import canvas
from pdfrw import PdfReader
from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl

# Out_Folder, pdf_file_name, and folder are assumed to be defined elsewhere.
c = canvas.Canvas(Out_Folder + pdf_file_name)
c.setPageSize([11*inch, 8.5*inch])  # Set page size (for landscape)

pages = PdfReader(folder + '2_VisionMissionValues.pdf', decompress=False).pages
p = pagexobj(pages[0])  # Convert the first page to a form XObject
c.doForm(makerl(c, p))
c.showPage()
c.save()

Thanks in advance!

Answer 1:

Sooo...

On the one hand, I have absolutely no idea why this is happening, and not really much time to debug it right now.

On the other hand, I have a workaround for you. I tried it on v0.3 as well as on the current GitHub master, and it worked for me in both cases.

I started off by verifying that your code failed on your page and that it worked on another PDF. Then I asked myself "What happens if I use my watermark example to create a PDF with your page as a watermark?" (because that uses some of the same form XObject code). That worked, so then I asked myself "What does it look like if I pass my watermarked page through your reportlab code?"

Interestingly, the entire watermarked page, including your image, made it through. So I modified your code to do the minimal stuff that the watermark example does, which winds up putting a form XObject inside a form XObject when it's passed to reportlab. That worked.

Here's a slightly modified version of your code that I used for this.

import sys

from reportlab.pdfgen import canvas
from pdfrw import PdfReader, PageMerge
from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl

inch = 72  # Points per inch

fname, = sys.argv[1:]  # Expects exactly one argument: the input PDF
page = PdfReader(fname, decompress=False).pages[0]
# Workaround: render the page through PageMerge first, so that it reaches
# reportlab as a form XObject nested inside another form XObject.
p = pagexobj(PageMerge().add(page).render())

c = canvas.Canvas('outstuff.pdf')
c.setPageSize([8.5*inch, 11.0*inch])  # Set page size (for portrait)
c.doForm(makerl(c, p))
c.showPage()
c.save()

Source: https://stackoverflow.com/questions/43773477/reportlab-and-pdfrw-importing-scanned-pdf

Tags

reportlab

pdf-reader

pdfrw