问题
I've been trying all morning to add page numbers to a pdf document, but I can't figure it out. I'd like to use python, with pyPdf or reportlab.
Does anyone have any ideas?
回答1:
Here is my Python code to Add Page Number to PDF file. I have used both pyPdf2 and reportlab.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
helpDoc = '''
Add Page Number to PDF file with Python
Python 给 PDF 添加 页码
usage:
python addPageNumberToPDF.py [PDF path]
require:
pip install reportlab pypdf2
Support both Python2/3, But more recommend Python3
tips:
* output file will save at pdfWithNumbers/[PDF path]_page.pdf
* only support A4 size PDF
* tested on Python2/Python3@ubuntu
* more large size of PDF require more RAM
* if segmentation fault, plaese try use Python 3
* if generate PDF document is damaged, plaese try use Python 3
Author:
Lei Yang (ylxx@live.com)
GitHub:
https://gist.github.com/DIYer22/b9ede6b5b96109788a47973649645c1f
'''
print(helpDoc)
import reportlab
from reportlab.lib.units import mm
from reportlab.pdfgen import canvas
from PyPDF2 import PdfFileWriter, PdfFileReader
def createPagePdf(num, tmp):
c = canvas.Canvas(tmp)
for i in range(1,num+1):
c.drawString((210//2)*mm, (4)*mm, str(i))
c.showPage()
c.save()
return
with open(tmp, 'rb') as f:
pdf = PdfFileReader(f)
layer = pdf.getPage(0)
return layer
if __name__ == "__main__":
pass
import sys,os
path = 'MLDS17f.pdf'
# path = '1.pdf'
if len(sys.argv) == 1:
if not os.path.isfile(path):
sys.exit(1)
else:
path = sys.argv[1]
base = os.path.basename(path)
tmp = "__tmp.pdf"
batch = 10
batch = 0
output = PdfFileWriter()
with open(path, 'rb') as f:
pdf = PdfFileReader(f,strict=False)
n = pdf.getNumPages()
if batch == 0:
batch = -n
createPagePdf(n,tmp)
if not os.path.isdir('pdfWithNumbers/'):
os.mkdir('pdfWithNumbers/')
with open(tmp, 'rb') as ftmp:
numberPdf = PdfFileReader(ftmp)
for p in range(n):
if not p%batch and p:
newpath = path.replace(base, 'pdfWithNumbers/'+ base[:-4] + '_page_%d'%(p//batch) + path[-4:])
with open(newpath, 'wb') as f:
output.write(f)
output = PdfFileWriter()
# sys.stdout.write('\rpage: %d of %d'%(p, n))
print('page: %d of %d'%(p, n))
page = pdf.getPage(p)
numberLayer = numberPdf.getPage(p)
page.mergePage(numberLayer)
output.addPage(page)
if output.getNumPages():
newpath = path.replace(base, 'pdfWithNumbers/' + base[:-4] + '_page_%d'%(p//batch + 1) + path[-4:])
with open(newpath, 'wb') as f:
output.write(f)
os.remove(tmp)
check the latest version at my GitHub gist
回答2:
What you want to do is very similar to watermarking a PDF, except that you are placing a different watermark on every page.
pdfrw (Disclaimer: I am the author.) will do the watermark function (and has a watermark example). You could use reportlab to programmatically create a PDF that only has the page numbering you want -- one for each page in the destination document, and then use pdfrw to overlay each page of that document on top of your original document. When you use pdfrw, you may want to reuse the original PDF trailer in order to keep bookmarks, etc. If you look at the pdfrw watermark example it will show you how to do this.
Since these are both python, you could use them from the same program and (for example) use pdfrw to figure out how many pages you need to generate from reportlab.
来源:https://stackoverflow.com/questions/31291282/how-to-add-page-number-to-a-pdf-file