PyPDF2 split pdf by pages

女生的网名这么多〃 提交于 2019-12-22 06:33:02

问题


I wanna split pdf file using PyPDF2.

All examples in net is too difficult or don't work or always give error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'"

Can someone help with it ? Need separete one pdf with 3 pages into three different files.

I'm starting from that:

pdfFileObj = open(r"D:\BPO\act.pdf", 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pdfWriter = PyPDF2.PdfFileWriter()
pdfWriter.addPage(pdfReader.getPage(0))

But don't know what to do next :(

EDIT#1

Was try do a loop for spliting and i'm have a problem: PdfFileWriter make 3 files one with one page, second - with two, and third with three. Where is my mistake in following code:

act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']
with open(r"D:\BPO\act.pdf", 'rb') as act_mls:
    reader = PdfFileReader(act_mls)
    writer = PdfFileWriter()
    if reader.numPages == 3:
        counter = 0
        for x in range(3):
            path = '\\'.join(['D:\\BPO\\act sub pages', act_sub_pages_name[counter]])
            counter += 1
            writer.addPage(reader.getPage(x))
            with open(path, 'wb') as outfile: writer.write(outfile)

Sry for bad English.

EDIT#2

My solution according by Paul Rooney answer:

act_pdf_file = 'D:\\BPO\\act.pdf'
act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']

def pdf_splitter(index, src_file):
    with open(src_file, 'rb') as act_mls:
        reader = PdfFileReader(act_mls)
        writer = PdfFileWriter()
        writer.addPage(reader.getPage(index))
        out_file = os.path.join('D:\\BPO\\act sub pages', act_sub_pages_name[index])
        with open(out_file, 'wb') as out_pdf: writer.write(out_pdf)

for x in range(3): pdf_splitter(x, act_pdf_file)

With function all works properly but it a little bit harder.


回答1:


You can use the write method of the PdfFileWriter to write out to the file.

from PyPDF2 import PdfFileReader, PdfFileWriter

with open("input.pdf", 'rb') as infile:

    reader = PdfFileReader(infile)
    writer = PdfFileWriter()
    writer.addPage(reader.getPage(0))

    with open('output.pdf', 'wb') as outfile:
        writer.write(outfile)

You may want to loop over the pages of the input file, create a new writer object, add a single page. Then write out to an ever incrementing filename or have some other scheme for deciding output filename?




回答2:


I've used a tool called xpdf for just this sort of task and it works really really well. You can download it here.

It's a command line utility that you can call from python. Make sure it's added to your path so you can call it from the command line.

Here's how you can interface it from python, using subprocess:

import subprocess

text, _ = subprocess.Popen('pdftotext -fixed 0 -clip D:\\BPO\\act.pdf', 
                           shell=True, 
                           stdout=subprocess.PIPE).communicate()

pages = text.decode('latin-1').split('\f')

Pages are separated by formfeed characters, so you'll get a list of pages.



来源:https://stackoverflow.com/questions/45144206/pypdf2-split-pdf-by-pages

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!