PyPDF2 compression

Submitted by 爷,独闯天下 on 2020-01-04 15:29:17

Question


I am struggling to compress my merged PDFs using the PyPDF2 module. This is my attempt, based on http://www.blog.pythonlibrary.org/2012/07/11/pypdf2-the-new-fork-of-pypdf/

import PyPDF2
path = open('path/to/hello.pdf', 'rb')
path2 = open('path/to/another.pdf', 'rb')
merger = PyPDF2.PdfFileMerger()
merger.append(fileobj=path2)
merger.append(fileobj=path)
pdf.filters.compress(merger)
merger.write(open("test_out2.pdf", 'wb'))

The error I receive is

TypeError: must be string or read-only buffer, not file

I have also tried compressing the PDF after the merge is complete. I consider my compression attempts failed because the output is larger than the file I got from PDFSAM with compression enabled. Any thoughts? Thanks.


Answer 1:


PyPDF2 doesn't have a reliable compression method. That said, there's a compressContentStreams() method with the following description:

Compresses the size of this page by joining all content streams and applying a FlateDecode filter.

However, it is possible that this function will perform no action if content stream compression becomes "automatic" for some reason.
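To make the description above concrete: FlateDecode is the PDF name for zlib/deflate compression, so "joining all content streams and applying a FlateDecode filter" can be sketched with the standard library alone. This is only an illustration of the idea, not PyPDF2's actual implementation; the helper name `flate_compress_streams` and the sample drawing operators are made up for the example.

```python
import zlib


def flate_compress_streams(streams):
    """Join raw content streams and deflate the result.

    Hypothetical helper: mimics the idea behind a FlateDecode filter,
    not PyPDF2's internal code.
    """
    joined = b"\n".join(streams)
    return joined, zlib.compress(joined)


# Made-up, highly repetitive page content (PDF text and path operators).
streams = [
    b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 50,
    b"0 0 m 100 100 l S\n" * 50,
]

joined, packed = flate_compress_streams(streams)
print("uncompressed: %d bytes, deflated: %d bytes" % (len(joined), len(packed)))

# A FlateDecode reader recovers the original bytes with zlib.decompress().
restored = zlib.decompress(packed)
```

Note that this only concerns page content streams (the drawing instructions). Images and fonts are stored as separate objects, so compressing content streams does not shrink them.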

Again, this won't make a difference in most cases, but you can try this code:

import PyPDF2

path = 'path/to/hello.pdf'
path2 = 'path/to/another.pdf'
pdfs = [path, path2]

writer = PyPDF2.PdfFileWriter()

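for pdf in pdfs:
    reader = PyPDF2.PdfFileReader(pdf)
    # range() works on Python 2 and 3; xrange() exists only on Python 2
    for i in range(reader.numPages):
        page = reader.getPage(i)
        # Join the page's content streams and apply FlateDecode
        page.compressContentStreams()
        writer.addPage(page)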

with open('test_out2.pdf', 'wb') as f:
    writer.write(f)
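Since the question judges success by file size, one simple way to see whether the compression call changed anything is to compare the combined size of the inputs with the size of the output. The sketch below uses only the standard library; `size_report` is a hypothetical helper, and the demo writes dummy stand-in files into a temporary directory so it runs without real PDFs. In practice you would pass your own paths instead.

```python
import os
import tempfile


def size_report(original_paths, merged_path):
    """Return (total input bytes, output bytes, output/input ratio)."""
    total_in = sum(os.path.getsize(p) for p in original_paths)
    out = os.path.getsize(merged_path)
    ratio = out / float(total_in) if total_in else 0.0
    return total_in, out, ratio


# Demo with dummy stand-in files (not real PDFs), sizes chosen arbitrarily.
tmp_dir = tempfile.mkdtemp()


def _write_dummy(name, n_bytes):
    path = os.path.join(tmp_dir, name)
    with open(path, "wb") as f:
        f.write(b"\0" * n_bytes)
    return path


inputs = [_write_dummy("hello.pdf", 1000), _write_dummy("another.pdf", 500)]
output = _write_dummy("test_out2.pdf", 1200)

total_in, out, ratio = size_report(inputs, output)
print("inputs: %d bytes, output: %d bytes, ratio: %.2f" % (total_in, out, ratio))
```

A ratio close to 1.0 suggests the content streams were already compressed or make up only a small part of the file.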


Source: https://stackoverflow.com/questions/22776388/pypdf2-compression
