PyPDF2 hangs on processing

情到浓时终转凉″ 提交于 2019-12-24 07:13:52

问题


I'm processing multiple pdf files using PyPDF2 but my script hangs somewhere. All I can see in my console is some "startxref on same line as offset" which I'm correct is a warning so by right it should still go to the finally block and return an empty string.

Am I doing something wrong?

import PyPDF2
import sys
import os
def decode_pdf(src_filename):           
    out_str=""
    try:
        f = open(str(src_filename), "rb")           
        read_pdf = PyPDF2.PdfFileReader(f)
        number_of_pages = read_pdf.getNumPages()
        for i in range(0,number_of_pages):
            page = read_pdf.getPage(i)
            out_str = out_str + " " + page.extractText()
        out_str = ''.join(out_str.splitlines())
        f.close()
    except:
        print("Exception on pdf")
        print(sys.exc_info())
        out_str = ""
    finally:
        return out_str

回答1:


I was facing this issue too and couldn't solve it using PyPDF2. I solved mine with pdfminer using the example from here

Copying the relevant code here below

from cStringIO import StringIO
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

def convert(fname, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)

    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    infile = file(fname, 'rb')
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close
    return text 

call the function convert() as below

convert('myfile.pdf', pages=[5,7])


来源:https://stackoverflow.com/questions/45575468/pypdf2-hangs-on-processing

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!