pyPdf unable to extract text from some pages in my PDF

伪装坚强ぢ 2021-01-05 13:07

I'm trying to use pyPdf to extract and print pages from a multipage PDF. The problem is that text is not extracted from some pages. I've put an example file here:

http://w

6 Answers
  •  悲&欢浪女
    2021-01-05 13:36

    I had a similar problem with some PDFs. On Windows, the following works very well for me:

    1.- Download the Xpdf tools for Windows.

    2.- Copy pdftotext.exe from xpdf-tools-win-4.00\bin32 to C:\Windows\System32 and also to C:\Windows\SysWOW64.

    3.- Use subprocess to run the command from the console:

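    import subprocess

    # filePath is the path to the PDF; the trailing '-' makes pdftotext write to stdout.
    # Passing a list (no shell=True) keeps paths that contain spaces intact.
    try:
        extInfo = subprocess.check_output(['pdftotext.exe', filePath, '-'], stderr=subprocess.STDOUT).strip()
        print(extInfo)
    except Exception as e:
        print(e)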
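Since the question is about printing individual pages, the same approach can be wrapped in a small helper that converts one page at a time. This is only a sketch under a few assumptions: the function names `build_pdftotext_command` and `extract_text` are made up for illustration, pdftotext is assumed to be on the PATH, and the `-f`, `-l` and `-enc` options are used as I understand them from the pdftotext command-line help (first page, last page, output encoding).

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, page=None, exe="pdftotext"):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = [exe, "-enc", "UTF-8"]
    if page is not None:
        # -f / -l select the first and last page to convert (1-based).
        cmd += ["-f", str(page), "-l", str(page)]
    # The trailing "-" sends the extracted text to stdout instead of a file.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, page=None):
    """Return the extracted text, or None if pdftotext is missing or fails."""
    exe = shutil.which("pdftotext")
    if exe is None:
        return None  # tool not installed or not on the PATH
    try:
        result = subprocess.run(
            build_pdftotext_command(pdf_path, page, exe),
            capture_output=True,  # keep stderr separate from the text
            check=True,           # raise if pdftotext exits with an error
        )
    except (subprocess.CalledProcessError, OSError) as e:
        print(e)
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text(filePath, page=3)` would then return the text of page 3 only, which makes it easier to see which pages come back empty.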
    
