pyPdf unable to extract text from some pages in my PDF

Backend · Unresolved

伪装坚强ぢ 2021-01-05 13:07

I'm trying to use pyPdf to extract and print pages from a multipage PDF. The problem is that text is not extracted from some of the pages. I've put an example file here:

http://w
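Before switching tools, it can help to list exactly which pages come back empty. The following is a minimal sketch that assumes the maintained `pypdf` package (the successor to pyPdf, providing `PdfReader` and `page.extract_text()`) is installed. `find_empty_pages` and `page_texts_with_pypdf` are hypothetical helper names. A page that is empty here often has no text layer at all (for example, a scanned image), although font-encoding problems can produce the same symptom.

```python
def find_empty_pages(page_texts):
    """Return 1-based numbers of pages whose extracted text is empty or whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()  # treat None like an empty string
    ]


def page_texts_with_pypdf(path):
    """Extract the text of every page with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # imported lazily so find_empty_pages has no dependency

    reader = PdfReader(path)
    return [page.extract_text() for page in reader.pages]
```

For example, `find_empty_pages(page_texts_with_pypdf("example.pdf"))` (hypothetical file name) would return the page numbers to investigate.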

6 answers

悲&欢浪女 (OP)

2021-01-05 13:36
I had a similar problem with some PDFs. On Windows, this works very well for me:

1.- Download the Xpdf tools for Windows.

2.- Copy pdftotext.exe from xpdf-tools-win-4.00\bin32 to C:\Windows\System32 and also to C:\Windows\SysWOW64.

3.- Use subprocess to run the command-line tool from Python:
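```
import subprocess

filePath = r"C:\path\to\file.pdf"  # placeholder: path to your PDF

try:
    # "-" as the output file makes pdftotext write the text to stdout;
    # passing a list (no shell=True) keeps paths containing spaces intact
    extInfo = subprocess.check_output(
        ["pdftotext.exe", "-enc", "UTF-8", filePath, "-"],
        stderr=subprocess.STDOUT,
    ).strip()
    print(extInfo.decode("utf-8", errors="replace"))
except (subprocess.CalledProcessError, OSError) as e:
    print(e)
```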
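A per-page variant can help pin down which pages fail. This sketch assumes `pdftotext` (from Xpdf or Poppler) is on `PATH`, so it locates the tool with `shutil.which` instead of relying on a copy in the Windows system folders. It uses the `-f`/`-l` (first/last page) and `-enc` options, which both Xpdf's and Poppler's `pdftotext` accept. `build_pdftotext_cmd` and `extract_page_text` are hypothetical helper names.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, first_page=None, last_page=None, exe="pdftotext"):
    """Build a pdftotext argument list; the trailing "-" sends text to stdout."""
    cmd = [exe, "-enc", "UTF-8"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert (1-based)
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert (1-based)
    cmd += [pdf_path, "-"]
    return cmd


def extract_page_text(pdf_path, page):
    """Return the text of a single 1-based page via pdftotext (hypothetical helper)."""
    exe = shutil.which("pdftotext")  # on Windows this also matches pdftotext.exe via PATHEXT
    if exe is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, page, page, exe),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8", errors="replace")
```

For example, `extract_page_text("example.pdf", 3)` (hypothetical file) returns only page 3's text. If a page still comes back empty with pdftotext, it may be an image without a text layer, in which case OCR would be needed.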