Convert scanned PDF to text with Python

再見小時候 2021-02-02 01:40

I have a scanned PDF file and I am trying to extract text from it. I tried to use pypdfocr to run OCR on it, but I get an error:

"could not found ghostscript in

5 answers

南方客 (original poster)

2021-02-02 02:27

PyPDF2 is a Python library built as a PDF toolkit. It is capable of:

Extracting document information (title, author, …)
Splitting documents page by page
Merging documents page by page
Cropping pages
Merging multiple pages into a single page
Encrypting and decrypting PDF files
and more!

To install PyPDF2, run the following command from the command line:

pip install PyPDF2

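CODE:

import PyPDF2

# Names below follow PyPDF2 2.x/3.x; 1.x releases used PdfFileReader,
# numPages, getPage() and extractText() instead.
# Open in binary mode; the with-block closes the file automatically.
with open('myPdf.pdf', 'rb') as pdf_file:
    reader = PyPDF2.PdfReader(pdf_file)

    # Number of pages in the document
    print(len(reader.pages))

    # Text of the first page. This reads only an embedded text layer, so an
    # image-only (scanned) page usually yields an empty string.
    page = reader.pages[0]
    print(page.extract_text())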
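
Separately, the error quoted in the question suggests that pypdfocr could not locate the Ghostscript executable. Below is a minimal sketch for checking whether Ghostscript is reachable on PATH. It assumes the usual executable names (gs on Linux/macOS, gswin64c or gswin32c on Windows), and find_ghostscript is a hypothetical helper written for illustration, not part of pypdfocr or PyPDF2.

```python
import shutil

# Assumed executable names: "gs" on Linux/macOS, "gswin64c"/"gswin32c" on Windows.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # None when the name is not found on PATH
        if path:
            return path
    return None


if __name__ == "__main__":
    gs_path = find_ghostscript()
    if gs_path:
        print(f"Ghostscript found at: {gs_path}")
    else:
        print("Ghostscript not found on PATH; install it or add its folder to PATH.")
```

If this reports that nothing was found, installing Ghostscript or adding its install folder to PATH is probably the first thing to try before running the OCR step again.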
