I have a scanned PDF file and I am trying to extract text from it. I tried to use pypdfocr to run OCR on it, but I get an error:
"could not found ghostscript in
PyPDF2 is a Python library built as a PDF toolkit. It is capable of:
Extracting document information (title, author, …)
Splitting documents page by page
Merging documents page by page
Cropping pages
Merging multiple pages into a single page
Encrypting and decrypting PDF files
and more!
To install PyPDF2, run the following command from the command line:
pip install PyPDF2
CODE:
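import PyPDF2

# Open the PDF in binary mode; the with-block closes the file automatically.
with open('myPdf.pdf', 'rb') as pdf_file:
    # Older PyPDF2 releases used PdfFileReader, numPages, getPage and
    # extractText; those names were removed in PyPDF2 3.0.
    reader = PyPDF2.PdfReader(pdf_file)
    print(len(reader.pages))    # number of pages in the document
    page = reader.pages[0]      # first page (0-based index)
    print(page.extract_text())  # embedded text only; no OCR is performed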
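
Since the PDF in the question is scanned, keep in mind that PyPDF2 only reads text already embedded in the file and does not perform OCR, so image-only pages usually come back empty. Below is a minimal sketch of how you could check which pages have no extractable text. The function names pages_without_text and extract_page_texts and the min_chars parameter are made up for this illustration, and the reader calls assume PyPDF2 3.0 or later.

```python
def pages_without_text(page_texts, min_chars=1):
    """Return 0-based indices of pages with fewer than min_chars
    non-whitespace characters of extracted text."""
    missing = []
    for index, text in enumerate(page_texts):
        # Treat None like an empty string and ignore all whitespace.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            missing.append(index)
    return missing


def extract_page_texts(path):
    """Collect the embedded text of every page (assumes PyPDF2 >= 3.0)."""
    import PyPDF2  # imported here so the helper above works without PyPDF2

    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        # Extract while the file is still open.
        return [page.extract_text() for page in reader.pages]
```

To use it, call pages_without_text(extract_page_texts('myPdf.pdf')). If every page index comes back, the file most likely contains only page images, and an OCR tool is still needed to get the text.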