Convert scanned PDF to text in Python

再見小時候 2021-02-02 01:40

I have a scanned PDF file and I am trying to extract text from it. I tried to use pypdfocr to run OCR on it, but I get an error:

"could not found ghostscript in

5 answers
  •  南方客 (original poster)
    2021-02-02 02:27

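    PyPDF2 is a Python library built as a PDF toolkit. Note that it only reads text that is already embedded in the PDF; it does not perform OCR, so a purely scanned (image-only) page will yield little or no text. It is capable of: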

    Extracting document information (title, author, …)
    Splitting documents page by page
    Merging documents page by page
    Cropping pages
    Merging multiple pages into a single page
    Encrypting and decrypting PDF files
    and more!
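
As an illustration of the "splitting documents page by page" item, here is a minimal sketch, assuming PyPDF2 2.0 or later (where `PdfReader`, `PdfWriter` and `add_page` are available). The helper names `page_filename` and `split_pdf` and the output naming scheme are made up for this example and are not part of PyPDF2.

```python
from pathlib import Path


def page_filename(stem, index, total):
    """Build a zero-padded name such as 'myPdf_page_03.pdf' (hypothetical scheme)."""
    width = len(str(total))
    return f"{stem}_page_{index:0{width}d}.pdf"


def split_pdf(src, out_dir):
    """Write every page of src to its own single-page PDF inside out_dir."""
    # Imported lazily so page_filename can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter  # assumes PyPDF2 >= 2.0

    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(src))
    total = len(reader.pages)
    written = []
    for i, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # copy exactly one page into a new document
        target = out_dir / page_filename(src.stem, i, total)
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

Calling `split_pdf('myPdf.pdf', 'pages')` should leave one file per page in the `pages` directory. Splitting does not add a text layer, so scanned pages stay image-only.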
    

    To install PyPDF2, run the following command from the command line:

    pip install PyPDF2
    

    CODE:

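    import PyPDF2

    # Open in binary mode; the context manager closes the file for us.
    with open('myPdf.pdf', 'rb') as pdf_file:
        # PdfReader replaces the deprecated PdfFileReader (PyPDF2 >= 2.0).
        reader = PyPDF2.PdfReader(pdf_file)

        # Number of pages in the document.
        print(len(reader.pages))

        # Extract the embedded text layer of the first page.
        page = reader.pages[0]
        print(page.extract_text())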
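
Since the question is about a scanned PDF, it may help to check whether the file has any embedded text at all before relying on the code above. The following is a rough sketch, assuming PyPDF2 2.0 or later; the helper names `extract_page_texts` and `looks_scanned` and the `min_chars` threshold of 20 are arbitrary choices for illustration, not part of PyPDF2.

```python
def extract_page_texts(path):
    """Return the text PyPDF2 finds on each page (empty strings if none)."""
    # Imported lazily so looks_scanned can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

    with open(path, "rb") as fh:
        reader = PdfReader(fh)
        # extract_text() may return an empty result for image-only pages.
        return [page.extract_text() or "" for page in reader.pages]


def looks_scanned(page_texts, min_chars=20):
    """Heuristic: True if no page has at least min_chars non-whitespace characters."""
    return all(len("".join(text.split())) < min_chars for text in page_texts)
```

If `looks_scanned(extract_page_texts('myPdf.pdf'))` returns `True`, the PDF most likely contains only page images, and an OCR tool is needed to get text out of it.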
    
