Convert scanned PDF to text in Python

再見小時候 2021-02-02 01:40

I have a scanned PDF file and I am trying to extract text from it. I tried to run OCR on it with pypdfocr, but I get an error:

"could not found ghostscript in

5 Answers
  •  面向向阳花
    2021-02-02 02:32

    Take a look at this library: https://pypi.python.org/pypi/pypdfocr. Keep in mind that a PDF file can also contain images. You may be able to analyse the page content streams. Some scanners break up a single scanned page into several images, so you won't get the text with Ghostscript.
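
As a rough sketch of one way around this (not what pypdfocr does internally): render each page to a single raster image and OCR the whole page, so it does not matter how the scanner split the page into images. This assumes the third-party packages pdf2image and pytesseract are installed, along with the Poppler and Tesseract binaries they call. The helper names `find_ghostscript` and `ocr_scanned_pdf` and the `render`/`ocr` parameters are hypothetical, made up for illustration. The first helper only checks whether a Ghostscript executable is on PATH, which is a common reason for an error like the one in the question.

```python
import shutil


def find_ghostscript(candidates=("gs", "gswin64c", "gswin32c")):
    """Return the path of the first Ghostscript executable found on PATH, or None.

    "gs" is the usual name on Linux/macOS; "gswin64c"/"gswin32c" are the
    console executables shipped with the Windows installers.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def ocr_scanned_pdf(pdf_path, dpi=300, lang="eng", render=None, ocr=None):
    """OCR every page of a scanned PDF and return the text as one string.

    `render` and `ocr` are hypothetical hooks (not part of any library) so the
    two steps can be swapped out; by default they use pdf2image and pytesseract.
    """
    if render is None:
        # Assumption: pdf2image (and Poppler) are installed.
        from pdf2image import convert_from_path

        def render(path):
            # Rasterize each page into one PIL image at the requested DPI.
            return convert_from_path(path, dpi=dpi)

    if ocr is None:
        # Assumption: pytesseract (and the Tesseract binary) are installed.
        import pytesseract

        def ocr(image):
            return pytesseract.image_to_string(image, lang=lang)

    # One OCR pass per rendered page, pages separated by a blank line.
    return "\n\n".join(ocr(page) for page in render(pdf_path))
```

Something like `print(find_ghostscript())` shows whether Ghostscript is visible to Python at all, and `text = ocr_scanned_pdf("scan.pdf")` (with a placeholder file name) would return the recognized text. A DPI of around 300 is a typical starting point for OCR; treat it as a value to tune rather than a fixed rule.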
