I am trying to use pytesseract in Python but I always end up with the following error:
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNo
On Windows 10, the following method works for me:
Download Tesseract from https://github.com/tesseract-ocr/tesseract/wiki and install it. The Windows version is available here: https://github.com/UB-Mannheim/tesseract/wiki
Find the script file pytesseract.py in C:\Users\User\Anaconda3\Lib\site-packages\pytesseract and open it.
Change the line tesseract_cmd = 'tesseract'
to: tesseract_cmd = 'D:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
You may also need to add D:/Program Files (x86)/Tesseract-OCR/ to the PATH environment variable.
Hope it works for you!
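If you would rather not change the system settings, the Tesseract directory can also be prepended to PATH for the current Python process only. This is a minimal sketch, assuming the install directory mentioned in this answer; `prepend_to_path` is a hypothetical helper written for illustration and is not part of pytesseract:

```python
import os

# Assumed install location, taken from this answer; adjust it to your machine.
TESSERACT_DIR = r"D:\Program Files (x86)\Tesseract-OCR"


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]
```

Calling `prepend_to_path(TESSERACT_DIR)` before the first pytesseract call changes PATH only for the running process and the child processes it starts, so nothing in the system configuration is touched.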
The following three commands should do the job:
sudo apt update
# This refreshes the package index
sudo apt install tesseract-ocr
# This installs the Tesseract OCR engine
sudo apt install libtesseract-dev
# This installs the Tesseract development library and headers
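To confirm that the install succeeded, you can check from Python whether the `tesseract` executable is reachable on PATH. A rough sketch using only the standard library; `tesseract_version` is a hypothetical helper name, and it assumes the binary accepts `--version`:

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some releases may write the version to stderr instead of stdout,
    # so fall back to stderr when stdout is empty.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", pytesseract will most likely raise the same TesseractNotFoundError, because by default it calls the plain `tesseract` command.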
I tried adding it to the PATH variable like others have mentioned, but still received the same error. What worked was adding this to my script:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
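To avoid hard-coding a single path, the script can look for the executable first and only then assign `tesseract_cmd`. This is a sketch under assumptions: the candidate list below is illustrative (the second entry is a guess at another common install location, not something from this thread), and `find_tesseract` and `configure_pytesseract` are hypothetical helper names:

```python
import os
import shutil

# Assumed install locations; examples only, extend the list for your machine.
CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found."""
    found = shutil.which("tesseract")  # honours the PATH variable first
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=CANDIDATES):
    """Point pytesseract at the located binary; raise if none is found."""
    import pytesseract  # imported here so find_tesseract works without it

    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at the top of the script then has the same effect as the assignment above, but fails with a clearer message when Tesseract is not installed at all.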
You are probably missing tesseract-ocr from your machine. Check the installation instructions here: https://github.com/tesseract-ocr/tesseract/wiki
On a Mac, you can just install it using Homebrew:
brew install tesseract
It should run fine after that.
I faced the same issue. This command fixed it for me:
sudo apt install tesseract-ocr
Note that this will only work on Debian-based distributions such as Ubuntu. sudo is available on most Unix-like systems (Linux, macOS, Raspbian, etc.), while apt is specific to Debian and its derivatives.
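Since the right install command depends on the operating system, a script can at least print a suitable hint when Tesseract is missing. A small sketch that collects the commands mentioned in this thread; `install_hint` is a hypothetical helper, the Linux entry assumes a Debian-based distribution, and the macOS entry assumes Homebrew is installed:

```python
import platform


def install_hint(system=None):
    """Return an install suggestion for the given (or current) platform."""
    system = system or platform.system()
    hints = {
        # Assumes a Debian-based distribution where apt is available.
        "Linux": "sudo apt install tesseract-ocr",
        # Assumes Homebrew is installed.
        "Darwin": "brew install tesseract",
        "Windows": "installer from https://github.com/UB-Mannheim/tesseract/wiki",
    }
    fallback = "see https://github.com/tesseract-ocr/tesseract/wiki"
    return hints.get(system, fallback)


if __name__ == "__main__":
    print(install_hint())
```

On other Linux distributions the package manager and package name will differ, so treat the returned string as a starting point only.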
One simple thing that actually worked for me in Jupyter Notebook was using double backslashes instead of single backslashes in the pytesseract.pytesseract.tesseract_cmd path:
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
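The likely reason this helps is that in a normal (non-raw) Python string, `\t` is the escape sequence for a tab, so the `\tesseract.exe` part of the path gets corrupted. The following sketch compares the usual ways of writing the path; the variable names are illustrative only:

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
raw = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
forward = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'

# Double backslashes and a raw string produce exactly the same characters.
assert escaped == raw

# Forward slashes give a different string but the same Windows path.
assert PureWindowsPath(forward) == PureWindowsPath(raw)

# With a single backslash in a normal string, "\t" becomes a tab character,
# so the file name "tesseract.exe" is no longer present in the string.
broken = 'Tesseract-OCR\tesseract.exe'
assert '\t' in broken
assert 'tesseract.exe' not in broken
```

Any of the first three spellings should work for `tesseract_cmd`; the raw string is usually the easiest to read.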