I am trying to use pytesseract in Python but I always end up with the following error:
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNo
You can download the tesseract-ocr installer from the following link:
Tesseract for Windows
Then add a new environment variable named tesseract with the value C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
I'm currently using Windows and needed to develop a PDF parser, but adding a new environment variable via sysdm.cpl alone did not work. For other Windows users, I strongly suggest adding C:\Program Files (x86)\Tesseract-OCR to your PATH in profile.ps1 as well (if you are using PowerShell, that is).
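Before changing environment variables again, it can help to check what Python actually sees. The following is a minimal sketch using only the standard library; the helper names find_on_path and path_entries are made up for illustration and are not part of pytesseract. Run it from the same shell (and the same interpreter) that raises the error.

```python
import os
import shutil


def find_on_path(executable="tesseract"):
    """Return the full path of `executable` if a PATH lookup finds it, else None."""
    return shutil.which(executable)


def path_entries():
    """Return the non-empty directories currently listed in PATH."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


if __name__ == "__main__":
    location = find_on_path()
    if location:
        print("tesseract found at:", location)
    else:
        # Listing the entries makes it easy to spot a missing Tesseract-OCR folder.
        print("tesseract is not on PATH; directories searched:")
        for entry in path_entries():
            print("  ", entry)
```

If the Tesseract-OCR folder is missing from the printed list, the new PATH value has probably not reached this process yet; opening a fresh terminal after editing the variable is usually needed.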
I got this error because I installed pytesseract with pip but forgot to install the binary.
On Linux:
sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
On Mac:
brew install tesseract
On Windows:
Download the binary from https://github.com/UB-Mannheim/tesseract/wiki, then add pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
to your script (replace the path with the location of your tesseract binary if necessary).
references: https://pypi.org/project/pytesseract/ (INSTALLATION section) and https://github.com/tesseract-ocr/tesseract/wiki#installation
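To avoid hard-coding one path per machine, the lookup can be wrapped in a small helper. This is only a sketch: resolve_tesseract, configure_pytesseract and the CANDIDATES list are hypothetical names, and the first candidate path (without "(x86)") is an assumption about where a 64-bit installer might put the binary. The only pytesseract attribute used is tesseract_cmd, as in the answer above.

```python
import os
import shutil

# Candidate locations to try when tesseract is not on PATH.
# The (x86) path comes from this thread; the other one is an assumption.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def resolve_tesseract(candidates=CANDIDATES):
    """Return a usable tesseract path: PATH lookup first, then the candidates."""
    found = shutil.which("tesseract")
    if found:
        return found
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved binary, or fail with a clear message."""
    import pytesseract  # imported here so the lookup above works without it

    cmd = resolve_tesseract()
    if cmd is None:
        raise FileNotFoundError("tesseract binary not found; install it first")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling configure_pytesseract() once at the start of the script, before any OCR call, should be enough.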
I'm running macOS and installed tesseract with brew, so here's my take on this. Since pytesseract is just a way to access tesseract from Python, you have to tell it where tesseract is installed on your computer.
For Mac OS
Try finding where the tesseract executable is. If you installed it using brew, run this in the terminal:
>brew list tesseract
This should list where your tesseract executable is, somewhere more or less like
> /usr/local/Cellar/tesseract/3.05.02/bin/tesseract
Then following their instructions:
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
pytesseract.pytesseract.tesseract_cmd = r'/usr/local/Cellar/tesseract/3.05.02/bin/tesseract'
should do the trick!
Most likely you have several versions of Python installed. Ensure that pytesseract is installed for the same Python version you are running.
which pip3
shows you the path to the pip3 installation, and
which python3
shows the corresponding path to the Python installation.
Ensure that both paths belong to the same installation.
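The same check can be done from inside Python, which removes any doubt about which interpreter is involved. This is a small sketch using the standard library only; module_location is a hypothetical helper name.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


if __name__ == "__main__":
    # The interpreter running this script, to compare with `which python3`.
    print("interpreter:", sys.executable)
    # Where this interpreter would import pytesseract from, if it can at all.
    location = module_location("pytesseract")
    print("pytesseract:", location or "not installed for this interpreter")
```

If the second line reports that pytesseract is not installed, running the install as python3 -m pip install pytesseract ties pip to the interpreter you are actually using.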
This occurs under Windows (at least with tesseract version 3.05) when the current directory is on a different drive from the one where tesseract is installed.
Something in tesseract expects the data files to be in \Program Files... (rather than C:\Program Files, say), so if you're not on the same drive letter as tesseract, it will fail. It would be great if pytesseract could work around this by temporarily changing to the tesseract installation drive (under Windows only) before executing tesseract, and changing back afterwards. In your case, for example, you can copy yourmodule_python.py to "C:/Program Files (x86)/Tesseract-OCR/" and run it from there.
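The "change drive, run, change back" idea can also be tried from your own script without copying files around. The sketch below is an untested workaround under the assumption that switching the current directory to the installation folder is enough to avoid the drive problem described above; working_directory is a hypothetical helper, not something pytesseract provides.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller always ends up where it started.
        os.chdir(previous)


# Possible usage (path taken from this thread; adjust to your installation):
#
#     image_path = os.path.abspath("scan.png")  # resolve before changing directory
#     with working_directory(r"C:\Program Files (x86)\Tesseract-OCR"):
#         text = pytesseract.image_to_string(Image.open(image_path))
```

Relative image paths stop resolving once the directory changes, which is why the usage comment converts the path to an absolute one first.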