Question
I am trying to use pytesseract in Python but I always end up with the following error:
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
However, pytesseract and Tesseract are installed on my system.
Example code that produces this error:
import cv2
import pytesseract
img = cv2.imread('1d.png')
print(pytesseract.image_to_string(img))
How do I resolve this TesseractNotFoundError?
Answer 1:
I tried adding Tesseract to the PATH variable as others have suggested, but still received the same error. What worked was adding this to my script:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
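If you would rather not hard-code a single location, one option is to probe a few candidates first. This is only a rough sketch: find_tesseract is a hypothetical helper, not part of pytesseract, and the candidate paths are simply the ones quoted in this thread.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first usable Tesseract executable path, or None.

    Hypothetical helper: looks on PATH first, then checks each candidate file.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Usage (assumes pytesseract is installed; adjust the paths for your machine):
#   import pytesseract
#   cmd = find_tesseract([
#       r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
#       r"C:\Program Files\Tesseract-OCR\tesseract.exe",
#   ])
#   if cmd is not None:
#       pytesseract.pytesseract.tesseract_cmd = cmd
```

If the function returns None, the binary is most likely not installed at all, which matches what several other answers describe.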
Answer 2:
I got this error because I installed pytesseract with pip but forgot to install the Tesseract binary.
On Linux
sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
On Mac
brew install tesseract
On Windows
Download the installer from https://github.com/UB-Mannheim/tesseract/wiki, then add
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
to your script (replace the path with the location of your Tesseract binary if necessary).
References: https://pypi.org/project/pytesseract/ (INSTALLATION section) and https://github.com/tesseract-ocr/tesseract/wiki#installation
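The per-platform instructions above could be wrapped in a small lookup so a script can print a hint when the binary is missing. This is just an illustration: INSTALL_HINTS and install_hint are made-up names, and the Linux entry assumes an apt-based distribution.

```python
import sys

# Install commands quoted from this thread, keyed by sys.platform prefix.
# The Linux entry assumes a Debian-based system where apt is available.
INSTALL_HINTS = {
    "linux": "sudo apt install tesseract-ocr libtesseract-dev",
    "darwin": "brew install tesseract",
    "win32": "run the installer from https://github.com/UB-Mannheim/tesseract/wiki",
}


def install_hint(platform=None):
    """Return a short install hint for the given sys.platform value."""
    platform = sys.platform if platform is None else platform
    for prefix, hint in INSTALL_HINTS.items():
        if platform.startswith(prefix):
            return hint
    # Unknown platform: point at the general installation page instead.
    return "see https://github.com/tesseract-ocr/tesseract/wiki"


print(install_hint())
```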
Answer 3:
You are probably missing tesseract-ocr from your machine. Check the installation instructions here: https://github.com/tesseract-ocr/tesseract/wiki
On a Mac, you can install it with Homebrew:
brew install tesseract
It should run fine after that
Answer 4:
Under Windows 10, the following method works for me:
Download and install Tesseract (https://github.com/tesseract-ocr/tesseract/wiki). The Windows version is available here: https://github.com/UB-Mannheim/tesseract/wiki
Find the script file pytesseract.py in C:\Users\User\Anaconda3\Lib\site-packages\pytesseract and open it. Change the following code from
tesseract_cmd = 'tesseract'
to:
tesseract_cmd = 'D:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
You may also need to add the following directory to the PATH environment variable:
D:/Program Files (x86)/Tesseract-OCR/
Hope it works for you!
Answer 5:
One simple thing that actually worked for me in Jupyter Notebook was using double backslashes instead of single backslashes in the pytesseract.pytesseract.tesseract_cmd path:
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
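The reason this matters is that in a normal Python string "\t" is a tab character, and this particular path contains "\tesseract.exe". A small demonstration (the variable names are just for illustration):

```python
# Two equivalent ways to spell the Windows path used in this thread.
escaped = "C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
raw = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

# What a plain string with single backslashes turns into: Python reads
# "\t" as a tab, so the file name is silently corrupted.
broken = "C:\\Program Files (x86)\\Tesseract-OCR" + "\t" + "esseract.exe"

print(escaped == raw)   # True
print("\t" in broken)   # True
print(repr(broken))
```

So either double every backslash, use a raw string (the r prefix), or use forward slashes.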
Answer 6:
I'm running macOS and installed Tesseract with brew, so here's my take on this. Since pytesseract is just a way to access Tesseract from Python, you have to tell it where Tesseract is located on your computer.
For macOS
Find where the tesseract executable is. If you installed it using brew, run this in the terminal:
>brew list tesseract
This should list where your tesseract executable is, somewhere more or less like
> /usr/local/Cellar/tesseract/3.05.02/bin/tesseract
Then, following their instructions:
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
pytesseract.pytesseract.tesseract_cmd = r'/usr/local/Cellar/tesseract/3.05.02/bin/tesseract'
should do the trick!
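Before pointing pytesseract at a path, it can help to confirm that the path really runs. A minimal sketch, assuming the binary accepts --version (tesseract_version is a hypothetical helper name):

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # capture both streams, whichever the build uses
            text=True,
            check=False,
        )
    except OSError:
        # Covers a missing file, a directory instead of a file, or no execute permission.
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


# Usage: pass the path reported by `brew list tesseract`, for example
#   tesseract_version("/usr/local/Cellar/tesseract/3.05.02/bin/tesseract")
```

A None result means the path is wrong or not executable, which is the same situation that makes pytesseract raise TesseractNotFoundError.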
Answer 7:
I faced the same issue. This command fixed it for me:
sudo apt install tesseract-ocr
Note that this only works on Debian-based distributions such as Ubuntu. sudo is a Unix command (Linux, Mac, Raspbian, etc.), while apt is the package manager specific to Debian-based systems.
Answer 8:
I faced the same problem. I hope you have installed Tesseract from here and have also run pip install pytesseract.
If everything is fine, you should see that the path C:\Program Files (x86)\Tesseract-OCR exists and contains tesseract.exe.
Adding it to the Path variable did not help me. Instead, I added a new environment variable named tesseract with the value C:\Program Files (x86)\Tesseract-OCR\tesseract.exe.
Typing tesseract in the command line should now work as expected and print usage information. You can now use pytesseract as follows (don't forget to restart your Python kernel before running this!):
import pytesseract
from PIL import Image
value=Image.open("text_image.png")
text = pytesseract.image_to_string(value, config='')
print("text present in images:",text)
enjoy!
Answer 9:
For Mac:
- Install pytesseract (pip install pytesseract should work)
- Install Tesseract with Homebrew (brew install tesseract); installing it with pip did not work for me
- Get the path of the brew installation of Tesseract on your device (brew list tesseract)
- Add the path in your code, not in the system path, using pytesseract.pytesseract.tesseract_cmd = '<path received in step 3>' (e.g. pytesseract.pytesseract.tesseract_cmd = '/usr/local/Cellar/tesseract/4.0.0_1/bin/tesseract')
This should work fine.
Answer 10:
You can download the tesseract-ocr setup using the following link:
Tesseract for windows
Then add a new variable named tesseract to the environment variables with the value C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
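As far as I can tell, pytesseract does not look at a variable with this name by itself, so a script that wants to use it would have to read it explicitly. A hedged sketch (tesseract_cmd_from_env is a hypothetical helper; the variable name tesseract is the one suggested above):

```python
import os


def tesseract_cmd_from_env(env=None, var_name="tesseract", default="tesseract"):
    """Return the executable path stored in an environment variable.

    Falls back to the bare command name when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    # Strip whitespace and any quotes that were pasted in with the path.
    value = env.get(var_name, "").strip().strip('"')
    return value or default


# Usage (assumes pytesseract is installed):
#   import pytesseract
#   pytesseract.pytesseract.tesseract_cmd = tesseract_cmd_from_env()
```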
Answer 11:
The following three commands will do the job:
sudo apt update
# This refreshes your package lists
sudo apt install tesseract-ocr
# This installs the Tesseract OCR engine
sudo apt install libtesseract-dev
# This installs the Tesseract development files
Answer 12:
CAUTION: ONLY FOR WINDOWS
I came across this problem today, and all the answers here helped me, but I still had to dig a lot to solve it. So here is the solution in a very simple form:
Download the 64-bit installer (or the 32-bit one if your computer is 32-bit) from here. (The file name would be something like tesseract-ocr-w64-setup-v5.0.0.20190526 (alpha).)
Install it. Let it install itself in the default directory on the C drive.
Now go to your environment variables (search for them in the Start menu, or go to Control Panel > System > Advanced System Settings > Environment Variables).
Select PATH and then Edit. Click New and add the directory where Tesseract is installed (usually C:\Program Files\Tesseract-OCR\).
Now you will not get the error!
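To double-check that the directory really ended up on PATH (for the process you are running, not just in the settings dialog), something like the following could be used. is_on_path is a hypothetical helper written for this example.

```python
import os


def is_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalise(entry):
        # Ignore surrounding quotes, trailing slashes and (on Windows) letter case.
        return os.path.normcase(entry.strip().strip('"').rstrip("\\/"))

    target = normalise(directory)
    return any(normalise(entry) == target for entry in path_value.split(sep) if entry)


# Usage on Windows, inside the Python session that raises the error:
#   print(is_on_path(r"C:\Program Files\Tesseract-OCR"))
```

If this prints False after you edited PATH, the running process was probably started before the change and needs a restart.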
Answer 13:
Install tesseract from https://github.com/UB-Mannheim/tesseract/wiki and add the path of tesseract.exe to the Path environment variable.
Answer 14:
Most likely you have several versions of Python installed. Make sure pytesseract is installed for the same Python version you are running.
which pip3 shows the path of the pip3 installation, and which python3 shows the path of the corresponding Python installation.
Ensure that these two belong to the same installation.
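The same check can be done from inside Python, which avoids guessing which pip belongs to which interpreter. A small sketch using only the standard library (module_location is a hypothetical helper name):

```python
import importlib.util
import sys


def module_location(name):
    """Return where `name` would be imported from, or None if this interpreter cannot import it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# The interpreter actually running this script, and where it finds pytesseract.
print("Interpreter:", sys.executable)
print("pytesseract:", module_location("pytesseract"))
```

If the second line prints None, pytesseract was installed for a different Python; running the install as python3 -m pip install pytesseract with the same interpreter usually avoids that mismatch.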
Answer 15:
I was also facing the same error when I was trying to build a text extractor using pytesseract, but the solution was in the installation instructions for pytesseract on the PyPI site: pytesseract. There are many ways to avoid the error, but adding one more parameter to the method pytesseract.image_to_string solved it for me, like
tessdata_dir_config = r'--tessdata-dir "/usr/share/tesseract-ocr/4.00/tessdata"'
output = pytesseract.image_to_string(image, lang='eng', config=tessdata_dir_config)
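The config argument is handed on to the Tesseract command line, so a tessdata directory is normally given through the --tessdata-dir option, not as a bare path. A tiny sketch of building that string (tessdata_config is a hypothetical helper; the quoting follows the example in the pytesseract README):

```python
def tessdata_config(tessdata_dir):
    """Build a config string that passes --tessdata-dir to the Tesseract CLI."""
    # The double quotes keep a directory containing spaces together as one argument.
    return '--tessdata-dir "{}"'.format(tessdata_dir)


print(tessdata_config("/usr/share/tesseract-ocr/4.00/tessdata"))

# Usage (assumes pytesseract is installed and `image` is already loaded):
#   output = pytesseract.image_to_string(image, lang='eng',
#                                        config=tessdata_config("/usr/share/tesseract-ocr/4.00/tessdata"))
```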
Answer 16:
This occurs under Windows (at least in Tesseract version 3.05) when the current directory is on a different drive from the one where Tesseract is installed.
Something in Tesseract expects data files to be in \Program Files... (rather than C:\Program Files, say). So if you're not on the same drive letter as Tesseract, it will fail. It would be great if we could work around it by temporarily changing drives (under Windows only) to the Tesseract installation drive before executing Tesseract, and changing back afterwards.
Example in your case: you can copy yourmodule_python.py to "C:/Program Files (x86)/Tesseract-OCR/" and run it from there.
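The "temporarily change drives" idea can be tried without copying scripts around by switching the working directory only for the duration of the OCR call. This is an untested sketch of that workaround, not something pytesseract provides; working_directory is a hypothetical helper.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always go back, even if the OCR call raised an exception.
        os.chdir(previous)


# Usage (assumes pytesseract is installed and `img` is already loaded):
#   with working_directory(r"C:\Program Files (x86)\Tesseract-OCR"):
#       text = pytesseract.image_to_string(img)
```

Note that relative file paths inside the with block are resolved against the Tesseract directory, so load images before entering it or use absolute paths.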
Answer 17:
There are a few steps to set the path:
1: Go to https://github.com/UB-Mannheim/tesseract/wiki
2: Download the latest installer
3: Install it
4: Add the install directory to the Path system variable, such as "C:\Program Files\Tesseract-OCR" or "C:\Program Files (x86)\Tesseract-OCR"
5: Open CMD and type "tesseract"; you should see some output rather than a "not recognized" error
Answer 18:
Are you importing
from tesseract import image_to_string
Don't import from pytesseract
Answer 19:
I was also facing the same issue. Just add C:\Program Files (x86)\Tesseract-OCR to your Path variable.
If it still does not work, add C:\Program Files (x86)\Tesseract-OCR\tessdata to your Path variable as a new entry. And do not forget to restart your computer after changing the Path variable.
Answer 20:
I'm currently using Windows and needed to develop a PDF parser, but adding a new environment variable via sysdm.cpl alone did not work. For other Windows users, I strongly suggest adding C:\Program Files (x86)\Tesseract-OCR to your profile.ps1 as well (if you are using PowerShell, that is).
Answer 21:
Small mistake on my part: I knew I had to close and reopen my cmd window for the updated path to take effect. Using Jupyter Notebook, I also had to shut down the client and restart it.
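A process reads its environment once at start-up, which is why an already-running shell or notebook kernel does not see a PATH change. A possible stopgap when restarting is inconvenient is to extend PATH inside the running Python process; this only affects that process and the programs it launches. add_to_path is a hypothetical helper written for this example.

```python
import os


def add_to_path(directory, env=None):
    """Prepend `directory` to PATH for the current process unless it is already listed."""
    env = os.environ if env is None else env
    entries = [entry for entry in env.get("PATH", "").split(os.pathsep) if entry]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


# Usage in a notebook cell, before calling pytesseract:
#   add_to_path(r"C:\Program Files (x86)\Tesseract-OCR")
```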
Source: https://stackoverflow.com/questions/50655738/how-do-i-resolve-a-tesseractnotfounderror