Question
I am trying to use Tesseract OCR to extract text from an image, but I am getting the error shown below. I installed Tesseract OCR from https://github.com/UB-Mannheim/tesseract/wiki and installed pytesseract in the Anaconda Prompt with pip install pytesseract, but it's not working. Please help if anyone has faced a similar issue.
(base) C:\Users\500066016>pip install pytesseract
Collecting pytesseract
  Downloading https://files.pythonhosted.org/packages/13/56/befaafbabb36c03e4fdbb3fea854e0aea294039308a93daf6876bf7a8d6b/pytesseract-0.2.4.tar.gz (169kB)
    100% |████████████████████████████████| 174kB 288kB/s
Requirement already satisfied: Pillow in c:\users\500066016\appdata\local\continuum\anaconda3\lib\site-packages (from pytesseract) (5.1.0)
Building wheels for collected packages: pytesseract
  Running setup.py bdist_wheel for pytesseract ... done
  Stored in directory: C:\Users\500066016\AppData\Local\pip\Cache\wheels\a8\0c\00\32e4957a46128bea34fda60b8b01a8755986415cbab3ed8e38
Successfully built pytesseract
Below is the code:
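import pytesseract
import cv2
import numpy as np


def get_string(img_path):
    # Read the image and convert it to grayscale
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Dilate and erode to reduce noise (note: these results are never used below)
    kernel = np.ones((1, 1), np.uint8)
    dilate = cv2.dilate(img, kernel, iterations=1)
    erosion = cv2.erode(img, kernel, iterations=1)
    cv2.imwrite('removed_noise.jpg', img)

    # Binarize with an adaptive threshold and save the result
    img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    cv2.imwrite('thresh.jpg', img)

    # Run OCR on the saved file
    # (note: 'thesh.jpg' is misspelled; the file written above is 'thresh.jpg')
    res = pytesseract.image_to_string('thesh.jpg')
    return res


print('Getting string from the image')
print(get_string('quotes.jpg'))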
Below is the error:
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/500066016/.spyder-py3/project1.py', wdir='C:/Users/500066016/.spyder-py3')
  File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/500066016/.spyder-py3/project1.py", line 23, in <module>
    print(get_string('quotes.jpg'))
  File "C:/Users/500066016/.spyder-py3/project1.py", line 20, in get_string
    res = pytesseract.image_to_string('thesh.jpg')
  File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 294, in image_to_string
    return run_and_get_output(*args)
  File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 202, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\500066016\AppData\Local\Continuum\anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 172, in run_tesseract
    raise TesseractNotFoundError()
TesseractNotFoundError: tesseract is not installed or it's not in your path
Answer 1:
Step 1: Download and install Tesseract OCR (for example, the Windows installer from https://github.com/UB-Mannheim/tesseract/wiki, as mentioned in the question).
Step 2: After installing, find the "Tesseract-OCR" folder, open it, and locate tesseract.exe.
Step 3: Copy the full path of tesseract.exe.
Step 4: Pass this path to pytesseract in your code, before calling image_to_string, like this:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
Note: replace C:\Program Files\Tesseract-OCR\tesseract.exe with the path you copied.
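The steps above can be sketched as a small helper that looks for the executable before handing it to pytesseract. This is only a sketch under assumptions: find_tesseract, configure_pytesseract and CANDIDATE_PATHS are hypothetical names, and the second candidate path is a guess at another common install location, so adjust the list to wherever your installer actually put tesseract.exe.

```python
import os
import shutil

# Hypothetical candidate locations (assumption); replace with your copied path.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever is already reachable through the PATH variable.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the executable found by find_tesseract()."""
    import pytesseract  # imported here so the lookup helper works without it

    exe = find_tesseract()
    if exe is None:
        raise FileNotFoundError(
            "tesseract executable not found; install it or add its path to CANDIDATE_PATHS"
        )
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract() once at the top of the script, before any image_to_string call, has the same effect as the one-line assignment in Step 4, but fails with a clearer message when the path is wrong.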
Answer 2:
You need to install Tesseract itself (run these in a notebook cell):
!apt install tesseract-ocr
!apt install libtesseract-dev
and the Python packages:
!pip install Pillow
!pip install pytesseract
Then import them:
import pytesseract
from PIL import ImageEnhance, ImageFilter, Image
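After installing, it can help to confirm that the tesseract binary is really reachable, since that is exactly what TesseractNotFoundError complains about. A minimal sketch, assuming the executable is named tesseract and is expected on the PATH; tesseract_version is a hypothetical helper name, not part of pytesseract.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    # Some releases print the version to stderr, so read both streams.
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print("found:", version)
```

If this reports that tesseract was not found, pytesseract will fail in the same way, so fix the installation or the PATH first.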
I ran this code on Google Colab. Below is my example code.
I used an example picture of text taken from a website.
Step 1: Import the packages
import pytesseract
import cv2
import matplotlib.pyplot as plt
from PIL import Image
Step 2: Upload the file text.png to Colab
from google.colab import files
uploaded = files.upload()
The first run may fail with output like this:
current browser session. Please rerun this cell to enable.
---------------------------------------------------------------------------
MessageError Traceback (most recent call last)
<ipython-input-31-21dc3c638f66> in <module>()
1 from google.colab import files
----> 2 uploaded = files.upload()
2 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108
MessageError: TypeError: Cannot read property '_uploadFiles' of undefined
If you see this error, don't worry: run the cell again and it will accept the upload. You can then choose which file to upload.
Step 3: Read the image and extract the text
Read the image using OpenCV:
image = cv2.imread("text.png")
or use Pillow:
image = Image.open("text.png")
Check that the picture loaded by displaying it:
image
Get the string:
string = pytesseract.image_to_string(image)
Print it:
print(string)
Done. Hope this helps.
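Step 3 can also be wrapped in one function that reports a missing Tesseract install more clearly. This is a rough sketch, not the answer's own code: ocr_file is a hypothetical name, the lang="eng" default assumes the English language data is installed, and the guarded import is only there so the file loads even when the packages are missing.

```python
try:
    import pytesseract
    from PIL import Image
except ImportError:  # keep the module importable so the problem can be reported
    pytesseract = None
    Image = None


def ocr_file(path, lang="eng"):
    """Open an image with Pillow and return the text Tesseract recognizes in it."""
    if pytesseract is None:
        raise RuntimeError(
            "pytesseract and Pillow are required: pip install pytesseract Pillow"
        )
    try:
        with Image.open(path) as image:
            return pytesseract.image_to_string(image, lang=lang)
    except pytesseract.pytesseract.TesseractNotFoundError as exc:
        # The Python wrapper is present, but the tesseract binary is not.
        raise RuntimeError(
            "Tesseract itself is missing: install it or set "
            "pytesseract.pytesseract.tesseract_cmd to its full path"
        ) from exc
```

For example, print(ocr_file("text.png")) would print the recognized text for the picture uploaded in Step 2.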
Answer 3:
It is clear from the error that your system is unable to find the tesseract package. If you are on Windows, run the following command in your command prompt:
pip install tesseract
Hope this solves your problem :)
Source: https://stackoverflow.com/questions/51677283/tesseractnotfounderror-tesseract-is-not-installed-or-its-not-in-your-path