python-tesseract

Pytesseract is too slow. How can I make it process images faster?

佐手、 submitted on 2019-12-24 00:22:26
Question: I am using pytesseract in the code below:

    def fnd():
        for fname in list:
            x = None
            x = np.array([np.array(PIL.Image.open(fname))])
            print x.size
            for im in x:
                txt = pytesseract.image_to_string(image=im).encode('utf-8').strip()
                open("Output.txt","a+").write(txt)
            with open("Output.txt") as openfile:
                for line in openfile:
                    for part in line.split():
                        if "cyber" in part.lower():
                            print(line)
        return

The list contains the names of images from a folder (2408*3506, 300 res, gray-scaled). Unfortunately for around

pytesseract error Windows Error [Error 2]

妖精的绣舞 submitted on 2019-12-23 20:41:51
Question: Hi, I am trying the Python library pytesseract to extract text from an image. Here is the code:

    from PIL import Image
    from pytesseract import image_to_string
    print image_to_string(Image.open(r'D:\new_folder\img.png'))

But the following error occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
        config=config)
      File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line

Emacs python not able to find package/module

安稳与你 submitted on 2019-12-23 05:32:17
Question: Problem: My tesseract (tesserocr) is not found by the Emacs Python interpreter, but I am able to use tesseract in the terminal as well as in my Spyder installation. The Emacs Python interpreter is able to import pytesseract, but cannot find tesserocr. I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/eghx/agent18/project-gym/tests/thresholding.py", line 34, in image_to_string2
        print(image_to_string(img_open))
      File "/home/eghx/anaconda3/lib

How to process and extract text from image

偶尔善良 submitted on 2019-12-22 17:37:03
Question: I'm trying to extract text from an image using Python cv2. The result is pathetic and I can't figure out a way to improve my code. I believe the image needs to be processed before the text is extracted, but I'm not sure how. I've tried converting it to black and white, but no luck.

    import cv2
    import os
    import pytesseract
    from PIL import Image
    import time

    pytesseract.pytesseract.tesseract_cmd='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
    cam = cv2.VideoCapture(1,cv2.CAP_DSHOW)
    cam.set(cv2.CAP

How to process and extract text from image

China☆狼群 submitted on 2019-12-22 17:36:49
Question: I'm trying to extract text from an image using Python cv2. The result is pathetic and I can't figure out a way to improve my code. I believe the image needs to be processed before the text is extracted, but I'm not sure how. I've tried converting it to black and white, but no luck.

    import cv2
    import os
    import pytesseract
    from PIL import Image
    import time

    pytesseract.pytesseract.tesseract_cmd='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
    cam = cv2.VideoCapture(1,cv2.CAP_DSHOW)
    cam.set(cv2.CAP

Preserving Spaces in Tesseract

醉酒当歌 submitted on 2019-12-22 10:34:06
Question: I have an image file that contains some text separated by tabs (2 spaces). But when I extract the text from this image file, I always get a single space between two columns. A sample example:

IMAGE:

    col-a  col-b  col-c

Desired output:

    col-a  col-b  col-c

But I am getting the following:

    col-a col-b col-c

I am using pytesseract.image_to_string (Python module) to convert the image to text.

Answer 1: Use it like this:

    pytesseract.image_to_string(img, config='-c preserve_interword_spaces=1')

Source: https:/
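As a rough illustration of the answer above, the sketch below passes the same config string to pytesseract.image_to_string and then splits each OCR line on runs of two or more spaces to recover the columns. The names PRESERVE_SPACES_CONFIG, ocr_with_spaces and split_columns are made up for this example, and the OCR call assumes that pytesseract, Pillow and the Tesseract binary are installed. Only the column-splitting helper runs without them.

```python
import re

# Same config string as in the answer above.
PRESERVE_SPACES_CONFIG = "-c preserve_interword_spaces=1"


def ocr_with_spaces(image_path):
    """Run OCR on image_path, asking Tesseract to keep inter-word spaces."""
    # Imported lazily so split_columns works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, config=PRESERVE_SPACES_CONFIG)


def split_columns(line):
    """Split one OCR line into columns on runs of two or more spaces."""
    return [cell for cell in re.split(r" {2,}", line.strip()) if cell]


if __name__ == "__main__":
    # Hypothetical OCR line with two spaces between columns.
    print(split_columns("col-a  col-b  col-c"))
```

Splitting on two or more spaces is a design choice for this sketch: a single space is treated as part of a cell, so multi-word cells survive as long as the column gap is wider than one space.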

Why can't get string with PIL and pytesseract?

笑着哭i submitted on 2019-12-22 04:44:06
Question: This is a simple Optical Character Recognition (OCR) program in Python 3 to get a string from an image. I have uploaded the target GIF file here; please download it and save it as /tmp/target.gif.

    try:
        from PIL import Image
    except ImportError:
        import Image
    import pytesseract
    print(pytesseract.image_to_string(Image.open('/tmp/target.gif')))

I paste all the error info here; please fix it to get the characters from the image.

    /usr/lib/python3/dist-packages/PIL/Image.py:925: UserWarning: Couldn't allocate palette

get Font Size in Python with Tesseract and Pyocr

三世轮回 submitted on 2019-12-21 21:18:27
Question: Is it possible to get the font size from an image using pyocr or Tesseract? Below is my code.

    tools = pyocr.get_available_tools()
    tool = tools[0]
    txt = tool.image_to_string(
        Imagee.open(io.BytesIO(req_image)),
        lang=lang,
        builder=pyocr.builders.TextBuilder()
    )

Here I get the text from the image using the function image_to_string. Now my question is whether I can also get the font size (a number) of my text.

Answer 1: Using tesserocr, you can get a ResultIterator after calling Recognize on your image, for which you can
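A minimal sketch of the approach the tesserocr answer points to, assuming tesserocr and the Tesseract data files are installed. The helper names word_font_sizes and most_common_pointsize are hypothetical. The tesserocr calls used are PyTessBaseAPI, SetImageFile, Recognize, GetIterator, iterate_level, GetUTF8Text and WordFontAttributes. Which font attributes are actually populated may depend on the Tesseract version and OCR engine, so the values are best treated as estimates.

```python
from collections import Counter


def word_font_sizes(image_path):
    """Return (word, pointsize) pairs for each word recognised in the image."""
    # Lazy import: tesserocr needs the Tesseract libraries to be installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        iterator = api.GetIterator()
        if iterator is None:  # nothing was recognised
            return results
        for item in iterate_level(iterator, RIL.WORD):
            attrs = item.WordFontAttributes()  # dict of font attributes, or None
            if attrs:
                word = item.GetUTF8Text(RIL.WORD)
                results.append((word, attrs.get("pointsize")))
    return results


def most_common_pointsize(pairs):
    """Pick the point size reported for the most words, ignoring missing values."""
    sizes = [size for _, size in pairs if size]
    return Counter(sizes).most_common(1)[0][0] if sizes else None
```

Calling most_common_pointsize(word_font_sizes(path)) on an image path would then give a single representative size for the page, which is one possible way to reduce per-word values to the "number" the question asks for.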

How to use trained data with pytesseract?

时间秒杀一切 submitted on 2019-12-21 19:44:16
Question: Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata. Right now I'm using this simple script:

    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract as tes

    results = tes.image_to_string(Image.open('./test.jpg'),boxes=True)
    file = open('parsing.text','a')
    file.write(results)
    print(results)

How do I use my traineddata file so that I'm able to read the new font with the Python script

How to save dpi info in py-opencv?

蓝咒 submitted on 2019-12-20 03:52:10
Question:

    import cv2

    def clear(img):
        back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
        img = cv2.bitwise_xor(img, back)
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        return img

    def threshold(img):
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
        return img

    def fomatImage(img):
        img = threshold(img)
        img = clear(img)
        return img

    img = fomatImage(cv2.imread("1566135246468