python-tesseract | 易学教程

Pytesseract is too slow. How can I make it process images faster?

Question: I am using pytesseract in the code below:

    def fnd():
        for fname in list:
            x = None
            x = np.array([np.array(PIL.Image.open(fname))])
            print x.size
            for im in x:
                txt = pytesseract.image_to_string(image=im).encode('utf-8').strip()
                open("Output.txt","a+").write(txt)
        with open("Output.txt") as openfile:
            for line in openfile:
                for part in line.split():
                    if "cyber" in part.lower():
                        print(line)
        return

The list contains the names of images from a folder (2408*3506, 300 res, grayscale). Unfortunately for around
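One possible direction for the question above, sketched under assumptions and not taken from the original thread: pytesseract launches a separate tesseract process for each call, so several images can be recognized concurrently, and the keyword search can run on the text in memory instead of going through Output.txt. The names `ocr_one`, `find_keyword_lines`, and `scan_images` are hypothetical, and the worker count of 4 is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_one(path):
    """Run OCR on a single image file and return the recognized text."""
    # Imported lazily so the helpers below work without OCR installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def find_keyword_lines(text, keyword="cyber"):
    """Return the lines of `text` that contain `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [line for line in text.splitlines() if keyword in line.lower()]


def scan_images(paths, ocr=ocr_one, keyword="cyber", workers=4):
    """OCR all images concurrently and map each path to its matching lines."""
    # Threads are enough here: each OCR call waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(ocr, paths))
    return {path: find_keyword_lines(text, keyword)
            for path, text in zip(paths, texts)}
```

Passing the OCR function as a parameter keeps the search logic testable without Tesseract. Whether this helps in practice depends on the number of CPU cores and the image sizes.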

pytesseract error Windows Error [Error 2]

Question: Hi, I am trying the Python library pytesseract to extract text from an image. Here is the code:

    from PIL import Image
    from pytesseract import image_to_string
    print image_to_string(Image.open(r'D:\new_folder\img.png'))

But the following error occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
        config=config)
      File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line
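Windows error 2 means that a file could not be found. With pytesseract a frequent cause is that the tesseract executable is not on the PATH, although the truncated traceback above does not confirm that this is the case here. The sketch below looks for the executable and points pytesseract at it. The helper names `locate_tesseract` and `configure_pytesseract` are hypothetical, and the default install path is an assumption that may not match a given machine.

```python
import os
import shutil

# Assumed default install location on Windows; adjust to the real one.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def locate_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if not found."""
    found = shutil.which("tesseract")  # search the PATH first
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable and return its path."""
    import pytesseract  # imported lazily; requires pytesseract

    path = locate_tesseract()
    if path is None:
        raise RuntimeError("tesseract executable not found; is it installed?")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once before `image_to_string` is enough, because `tesseract_cmd` is a module-level setting.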

Emacs python not able to find package/module

Question: My tesseract (tesserocr) is not found by the Emacs Python interpreter, but I am able to use tesseract in the terminal as well as in my Spyder installation. The Emacs Python interpreter is able to import pytesseract, but cannot find tesserocr. I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/eghx/agent18/project-gym/tests/thresholding.py", line 34, in image_to_string2
        print(image_to_string(img_open))
      File "/home/eghx/anaconda3/lib

How to process and extract text from image

Question: I'm trying to extract text from an image using Python and cv2. The result is pathetic and I can't figure out how to improve my code. I believe the image needs to be processed before the text is extracted, but I'm not sure how. I've tried converting it to black and white, but no luck.

    import cv2
    import os
    import pytesseract
    from PIL import Image
    import time
    pytesseract.pytesseract.tesseract_cmd='C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
    cam = cv2.VideoCapture(1,cv2.CAP_DSHOW)
    cam.set(cv2.CAP

Preserving Spaces in Tesseract

Question: I have an image file that contains text separated by tabs (2 spaces). But when I extract the text from this image file, I always get a single space between two columns. An example:

IMAGE:

    col-a    col-b    col-c

Desired output:

    col-a    col-b    col-c

But I am getting the following:

    col-a col-b col-c

I am using pytesseract.image_to_string (Python module) to convert the image to text.

Answer 1: Use it like this:

    pytesseract.image_to_string(img, config='-c preserve_interword_spaces=1')

Source: https:/
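The answer's config string can be wrapped in a small helper so the option is easy to combine with others. This is a minimal sketch: `build_config` and `image_to_spaced_text` are hypothetical names, and the optional `--psm` flag assumes Tesseract 4 or later (older releases spell it `-psm`).

```python
def build_config(preserve_spaces=True, psm=None):
    """Assemble a Tesseract config string for pytesseract."""
    parts = []
    if psm is not None:
        # Page segmentation mode; flag spelling assumes Tesseract 4+.
        parts.append("--psm {}".format(psm))
    if preserve_spaces:
        # The option from the answer above.
        parts.append("-c preserve_interword_spaces=1")
    return " ".join(parts)


def image_to_spaced_text(image, **options):
    """Run OCR with the assembled config (requires pytesseract)."""
    import pytesseract  # imported lazily so build_config works without it

    return pytesseract.image_to_string(image, config=build_config(**options))
```

For column-style layouts, a page segmentation mode that treats the image as one uniform block of text (for example `psm=6`) may also be worth trying, though that is a suggestion beyond what the answer states.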

Why can't get string with PIL and pytesseract?

Question: This is a simple Optical Character Recognition (OCR) program in Python 3 that reads a string from an image. I have uploaded the target GIF file here; please download it and save it as /tmp/target.gif.

    try:
        from PIL import Image
    except ImportError:
        import Image
    import pytesseract
    print(pytesseract.image_to_string(Image.open('/tmp/target.gif')))

I have pasted all the error info here; please fix it so that the characters can be read from the image.

    /usr/lib/python3/dist-packages/PIL/Image.py:925: UserWarning: Couldn't allocate palette

get Font Size in Python with Tesseract and Pyocr

Question: Is it possible to get the font size from an image using pyocr or Tesseract? Below is my code.

    tools = pyocr.get_available_tools()
    tool = tools[0]
    txt = tool.image_to_string(
        Imagee.open(io.BytesIO(req_image)),
        lang=lang,
        builder=pyocr.builders.TextBuilder()
    )

Here I get the text from the image using the function image_to_string. My question is whether I can also get the font size (a number) of my text.

Answer 1: Using tesserocr, you can get a ResultIterator after calling Recognize on your image, for which you can
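The answer above is cut off, so the following is only a guess at the kind of code it leads to, not a reconstruction of it. The sketch assumes tesserocr's `WordFontAttributes()` returns a dictionary with a `pointsize` key (or nothing when no font information is available), which may depend on the Tesseract engine and model in use. The function names `pointsize_of` and `word_font_sizes` are hypothetical.

```python
def pointsize_of(attrs):
    """Extract the point size from a font-attribute dict, or None."""
    if not attrs:
        return None
    return attrs.get("pointsize")  # key name is an assumption


def word_font_sizes(image_path, lang="eng"):
    """Return (word, point size) pairs for an image (requires tesserocr)."""
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(image_path)
        api.Recognize()
        level = RIL.WORD
        # Walk the result iterator one word at a time.
        for word in iterate_level(api.GetIterator(), level):
            text = word.GetUTF8Text(level)
            results.append((text, pointsize_of(word.WordFontAttributes())))
    return results
```

The reported point size is an estimate derived from the image, so it is only meaningful when the image resolution is known.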

How to use trained data with pytesseract?

Question: Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata. Right now I'm using this simple script:

    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract as tes
    results = tes.image_to_string(Image.open('./test.jpg'),boxes=True)
    file = open('parsing.text','a')
    file.write(results)
    print(results)

How do I use my traineddata file so that I can read the new font with the Python script
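One way this is commonly handled, sketched here under assumptions rather than taken from the thread: Tesseract selects a model by language name, which is the traineddata file name without its extension, and `--tessdata-dir` tells it where to look. The helper names `traineddata_options` and `ocr_with_custom_font` and the example paths are hypothetical.

```python
import os


def traineddata_options(traineddata_path):
    """Derive the lang name and config string from a .traineddata path."""
    directory, filename = os.path.split(os.path.abspath(traineddata_path))
    lang, ext = os.path.splitext(filename)
    if ext != ".traineddata":
        raise ValueError("expected a .traineddata file: " + traineddata_path)
    # Quote the directory so paths containing spaces survive.
    return lang, '--tessdata-dir "{}"'.format(directory)


def ocr_with_custom_font(image_path, traineddata_path):
    """Run OCR with a custom traineddata file (requires pytesseract)."""
    import pytesseract
    from PIL import Image

    lang, config = traineddata_options(traineddata_path)
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=config)
```

For example, `ocr_with_custom_font('./test.jpg', './myfont.traineddata')` would pass `lang='myfont'`. Copying the file into Tesseract's own tessdata directory and passing only `lang` is an alternative.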

How to save dpi info in py-opencv?

Question:

    import cv2

    def clear(img):
        back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
        img = cv2.bitwise_xor(img, back)
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        return img

    def threshold(img):
        ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
        img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
        return img

    def fomatImage(img):
        img = threshold(img)
        img = clear(img)
        return img

    img = fomatImage(cv2.imread("1566135246468