python-tesseract

converting pdf to image but after zooming in

老子叫甜甜 posted on 2020-05-14 20:48:07
Question: This link shows how PDFs can be converted to images. Is there a way to zoom into my PDFs before converting them to images? In my project, I am converting PDFs to PNGs and then using the Python-tesseract library to extract text. I noticed that if I zoom into the PDFs and then save parts as PNGs, OCR gives much better results. So is there a way to zoom PDFs before converting them to PNGs?
Answer 1: I think that raising the quality (resolution) of your image is a better solution than zooming into the PDF. Using ...
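The first answer's suggestion, rendering at a higher resolution instead of zooming, could look roughly like the sketch below. It assumes the `pdf2image` package (which needs Poppler installed) does the conversion; the helper names `zoom_to_dpi` and `pdf_to_pngs`, the 72 DPI baseline, and the output naming are illustrative assumptions and are not taken from the truncated answer.

```python
def zoom_to_dpi(zoom, base_dpi=72):
    """Translate a zoom factor into a render resolution (72 DPI baseline is an assumption)."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return int(round(zoom * base_dpi))


def pdf_to_pngs(pdf_path, out_prefix, zoom=4.0):
    """Render every page of a PDF at a higher DPI and save each page as a PNG."""
    # Imported lazily so zoom_to_dpi works even when pdf2image is not installed.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=zoom_to_dpi(zoom))
    saved = []
    for number, page in enumerate(pages, start=1):
        path = f"{out_prefix}_{number}.png"  # hypothetical naming scheme
        page.save(path, "PNG")
        saved.append(path)
    return saved
```

Calling `pdf_to_pngs("input.pdf", "page", zoom=4.0)` would render each page at 288 DPI, which has a similar effect on OCR to a 4x zoom of a 72 DPI rendering.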

WinError 5:Access denied PyTesseract

ぃ、小莉子 posted on 2020-05-14 17:46:25
Question: I know this question has already been answered on this site; however, none of the solutions I looked up on the internet seemed to work. Here's what I tried:
- Giving all permissions to my Python file
- Changing the PATH variable to point to my tesseract folder
- Running IDLE as administrator and then executing the file from there
This error is really bothering me now, and I can't get any further because of it. Here's my code, in case it helps:
    import pytesseract
    import sys
    import argparse
    try: ...

pytesseract fail to recognise digits from image

余生长醉 posted on 2020-05-11 04:36:45
Question: I have this Python code, which I use to convert text written in a picture to a string. It works for certain images that have large characters, but not for the one I'm trying right now, which contains only digits. Here is my code:
    from PIL import Image
    import pytesseract
    img = Image.open('img.png')
    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
    result = pytesseract.image_to_string(img)
    print(result)
Why does it fail to recognise this specific image, and how can I ...
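The cause of the failure is not stated in this excerpt. One adjustment that is often tried for digit-only images is to pass Tesseract a page segmentation mode and a character whitelist through the `config` argument of `pytesseract.image_to_string`. The sketch below assumes a single line of digits (`--psm 7`); the helper names are hypothetical, and depending on the Tesseract version and engine the whitelist may be ignored.

```python
def digits_config(psm=7):
    """Build a Tesseract config string restricted to digits (psm 7 = single text line)."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def read_digits(image_path, psm=7):
    """OCR an image that is assumed to contain only digits."""
    # Imported lazily so digits_config can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    img = Image.open(image_path).convert("L")  # greyscale often helps OCR
    return pytesseract.image_to_string(img, config=digits_config(psm)).strip()
```

If the digits are spread over several lines, a different `psm` value (for example 6, a uniform block of text) may suit the image better.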

pytesseract Failed loading language 'eng'

馋奶兔 posted on 2020-04-17 19:11:29
Question: I've seen a lot of other people getting this error, and I've tried a lot of different things to fix it. Nothing so far has worked. I have:
- Added the path to my Tesseract-OCR folder AND the tesseract.exe file to PATH
- Added an environment variable called TESSDATA_PREFIX which points to the Tesseract-OCR folder
- Replaced the eng.traineddata file a couple of times
- Added pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" to the program
- Tried running JUST the ...
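When checking a setup like the one described, it can help to confirm that the language file is actually where Tesseract will look for it. The sketch below is a diagnostic only and is not taken from the original thread: `find_traineddata` is a hypothetical helper that looks for `<lang>.traineddata` both directly under `TESSDATA_PREFIX` and under a `tessdata` subfolder, since which of the two layouts is expected depends on the Tesseract version.

```python
import os


def find_traineddata(lang="eng", prefix=None):
    """Return the path of <lang>.traineddata under the given prefix, or None if absent."""
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    if not prefix:
        return None
    filename = f"{lang}.traineddata"
    candidates = [
        os.path.join(prefix, filename),              # prefix is the tessdata folder itself
        os.path.join(prefix, "tessdata", filename),  # prefix is the parent of tessdata
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

A result of `None` suggests that either the variable is unset in the process running Python, or the file name or location does not match what Tesseract expects.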

(-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'

穿精又带淫゛_ posted on 2020-04-06 08:49:15
Question: I am trying to recognize text in an image and then output it; however, this error is raised:
    Traceback (most recent call last):
      File "C:/Users/Benji's Beast/AppData/Local/Programs/Python/Python37-32/imageDet.py", line 41, in <module>
        print(get_string(src_path + "cont.jpg") )
      File "C:/Users/Benji's Beast/AppData/Local/Programs/Python/Python37-32/imageDet.py", line 15, in get_string
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.error: OpenCV(3.4.4) C:\projects\opencv-python\opencv ...
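The `!_src.empty()` assertion means `cv2.cvtColor` received an empty image. `cv2.imread` does not raise when a file cannot be read; it returns `None`, so a wrong path only surfaces at the next OpenCV call. A minimal sketch of a loader that fails earlier with a clearer message follows; `load_gray` is a hypothetical helper, not the asker's `get_string`.

```python
import os


def load_gray(path):
    """Read an image and convert it to greyscale, failing early on a bad path."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"image not found: {path}")

    # Imported lazily so the path check works without OpenCV installed.
    import cv2

    img = cv2.imread(path)
    if img is None:
        # The file exists but OpenCV could not decode it (unsupported or corrupt).
        raise ValueError(f"could not decode image: {path}")
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
```

In the traceback above, the path is built as `src_path + "cont.jpg"`, so a missing trailing separator in `src_path` is one thing worth checking; `os.path.join(src_path, "cont.jpg")` avoids that class of mistake.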

Extracting selected text by bounding box from an image

不羁岁月 posted on 2020-02-29 03:55:08
Question: I am trying to fetch the text selected by a bounding box on an image. For example, if only one word is selected by a bounding box, I want to fetch that text and write it to a text file. Please review my code and suggest how I can implement that functionality. So far, I have converted the PDF file to an image with bounding boxes over the text.
    import numpy as np
    import csv
    import io
    from PIL import Image
    import pytesseract
    from wand.image import Image as wi
    from pytesseract import Output
    ...
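One way to approach the selection step, sketched here under assumptions, is to take the word-level boxes that `pytesseract.image_to_data(..., output_type=Output.DICT)` returns and keep only the words that lie inside the selected rectangle. The helper names `words_in_box` and `save_selection`, and the `(left, top, right, bottom)` box format, are illustrative and not taken from the asker's code.

```python
def words_in_box(data, box, min_conf=0.0):
    """Return the words whose boxes lie fully inside box = (left, top, right, bottom).

    data is the dict produced by pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    x0, y0, x1, y1 = box
    words = []
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # layout rows (page, block, line) carry no text
        if float(data["conf"][i]) < min_conf:
            continue
        left, top = data["left"][i], data["top"][i]
        right = left + data["width"][i]
        bottom = top + data["height"][i]
        if left >= x0 and top >= y0 and right <= x1 and bottom <= y1:
            words.append(text)
    return words


def save_selection(image, box, out_path):
    """OCR a PIL image and write the words inside box to a text file."""
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(" ".join(words_in_box(data, box)))
```

Requiring the word box to lie fully inside the selection is a design choice; testing for overlap instead would also pick up words that the selection only partly covers.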

TesseractNotFoundError: tesseract is not installed or it's not in your path

こ雲淡風輕ζ posted on 2020-02-28 18:42:05
Question: I am trying to use Tesseract OCR to print text from an image, but I am getting the above error. I installed Tesseract OCR from https://github.com/UB-Mannheim/tesseract/wiki and pytesseract from the Anaconda prompt using pip install pytesseract, but it's not working. Please help if anyone has faced a similar issue.
    (base) C:\Users\500066016>pip install pytesseract
    Collecting pytesseract
      Downloading https://files.pythonhosted.org/packages/13/56 ...
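`pip install pytesseract` installs only the Python wrapper; the Tesseract executable is installed separately and must be either on PATH or named explicitly through `pytesseract.pytesseract.tesseract_cmd`. The sketch below shows one way to locate it. `locate_executable` is a hypothetical helper, and the fallback path in the usage note is an assumed default install location, not something stated in the question.

```python
import os
import shutil


def locate_executable(name="tesseract", extra_candidates=()):
    """Return the path of an executable found on PATH or among fallback paths, else None."""
    found = shutil.which(name)
    if found:
        return found
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

A possible use, assuming the installer's default folder: `path = locate_executable("tesseract", [r"C:\Program Files\Tesseract-OCR\tesseract.exe"])`, then set `pytesseract.pytesseract.tesseract_cmd = path` if `path` is not `None`.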

How improve image quality to extract text from image using Tesseract

前提是你 posted on 2020-02-25 06:37:45
Question: I'm trying to use Tesseract in the code below to extract the two lines of text in the image. I tried to improve the image quality, but it still didn't work. Can anyone help me?
    from PIL import Image, ImageEnhance, ImageFilter
    import pytesseract
    img = Image.open(r'C:\ocr\test00.jpg')
    new_size = tuple(4*x for x in img.size)
    img = img.resize(new_size, Image.ANTIALIAS)
    img.save(r'C:\\test02.jpg', 'JPEG')
    print(pytesseract.image_to_string(img))
Answer 1: Given the comment by @barny I don't know if ...
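A note for anyone running the question's code today: `Image.ANTIALIAS` was removed in Pillow 10, and `Image.LANCZOS` is the same filter under its current name. The sketch below repeats the question's upscaling step with that change and adds a greyscale conversion, which is an assumption about what may help and not part of the truncated answer. The helper names are hypothetical.

```python
def scaled_size(size, factor):
    """Scale a (width, height) pair by a factor, rounding to whole pixels."""
    return tuple(int(round(factor * side)) for side in size)


def upscale_and_read(image_path, factor=4):
    """Upscale an image with the Lanczos filter and run OCR on the result."""
    # Imported lazily so scaled_size can be used without Pillow or pytesseract.
    from PIL import Image
    import pytesseract

    img = Image.open(image_path).convert("L")  # greyscale (an assumption, see lead-in)
    img = img.resize(scaled_size(img.size, factor), Image.LANCZOS)
    return pytesseract.image_to_string(img)
```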

How improve image quality to extract text from image using Tesseract

天大地大妈咪最大 posted on 2020-02-25 06:37:16
Question: I'm trying to use Tesseract in the code below to extract the two lines of text in the image. I tried to improve the image quality, but it still didn't work. Can anyone help me?
    from PIL import Image, ImageEnhance, ImageFilter
    import pytesseract
    img = Image.open(r'C:\ocr\test00.jpg')
    new_size = tuple(4*x for x in img.size)
    img = img.resize(new_size, Image.ANTIALIAS)
    img.save(r'C:\\test02.jpg', 'JPEG')
    print(pytesseract.image_to_string(img))
Answer 1: Given the comment by @barny I don't know if ...

Background image cleaning for OCR

和自甴很熟 posted on 2020-02-12 01:55:52
Question: Using tesseract-OCR, I am trying to extract text from the following images, which have a red background. I have problems extracting the text in boxes B and D because of the vertical lines. How can I clean the background like this (input and output images not shown)? Any ideas? The image without boxes: (image not shown)
Answer 1: Here are two methods to clean the image using Python OpenCV.
Method #1: Numpy thresholding
Since the vertical lines, horizontal lines, and the background are in red, we can take advantage of this and use Numpy ...
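The answer's code is cut off here, so what follows is only a sketch of the general idea of Numpy thresholding, not the answer's actual method. It paints every "reddish" pixel white so that the red background and lines disappear while the darker text remains. The function name `whiten_red`, the RGB channel order, and both threshold values are assumptions; an image loaded with OpenCV is in BGR order and would need converting first.

```python
import numpy as np


def whiten_red(rgb, min_red=150, max_other=120):
    """Return a copy of an H x W x 3 uint8 RGB array with reddish pixels set to white."""
    red = rgb[..., 0]
    green = rgb[..., 1]
    blue = rgb[..., 2]
    # A pixel counts as red when its red channel is strong and the others are weak.
    red_mask = (red >= min_red) & (green <= max_other) & (blue <= max_other)
    cleaned = rgb.copy()
    cleaned[red_mask] = 255  # boolean mask over H x W sets all three channels
    return cleaned
```

The thresholds would need tuning per image; anti-aliased edges between red lines and dark text are the usual place where a fixed cutoff leaves residue.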