python-tesseract

Bad character recognition with Pytesseract OCR for images with table structure

痞子三分冷 submitted on 2020-08-25 04:16:38
Question: I use code that locates text boxes and draws a rectangle around each one, which lets me rebuild the grid of the table structure in the image. The text box detection works very well, but when I try to read the characters inside each rectangle, pytesseract recognizes them poorly and the original text cannot be recovered. Here is my Python code: import os import cv2 import imutils import argparse import numpy as np import pytesseract # This only works if there's

Bad character recognition with Pytesseract OCR for images with table structure

冷暖自知 submitted on 2020-08-25 04:12:48
Question: I use code that locates text boxes and draws a rectangle around each one, which lets me rebuild the grid of the table structure in the image. The text box detection works very well, but when I try to read the characters inside each rectangle, pytesseract recognizes them poorly and the original text cannot be recovered. Here is my Python code: import os import cv2 import imutils import argparse import numpy as np import pytesseract # This only works if there's
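
The excerpt is cut off before the OCR step, so the asker's own code is not shown. As a rough sketch of the per-rectangle recognition the question describes, the snippet below crops one detected box with a small margin and passes it to pytesseract. The helper names `crop_with_margin` and `ocr_cell`, the `(x, y, w, h)` box layout, the 5-pixel margin, and the choice of `--psm 6` are all assumptions made for illustration, not taken from the question.

```python
import numpy as np


def crop_with_margin(image, box, margin=5):
    """Crop a box given as (x, y, w, h), widened by `margin` pixels on every
    side and clamped to the image bounds. Hypothetical helper."""
    x, y, w, h = box
    height, width = image.shape[:2]
    x0 = max(x - margin, 0)
    y0 = max(y - margin, 0)
    x1 = min(x + w + margin, width)
    y1 = min(y + h + margin, height)
    return image[y0:y1, x0:x1]


def ocr_cell(image, box, margin=5):
    """Run Tesseract on a single table cell. Hypothetical helper; needs
    pytesseract and the tesseract binary to be installed."""
    import pytesseract  # imported here so cropping works without it

    cell = crop_with_margin(image, box, margin)
    # Assumption: each cell holds a small block of text, so page segmentation
    # mode 6 (single uniform block) is tried; mode 7 (single line) is another
    # candidate for one-line cells.
    return pytesseract.image_to_string(cell, config="--psm 6").strip()
```

Whether a margin or a different page segmentation mode actually improves recognition depends on the image, so this is something to try rather than a confirmed fix.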