How to remove the noise in the given image so that the OCR output is accurate?

I applied Otsu thresholding to this Bengali text image and ran Tesseract OCR on it, but the output is very bad. What preprocessing should I apply to remove the noise?

1 answer

甜味超标

2021-01-17 02:42

You can remove the noise by deleting small connected components, which might improve the OCR accuracy. You will also need to tune the minimum component size (min_size below) to find a value that removes the specks without removing parts of the text.

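import cv2
import numpy as np

# Load as grayscale (flag 0) and binarize so that the text is white on black
img = cv2.imread(r'D:\Image\st5.png', 0)
ret, bw = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY_INV)

# Label the connected components; label 0 is the background
connectivity = 4
nb_components, output, stats, centroids = cv2.connectedComponentsWithStats(bw, connectivity, cv2.CV_32S)
# The last stats column is the area (cv2.CC_STAT_AREA); skip the background row
sizes = stats[1:, -1]
nb_components = nb_components - 1
min_size = 50  # threshold value for small noisy components
img2 = np.zeros(output.shape, np.uint8)

# Keep only the components that have at least min_size pixels
for i in range(0, nb_components):
    if sizes[i] >= min_size:
        img2[output == i + 1] = 255

# Invert back to black text on a white background
res = cv2.bitwise_not(img2)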

Denoised image:
