How to remove the noise in the given image so that the OCR output is perfect?

走了就别回头了 2021-01-17 02:20

I applied Otsu thresholding to this Bengali text image and ran Tesseract OCR on it, but the output is very bad. What preprocessing should I apply to remove the noise?

1 Answer
  • 2021-01-17 02:42

    You can remove the noise by deleting small connected components, which may improve the OCR accuracy. You will also need to tune the size threshold below which a component counts as noise, since the best value depends on the image.

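    import cv2
    import numpy as np

    # Load as grayscale (flag 0); cv2.imread returns None if the file cannot be read
    img = cv2.imread(r'D:\Image\st5.png', 0)
    # Invert while binarizing so that text and noise become white foreground
    ret, bw = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY_INV)

    # Label the connected components and collect per-component statistics
    connectivity = 4
    nb_components, output, stats, centroids = cv2.connectedComponentsWithStats(
        bw, connectivity=connectivity, ltype=cv2.CV_32S)
    # Label 0 is the background, so skip it; the area column holds the size in pixels
    sizes = stats[1:, cv2.CC_STAT_AREA]
    nb_components = nb_components - 1
    min_size = 50  # threshold (in pixels) below which a component is treated as noise
    img2 = np.zeros(output.shape, np.uint8)

    # Keep only the components that are large enough
    for i in range(nb_components):
        if sizes[i] >= min_size:
            img2[output == i + 1] = 255

    # Invert back to dark text on a white background
    res = cv2.bitwise_not(img2)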
    

    Denoised image:
