How to extract text or numbers from images using Python

难免孤独 2020-12-04 00:01

I want to extract text (mainly numbers) from images like this:

I tried this code:

import pytesseract
from PIL import Image

pytesseract.pytes         


        
1 Answer
  • 2020-12-04 01:07

    When performing OCR, it is important to preprocess the image so that the text to detect is black and the background is white. A simple way to do this is to apply Otsu's threshold with OpenCV, which produces a binary image. Here's the image after preprocessing:

    We use the --psm 6 configuration setting so that Tesseract treats the image as a single uniform block of text. There are other page segmentation modes (other --psm values) you can try.

    Result from Pytesseract

    01153521976
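Since the goal is mainly numbers, it can help to restrict or clean the OCR output. The sketch below is only an illustration: `build_config` and `keep_digits` are hypothetical helper names, not part of pytesseract. It assumes the Tesseract variable `tessedit_char_whitelist` is honored by your installation, which depends on the Tesseract version and engine, so the regex filter is included as a fallback.

```python
import re


def build_config(psm=6, digits_only=True):
    """Build a config string for pytesseract's `config` argument.

    Hypothetical helper: it only concatenates Tesseract options.
    """
    parts = [f"--psm {psm}"]
    if digits_only:
        # Ask Tesseract to recognize digits only (may be ignored by
        # some Tesseract versions or engines).
        parts.append("-c tessedit_char_whitelist=0123456789")
    return " ".join(parts)


def keep_digits(ocr_text):
    """Return the runs of digits found in the OCR output, in order."""
    return re.findall(r"\d+", ocr_text)
```

The string returned by `build_config()` can be passed as `config=` to `pytesseract.image_to_string`, and `keep_digits` can be applied to the returned text to drop any stray non-digit characters.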

    Code

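    import cv2
    import pytesseract

    # Path to the Tesseract executable (Windows install; adjust or remove on other systems)
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    # Load the image as grayscale, then binarize it with an inverted Otsu threshold
    image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
    thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # --psm 6: assume a single uniform block of text
    data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
    print(data)

    # Show the preprocessed image until a key is pressed
    cv2.imshow('thresh', thresh)
    cv2.waitKey()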
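To make the thresholding step less of a black box, here is a minimal pure-Python sketch of the idea behind Otsu's method: pick the threshold that maximizes the between-class variance of the two pixel groups. The functions `otsu_threshold` and `binarize_inv` are illustrative names, not OpenCV APIs, and the threshold they pick is not guaranteed to match the exact value OpenCV returns. In practice, `cv2.threshold` with `cv2.THRESH_OTSU` does this work for you.

```python
def otsu_threshold(pixels):
    """Return a threshold t (0..255) for a flat list of 8-bit grey values.

    t is chosen to maximize the between-class variance of the groups
    "value <= t" and "value > t".
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower group yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is in the lower group
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_inv(pixels, t, maxval=255):
    """Inverted binary threshold: values above t become 0, the rest maxval."""
    return [0 if p > t else maxval for p in pixels]
```

For example, `binarize_inv(pixels, otsu_threshold(pixels))` maps the darker group of pixels to 255 and the brighter group to 0, which mirrors the effect of `cv2.THRESH_BINARY_INV` in the code above.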
    