Using YOLO or other image recognition techniques to identify all alphanumeric text present in images

前端 未结 3 2048
忘掉有多难
忘掉有多难 2021-02-04 09:43

I have multiple images diagram, all of which contains labels as alphanumeric characters instead of just the text label itself. I want my YOLO model to identify all the numbers &

3条回答
  •  情歌与酒
    2021-02-04 10:15

    A possible approach is to use the EAST (Efficient and Accurate Scene Text) deep learning text detector based on Zhou et al.’s 2017 paper, EAST: An Efficient and Accurate Scene Text Detector. The model was originally trained for detecting text in natural scene images but it may be possible to apply it on diagram images. EAST is quite robust and is capable of detecting blurred or reflective text. Here is a modified version of Adrian Rosebrock's implementation of EAST. Instead of applying the text detector directly on the image, we can try to remove as much non-text objects on the image before performing text detection. The idea is to remove horizontal lines, vertical lines, and non-text contours (curves, diagonals, circular shapes) before applying detection. Here's the results with some of your images:

    Input -> Non-text contours to remove in green

    Result

    Other images

    The pretrained frozen_east_text_detection.pb model necessary to perform text detection can be found here. Although the model catches most of the text, the results are not 100% accurate and has occasional false positives probably due to how it was trained on natural scene images. To obtain more accurate results you would probably have to train your own custom model. But if you want a decent out-of-the-box solution then this should work you. Check out Adrian's OpenCV Text Detection (EAST text detector) blog post for a more comprehensive explanation of the EAST text detector.

    Code

    from imutils.object_detection import non_max_suppression
    import numpy as np
    import cv2
    
    def EAST_text_detector(original, image, confidence=0.25):
        # Set the new width and height and determine the changed ratio
        (h, W) = image.shape[:2]
        (newW, newH) = (640, 640)
        rW = W / float(newW)
        rH = h / float(newH)
    
        # Resize the image and grab the new image dimensions
        image = cv2.resize(image, (newW, newH))
        (h, W) = image.shape[:2]
    
        # Define the two output layer names for the EAST detector model that
        # we are interested -- the first is the output probabilities and the
        # second can be used to derive the bounding box coordinates of text
        layerNames = [
            "feature_fusion/Conv_7/Sigmoid",
            "feature_fusion/concat_3"]
    
        net = cv2.dnn.readNet('frozen_east_text_detection.pb')
    
        # Construct a blob from the image and then perform a forward pass of
        # the model to obtain the two output layer sets
        blob = cv2.dnn.blobFromImage(image, 1.0, (W, h), (123.68, 116.78, 103.94), swapRB=True, crop=False)
        net.setInput(blob)
        (scores, geometry) = net.forward(layerNames)
    
        # Grab the number of rows and columns from the scores volume, then
        # initialize our set of bounding box rectangles and corresponding
        # confidence scores
        (numRows, numCols) = scores.shape[2:4]
        rects = []
        confidences = []
    
        # Loop over the number of rows
        for y in range(0, numRows):
            # Extract the scores (probabilities), followed by the geometrical
            # data used to derive potential bounding box coordinates that
            # surround text
            scoresData = scores[0, 0, y]
            xData0 = geometry[0, 0, y]
            xData1 = geometry[0, 1, y]
            xData2 = geometry[0, 2, y]
            xData3 = geometry[0, 3, y]
            anglesData = geometry[0, 4, y]
    
            # Loop over the number of columns
            for x in range(0, numCols):
                # If our score does not have sufficient probability, ignore it
                if scoresData[x] < confidence:
                    continue
    
                # Compute the offset factor as our resulting feature maps will
                # be 4x smaller than the input image
                (offsetX, offsetY) = (x * 4.0, y * 4.0)
    
                # Extract the rotation angle for the prediction and then
                # compute the sin and cosine
                angle = anglesData[x]
                cos = np.cos(angle)
                sin = np.sin(angle)
    
                # Use the geometry volume to derive the width and height of
                # the bounding box
                h = xData0[x] + xData2[x]
                w = xData1[x] + xData3[x]
    
                # Compute both the starting and ending (x, y)-coordinates for
                # the text prediction bounding box
                endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
                endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
                startX = int(endX - w)
                startY = int(endY - h)
    
                # Add the bounding box coordinates and probability score to
                # our respective lists
                rects.append((startX, startY, endX, endY))
                confidences.append(scoresData[x])
    
        # Apply non-maxima suppression to suppress weak, overlapping bounding
        # boxes
        boxes = non_max_suppression(np.array(rects), probs=confidences)
    
        # Loop over the bounding boxes
        for (startX, startY, endX, endY) in boxes:
            # Scale the bounding box coordinates based on the respective
            # ratios
            startX = int(startX * rW)
            startY = int(startY * rH)
            endX = int(endX * rW)
            endY = int(endY * rH)
    
            # Draw the bounding box on the image
            cv2.rectangle(original, (startX, startY), (endX, endY), (36, 255, 12), 2)
        return original
    
    # Convert to grayscale and Otsu's threshold
    image = cv2.imread('1.png')
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    clean = thresh.copy()
    
    # Remove horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
    detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        cv2.drawContours(clean, [c], -1, 0, 3)
    
    # Remove vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
    detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
    cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        cv2.drawContours(clean, [c], -1, 0, 3)
    
    # Remove non-text contours (curves, diagonals, circlar shapes)
    cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        area = cv2.contourArea(c)
        if area > 1500:
            cv2.drawContours(clean, [c], -1, 0, -1)
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        x,y,w,h = cv2.boundingRect(c)
        if len(approx) == 4:
            cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)
    
    # Bitwise-and with original image to remove contours
    filtered = cv2.bitwise_and(image, image, mask=clean)
    filtered[clean==0] = (255,255,255)
    
    # Perform EAST text detection
    result = EAST_text_detector(image, filtered)
    
    cv2.imshow('filtered', filtered)
    cv2.imshow('result', result)
    cv2.waitKey()
    

提交回复
热议问题