Using YOLO or other image recognition techniques to identify all alphanumeric text present in images

忘掉有多难 2021-02-04 09:43

I have multiple diagram images, all of which contain labels as alphanumeric characters instead of just the text label itself. I want my YOLO model to identify all the numbers &

3 Answers
  • 2021-02-04 10:06

    For convenience's sake, I'd like to add the package keras_ocr. It can easily be installed with pip and is based on the CRAFT text detector, which, if I'm not mistaken, is a bit newer than the EAST detector.

    Besides detection, it also performs OCR (text recognition). The results are shown below. Consider this an alternative to the accepted answer that may be easier to implement.
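    As a minimal sketch of how keras_ocr's output could be narrowed down to alphanumeric labels: this assumes `pipeline.recognize` returns, per image, a list of `(word, box)` tuples where `box` holds four `(x, y)` corners, as shown in the keras_ocr README. Its default recognizer alphabet is, as far as I know, digits plus lowercase letters, so expect lowercase output. The helper names `alphanumeric_labels` and `run_keras_ocr` are hypothetical; only the filtering helper runs without the package installed.

```python
import re

# Tokens made only of letters and digits, e.g. "a1", "42", "r12".
ALNUM = re.compile(r"^[A-Za-z0-9]+$")


def alphanumeric_labels(predictions):
    """Keep (word, box) pairs whose recognized text is alphanumeric.

    Assumed keras_ocr per-image output format: a list of (word, box) tuples,
    where box holds the four (x, y) corners of the detected word.
    Returns (word, (x_min, y_min, x_max, y_max)) axis-aligned boxes.
    """
    labels = []
    for word, box in predictions:
        if not ALNUM.match(word):
            continue
        xs = [float(p[0]) for p in box]
        ys = [float(p[1]) for p in box]
        labels.append((word, (min(xs), min(ys), max(xs), max(ys))))
    return labels


def run_keras_ocr(paths):
    """Hypothetical driver: needs `pip install keras-ocr`; weights download on first use."""
    import keras_ocr

    pipeline = keras_ocr.pipeline.Pipeline()
    images = [keras_ocr.tools.read(p) for p in paths]
    return [alphanumeric_labels(preds) for preds in pipeline.recognize(images)]
```

    Keeping the filter separate from the pipeline call makes it easy to swap the regex, for example to `^[A-Z][0-9]+$` if your labels always follow a letter-plus-number pattern.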

  • 2021-02-04 10:15

    A possible approach is to use the EAST (Efficient and Accurate Scene Text) deep learning text detector, based on Zhou et al.’s 2017 paper, EAST: An Efficient and Accurate Scene Text Detector. The model was originally trained to detect text in natural scene images, but it may also work on diagram images. EAST is quite robust and can detect blurred or reflective text. Below is a modified version of Adrian Rosebrock's implementation of EAST. Instead of running the text detector directly on the image, we first try to remove as many non-text objects from the image as possible: horizontal lines, vertical lines, and non-text contours (curves, diagonals, circular shapes). Here are the results on some of your images:

    Input -> Non-text contours to remove in green

    Result

    Other images

    The pretrained frozen_east_text_detection.pb model needed for text detection can be found here. Although the model catches most of the text, the results are not 100% accurate and include occasional false positives, probably because it was trained on natural scene images. To obtain more accurate results you would probably have to train your own custom model, but if you want a decent out-of-the-box solution, this should work for you. Check out Adrian's OpenCV Text Detection (EAST text detector) blog post for a more comprehensive explanation of the EAST text detector.
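    The least obvious part of the code below is how each cell of the EAST score map becomes a box. The standalone sketch here isolates that step; it mirrors the answer's own decoding loop (a 4x stride, four edge distances plus an angle, axis-aligned output) rather than the paper's full rotated-box geometry, and `decode_east_cell` is a hypothetical name.

```python
import math


def decode_east_cell(x, y, d_top, d_right, d_bottom, d_left, angle, stride=4.0):
    """Turn one EAST score-map cell into an axis-aligned box (startX, startY, endX, endY).

    Follows the decoding in the answer's loop: the feature map is `stride`
    times smaller than the network input, and the four geometry channels are
    distances from the cell's position to the top, right, bottom and left
    edges of the text box.
    """
    offset_x, offset_y = x * stride, y * stride
    cos, sin = math.cos(angle), math.sin(angle)
    box_h = d_top + d_bottom
    box_w = d_right + d_left
    # End corner first, then step back by the box size (same order as the answer)
    end_x = int(offset_x + cos * d_right + sin * d_bottom)
    end_y = int(offset_y - sin * d_right + cos * d_bottom)
    start_x = int(end_x - box_w)
    start_y = int(end_y - box_h)
    return (start_x, start_y, end_x, end_y)
```

    For example, a cell at column 10, row 5 with distances top=3, right=8, bottom=4, left=2 and angle 0 gives `(38, 17, 48, 24)` in the 640x640 network input, which the answer's code then rescales by `rW` and `rH`.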

    Code

    from imutils.object_detection import non_max_suppression
    import numpy as np
    import cv2
    
    def EAST_text_detector(original, image, confidence=0.25):
        # Set the new width and height and determine the changed ratio
        (h, W) = image.shape[:2]
        (newW, newH) = (640, 640)
        rW = W / float(newW)
        rH = h / float(newH)
    
        # Resize the image and grab the new image dimensions
        image = cv2.resize(image, (newW, newH))
        (h, W) = image.shape[:2]
    
        # Define the two output layer names for the EAST detector model that
        # we are interested -- the first is the output probabilities and the
        # second can be used to derive the bounding box coordinates of text
        layerNames = [
            "feature_fusion/Conv_7/Sigmoid",
            "feature_fusion/concat_3"]
    
        net = cv2.dnn.readNet('frozen_east_text_detection.pb')
    
        # Construct a blob from the image and then perform a forward pass of
        # the model to obtain the two output layer sets
        blob = cv2.dnn.blobFromImage(image, 1.0, (W, h), (123.68, 116.78, 103.94), swapRB=True, crop=False)
        net.setInput(blob)
        (scores, geometry) = net.forward(layerNames)
    
        # Grab the number of rows and columns from the scores volume, then
        # initialize our set of bounding box rectangles and corresponding
        # confidence scores
        (numRows, numCols) = scores.shape[2:4]
        rects = []
        confidences = []
    
        # Loop over the number of rows
        for y in range(0, numRows):
            # Extract the scores (probabilities), followed by the geometrical
            # data used to derive potential bounding box coordinates that
            # surround text
            scoresData = scores[0, 0, y]
            xData0 = geometry[0, 0, y]
            xData1 = geometry[0, 1, y]
            xData2 = geometry[0, 2, y]
            xData3 = geometry[0, 3, y]
            anglesData = geometry[0, 4, y]
    
            # Loop over the number of columns
            for x in range(0, numCols):
                # If our score does not have sufficient probability, ignore it
                if scoresData[x] < confidence:
                    continue
    
                # Compute the offset factor as our resulting feature maps will
                # be 4x smaller than the input image
                (offsetX, offsetY) = (x * 4.0, y * 4.0)
    
                # Extract the rotation angle for the prediction and then
                # compute the sin and cosine
                angle = anglesData[x]
                cos = np.cos(angle)
                sin = np.sin(angle)
    
                # Use the geometry volume to derive the width and height of
                # the bounding box (separate names so the image height `h`
                # is not overwritten inside the loop)
                boxH = xData0[x] + xData2[x]
                boxW = xData1[x] + xData3[x]

                # Compute both the starting and ending (x, y)-coordinates for
                # the text prediction bounding box
                endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
                endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
                startX = int(endX - boxW)
                startY = int(endY - boxH)
    
                # Add the bounding box coordinates and probability score to
                # our respective lists
                rects.append((startX, startY, endX, endY))
                confidences.append(scoresData[x])
    
        # Apply non-maxima suppression to suppress weak, overlapping bounding
        # boxes
        boxes = non_max_suppression(np.array(rects), probs=confidences)
    
        # Loop over the bounding boxes
        for (startX, startY, endX, endY) in boxes:
            # Scale the bounding box coordinates based on the respective
            # ratios
            startX = int(startX * rW)
            startY = int(startY * rH)
            endX = int(endX * rW)
            endY = int(endY * rH)
    
            # Draw the bounding box on the image
            cv2.rectangle(original, (startX, startY), (endX, endY), (36, 255, 12), 2)
        return original
    
    # Load the image, convert to grayscale and apply an inverted Otsu threshold
    # (dark text and lines become white on a black background)
    image = cv2.imread('1.png')
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    clean = thresh.copy()
    
    # Remove horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
    detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        cv2.drawContours(clean, [c], -1, 0, 3)
    
    # Remove vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,30))
    detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
    cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        cv2.drawContours(clean, [c], -1, 0, 3)
    
    # Remove non-text contours (curves, diagonals, circular shapes)
    cnts = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        # Large blobs are unlikely to be characters, so fill them with black
        area = cv2.contourArea(c)
        if area > 1500:
            cv2.drawContours(clean, [c], -1, 0, -1)
        # Contours that approximate to 4 vertices (boxes, rectangles) are removed too
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        x,y,w,h = cv2.boundingRect(c)
        if len(approx) == 4:
            cv2.rectangle(clean, (x, y), (x + w, y + h), 0, -1)
    
    # Bitwise-and with original image to remove contours
    filtered = cv2.bitwise_and(image, image, mask=clean)
    filtered[clean==0] = (255,255,255)
    
    # Perform EAST text detection
    result = EAST_text_detector(image, filtered)
    
    cv2.imshow('filtered', filtered)
    cv2.imshow('result', result)
    cv2.waitKey()
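    The `non_max_suppression` call from imutils is what collapses the many overlapping per-cell boxes into roughly one box per word. To show what that step does, here is a simplified greedy NMS sketch in NumPy. It uses IoU as the overlap measure, which I believe differs slightly from how imutils computes overlap, so its threshold is not a drop-in replacement for imutils' `overlapThresh`; `greedy_nms` is a hypothetical helper, not part of imutils.

```python
import numpy as np


def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Greedy non-maximum suppression on (startX, startY, endX, endY) boxes.

    Repeatedly keeps the highest-scoring remaining box and drops every other
    box whose IoU with it exceeds `iou_threshold`. Returns kept boxes as ints.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if boxes.size == 0:
        return np.empty((0, 4), dtype=int)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # indices sorted by score, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection rectangle of box i with every remaining box
        ix1 = np.maximum(x1[i], x1[rest])
        iy1 = np.maximum(y1[i], y1[rest])
        ix2 = np.minimum(x2[i], x2[rest])
        iy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-9)  # guard against zero-area boxes
        order = rest[iou <= iou_threshold]
    return boxes[keep].astype(int)
```

    With two heavily overlapping boxes and one distant box, only the higher-scoring box of the overlapping pair survives along with the distant one.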
    
  • 2021-02-04 10:16

    What you're describing appears to be OCR (Optical Character Recognition). One OCR engine I know of is Tesseract, although there is also this one from IBM, among others.
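    If you go the Tesseract route, a minimal sketch through the pytesseract wrapper might look like this. It assumes pytesseract and the Tesseract binary are installed, and that `image_to_data(..., output_type=pytesseract.Output.DICT)` returns parallel lists such as `text`, `conf`, `left`, `top`, `width` and `height`. Using `--psm 11` (sparse text) is my guess at a sensible mode for scattered diagram labels; the helper names and the `min_conf=60` cutoff are hypothetical.

```python
import re

# Tokens made only of letters and digits
ALNUM = re.compile(r"^[A-Za-z0-9]+$")


def alphanumeric_words(data, min_conf=60.0):
    """Filter a pytesseract image_to_data(..., output_type=Output.DICT) result.

    Returns (text, (x1, y1, x2, y2)) for confident alphanumeric words.
    """
    words = []
    rows = zip(data["text"], data["conf"], data["left"],
               data["top"], data["width"], data["height"])
    for text, conf, left, top, width, height in rows:
        text = str(text).strip()
        # conf may be a string or a number depending on the pytesseract
        # version; non-word rows carry -1 and empty text
        if ALNUM.match(text) and float(conf) >= min_conf:
            words.append((text, (left, top, left + width, top + height)))
    return words


def run_tesseract(path):
    """Hypothetical driver: requires OpenCV, pytesseract and the tesseract binary."""
    import cv2
    import pytesseract

    rgb = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    data = pytesseract.image_to_data(
        rgb, config="--psm 11", output_type=pytesseract.Output.DICT)
    return alphanumeric_words(data)
```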

    Since YOLO was originally trained for a very different task, using it to localize text will likely require retraining it from scratch. One could use existing packages (adapted to your specific setting) to generate ground truth, although it is worth remembering that the model will generally be at most as good as its ground truth. Or, perhaps more easily, generate synthetic training data (i.e., add text at positions you choose to existing drawings, then train the model to localize it).

    Alternatively, if all of your target images are structured similarly to the above, one could create ground truth using classic CV heuristics, as you did above, to separate/segment out symbols, followed by classification with a CNN trained on MNIST or similar to determine whether a given blob contains a symbol.

    If you do opt for YOLO, there are existing implementations in Python; I have had some experience with this one, for example. It should be fairly straightforward to set up training with your own ground truth.

    Finally, if using YOLO or a CNN is not a goal in itself but only a means to a solution, any of the above "ground truth" methods could be used directly as the solution rather than for training a model.
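    Whatever produces the ground truth, most YOLO training code I am aware of (Darknet, Ultralytics) expects one `.txt` file per image with one line per object: `class x_center y_center width height`, normalized by the image width and height. A small converter from pixel boxes, as a sketch; check the format expected by whichever implementation you pick, and note that `to_yolo_line` and `write_yolo_labels` are hypothetical names.

```python
def to_yolo_line(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x1, y1, x2, y2) to a YOLO label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def write_yolo_labels(path, boxes, img_w, img_h, class_id=0):
    """Write one label line per box; class 0 is assumed to mean 'text'."""
    with open(path, "w") as f:
        for box in boxes:
            f.write(to_yolo_line(box, img_w, img_h, class_id) + "\n")
```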

    Hope I understood your question correctly.
