Cleaning up captcha image

忘掉有多难 2021-02-01 19:12

I'm trying to clean up the image above. I've tried several different methods using OpenCV, but I either erode the original image too much, to the point where parts of the

3 answers
  •  日久生厌
    2021-02-01 19:52

    Here is a C# solution using OpenCvSharp. It should be easy to convert back to Python or C++ because the method names map almost one-to-one onto the native OpenCV ones.

    It uses OpenCV's inpainting technique to avoid destroying too much of the letters before a possible OCR phase. The lines have a different color than the rest of the image, so we use that information very early, before any grayscale conversion or binarization. The steps are as follows:

    • build a mask from the lines using their color (#707070)
    • dilate that mask a bit because the lines may have been drawn with antialiasing
    • repaint ("inpaint") the original image using this mask, which removes the lines while preserving most of what was underneath them (the letters). Note: removing the small dots before this step would probably work even better
    • apply some dilation, blur, and thresholding to finish
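To make the first two steps concrete, here is a minimal pure-Python sketch on a tiny pixel grid, with no OpenCV dependency. The helper names `build_mask` and `dilate` are hypothetical and exist only for this illustration; in real code `cv2.inRange` and `cv2.dilate` do this work. The square neighbourhood is also an assumption made for simplicity, whereas the C# code below uses an elliptical structuring element.

```python
LINE_COLOR = (0x70, 0x70, 0x70)  # line color reported in the answer (#707070)
WHITE = (255, 255, 255)


def build_mask(image, color=LINE_COLOR):
    """Return a 0/1 grid marking pixels whose color equals `color` exactly."""
    return [[1 if px == color else 0 for px in row] for row in image]


def dilate(mask, radius=1):
    """Grow the mask: every set pixel also sets its square neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


# Toy input: a 5x5 white image with one horizontal gray line on row 2.
image = [[WHITE] * 5 for _ in range(5)]
image[2] = [LINE_COLOR] * 5

mask = build_mask(image)   # only row 2 is set
grown = dilate(mask)       # rows 1 to 3 are set

for row in grown:
    print("".join("#" if v else "." for v in row))
```

The grown mask covers one extra pixel on each side of the line, which is the margin that lets inpainting also repaint the antialiased fringe that an exact color match misses.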

    Here is the mask:

    Here is the result:

    Here is the result on the sample set:

    Here is the C# code:

    // requires: using System.IO; using OpenCvSharp;
    static void Decaptcha(string filePath)
    {
        // load the file
        using (var src = new Mat(filePath))
        {
            using (var binaryMask = new Mat())
            {
                // lines color is different than text
                var linesColor = Scalar.FromRgb(0x70, 0x70, 0x70);
    
                // build a mask of lines
                Cv2.InRange(src, linesColor, linesColor, binaryMask);
                using (var masked = new Mat())
                {
                    // build the corresponding image
                    // dilate the lines a bit because antialiasing may have left their borders outside the exact-color mask
                    src.CopyTo(masked, binaryMask);
                    int linesDilate = 3;
                    using (var element = Cv2.GetStructuringElement(MorphShapes.Ellipse, new Size(linesDilate, linesDilate)))
                    {
                        Cv2.Dilate(masked, masked, element);
                    }
    
                    // convert mask to grayscale
                    Cv2.CvtColor(masked, masked, ColorConversionCodes.BGR2GRAY);
                    using (var dst = src.EmptyClone())
                    {
                        // repaint big lines
                        Cv2.Inpaint(src, masked, dst, 3, InpaintMethod.NS);
    
                        // destroy small lines
                        linesDilate = 2;
                        using (var element = Cv2.GetStructuringElement(MorphShapes.Ellipse, new Size(linesDilate, linesDilate)))
                        {
                            Cv2.Dilate(dst, dst, element);
                        }
    
                        Cv2.GaussianBlur(dst, dst, new Size(5, 5), 0);
                        using (var dst2 = dst.BilateralFilter(5, 75, 75))
                        {
                            // basically make it B&W
                            Cv2.CvtColor(dst2, dst2, ColorConversionCodes.BGR2GRAY);
                            // Otsu picks the threshold automatically, so the thresh argument (0 here) is ignored
                            Cv2.Threshold(dst2, dst2, 0, 255, ThresholdTypes.Otsu);
    
                            // save the file
                            dst2.SaveImage(Path.Combine(
                                Path.GetDirectoryName(filePath),
                                Path.GetFileNameWithoutExtension(filePath) + "_dst" + Path.GetExtension(filePath)));
                        }
                    }
                }
            }
        }
    }
    
