The Tesseract OCR engine can't read the text from an auto-generated image, but can read it from a section cut out in MS Paint

余生分开走 2021-01-21 13:45

I'm using a .NET wrapper for the Tesseract OCR engine. I have a large document that is a PNG. When I cut out a section of the image in MS Paint and then feed it into the engine, it

1 answer
  • 2021-01-21 14:21

    The default resolution of a new Bitmap is 96 DPI, which is not adequate for OCR. Try increasing it to 300 DPI, for example:

    bmp.SetResolution(300, 300);

    Update 1: When you rescale the image, its pixel dimensions should change as well. Here's an example rescale function:

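    // Requires: using System.Drawing; using System.Drawing.Drawing2D;
    public static Image Rescale(Image image, int dpiX, int dpiY)
    {
        // Resize the pixel dimensions so the physical size stays the same at the new DPI.
        int width = (int)(image.Width * dpiX / image.HorizontalResolution);
        int height = (int)(image.Height * dpiY / image.VerticalResolution);
        Bitmap bm = new Bitmap(width, height);
        bm.SetResolution(dpiX, dpiY);
        using (Graphics g = Graphics.FromImage(bm))
        {
            g.InterpolationMode = InterpolationMode.Bicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            // DrawImage(image, x, y) draws at the image's physical size, so it is scaled to the target DPI.
            g.DrawImage(image, 0, 0);
        }

        return bm;
    }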
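
To make the size arithmetic in Rescale concrete, here is a minimal sketch of the same calculation on plain numbers. The class DpiMath and the method ScaledSize are hypothetical names introduced only for illustration; they are not part of System.Drawing or of the answer's code. The sketch assumes the same DPI on both axes.

```csharp
using System;

public static class DpiMath
{
    // Hypothetical helper: computes the pixel size a bitmap needs when it
    // moves from sourceDpi to targetDpi while keeping the same physical size.
    // The cast truncates toward zero, as the casts in Rescale do.
    public static (int Width, int Height) ScaledSize(
        int width, int height, float sourceDpi, float targetDpi)
    {
        return ((int)(width * targetDpi / sourceDpi),
                (int)(height * targetDpi / sourceDpi));
    }

    public static void Main()
    {
        // A 1000 x 500 pixel image at 96 DPI, rescaled to 300 DPI.
        var (w, h) = ScaledSize(1000, 500, 96f, 300f);
        Console.WriteLine($"{w} x {h}"); // prints "3125 x 1562"
    }
}
```

So a 96 DPI source grows by a factor of 300 / 96 = 3.125 in each direction, which is why calling SetResolution alone, without resizing the bitmap, does not give the OCR engine any additional pixels to work with.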
    