iOS Tesseract OCR Image Preparation

一生所求 2020-12-07 16:21

I would like to implement an OCR application that recognizes text from photos.

I succeeded in compiling and integrating the Tesseract engine on iOS. I succeeded

2 answers
  • 2020-12-07 16:29

    I used the code above, but added two more function calls to convert the image so that it works with Tesseract.

    First, I used an image resize routine to scale the image to 640 x 640, which seems to be more manageable for Tesseract.

    -(UIImage *)resizeImage:(UIImage *)image {
    
        CGImageRef imageRef = [image CGImage];
        CGImageAlphaInfo alphaInfo = CGImageGetAlphaInfo(imageRef);
        CGColorSpaceRef colorSpaceInfo = CGColorSpaceCreateDeviceRGB();
    
        if (alphaInfo == kCGImageAlphaNone)
            alphaInfo = kCGImageAlphaNoneSkipLast;
    
        int width, height;
    
        width = 640;//[image size].width;
        height = 640;//[image size].height;
    
        CGContextRef bitmap;
    
        // Pass 0 for bytesPerRow so Core Graphics computes the stride itself;
        // the source image's bytes-per-row does not match the resized context.
        if (image.imageOrientation == UIImageOrientationUp || image.imageOrientation == UIImageOrientationDown) {
            bitmap = CGBitmapContextCreate(NULL, width, height, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);
    
        } else {
            bitmap = CGBitmapContextCreate(NULL, height, width, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);
    
        }
    
        if (image.imageOrientation == UIImageOrientationLeft) {
            NSLog(@"image orientation left");
            CGContextRotateCTM (bitmap, radians(90));
            CGContextTranslateCTM (bitmap, 0, -height);
    
        } else if (image.imageOrientation == UIImageOrientationRight) {
            NSLog(@"image orientation right");
            CGContextRotateCTM (bitmap, radians(-90));
            CGContextTranslateCTM (bitmap, -width, 0);
    
        } else if (image.imageOrientation == UIImageOrientationUp) {
            NSLog(@"image orientation up");
    
        } else if (image.imageOrientation == UIImageOrientationDown) {
            NSLog(@"image orientation down");
            CGContextTranslateCTM (bitmap, width,height);
            CGContextRotateCTM (bitmap, radians(-180.));
    
        }
    
        CGContextDrawImage(bitmap, CGRectMake(0, 0, width, height), imageRef);
        CGImageRef ref = CGBitmapContextCreateImage(bitmap);
        UIImage *result = [UIImage imageWithCGImage:ref];
    
        CGContextRelease(bitmap);
        CGImageRelease(ref);
        CGColorSpaceRelease(colorSpaceInfo); // created with ...Create..., so it must be released
    
        return result;
    }
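
Note that the function above stretches every photo to a 640 x 640 square, which distorts anything that is not already square. If that turns out to hurt recognition, one possible variation is to scale the longer side to 640 and keep the proportions. The following is only a sketch in plain C (it compiles inside a .m file as well); `PixelSize` and `fit_within` are hypothetical names, not part of the answer's code or of any Apple API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: an image size in pixels. */
typedef struct {
    size_t width;
    size_t height;
} PixelSize;

/* Scale (width, height) so that the longer side becomes max_side while
 * keeping the aspect ratio. Results are rounded to the nearest pixel and
 * never drop below 1. Small images are scaled up as well. */
static PixelSize fit_within(size_t width, size_t height, size_t max_side) {
    assert(width > 0 && height > 0 && max_side > 0);

    size_t longer = width > height ? width : height;
    PixelSize out;
    out.width  = (width  * max_side + longer / 2) / longer;
    out.height = (height * max_side + longer / 2) / longer;

    if (out.width == 0)  out.width = 1;
    if (out.height == 0) out.height = 1;
    return out;
}
```

The returned width and height could then replace the two hard-coded 640 values in the resize method; for a 3264 x 2448 camera photo this gives 640 x 480.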
    

    For the radians() calls to compile, declare this helper above the @implementation:

    static inline double radians (double degrees) {return degrees * M_PI/180;}
    

    Then I convert the image to grayscale.

    I found the article "Convert image to grayscale", which covers this conversion.

    I have used the code from it successfully and can now read text of different colours on backgrounds of different colours.

    I modified the code slightly so that it works as a method within a class, rather than as a class of its own as in the original.

    - (UIImage *) toGrayscale:(UIImage*)img
    {
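        // Byte offsets within a pixel. With kCGBitmapByteOrder32Little and
        // kCGImageAlphaPremultipliedLast the bytes sit in memory as A, B, G, R.
        const int BLUE = 1;
        const int GREEN = 2;
        const int RED = 3;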
    
        // Create image rectangle with current image width/height
        CGRect imageRect = CGRectMake(0, 0, img.size.width * img.scale, img.size.height * img.scale);
    
        int width = imageRect.size.width;
        int height = imageRect.size.height;
    
        // the pixels will be painted to this array
        uint32_t *pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));
    
        // clear the pixels so any transparency is preserved
        memset(pixels, 0, width * height * sizeof(uint32_t));
    
        CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    
        // create a context with RGBA pixels
        CGContextRef context = CGBitmapContextCreate(pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                     kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);
    
        // paint the bitmap to our context which will fill in the pixels array
        CGContextDrawImage(context, CGRectMake(0, 0, width, height), [img CGImage]);
    
        for(int y = 0; y < height; y++) {
            for(int x = 0; x < width; x++) {
                uint8_t *rgbaPixel = (uint8_t *) &pixels[y * width + x];
    
                // convert to grayscale using recommended method:     http://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
                uint32_t gray = 0.3 * rgbaPixel[RED] + 0.59 * rgbaPixel[GREEN] + 0.11 * rgbaPixel[BLUE];
    
                // set the pixels to gray
                rgbaPixel[RED] = gray;
                rgbaPixel[GREEN] = gray;
                rgbaPixel[BLUE] = gray;
            }
        }
    
        // create a new CGImageRef from our context with the modified pixels
        CGImageRef image = CGBitmapContextCreateImage(context);
    
        // we're done with the context, color space, and pixels
        CGContextRelease(context);
        CGColorSpaceRelease(colorSpace);
        free(pixels);
    
        // make a new UIImage to return
        UIImage *resultUIImage = [UIImage imageWithCGImage:image
                                                 scale:img.scale
                                           orientation:UIImageOrientationUp];
    
        // we're done with image now too
        CGImageRelease(image);
    
        return resultUIImage;
    }
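
The pixel loop is the part that is easy to get subtly wrong, so here it is in isolation as a plain C sketch. It assumes the bytes of each pixel sit in memory as A, B, G, R (which is what the bitmap flags used above should produce) and it swaps the floating-point weights for integer weights 30/59/11 so that pure white stays exactly 255 instead of risking truncation to 254. `gray_in_place` and the `PX_` names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets within one 4-byte pixel, assuming A, B, G, R memory order. */
enum { PX_ALPHA = 0, PX_BLUE = 1, PX_GREEN = 2, PX_RED = 3 };

/* Replace the colour channels of each pixel with its weighted luminance.
 * Alpha is left untouched. The integer weights 30/59/11 sum to 100, so the
 * result always fits in one byte. */
static void gray_in_place(uint8_t *pixels, size_t pixel_count) {
    assert(pixels != NULL || pixel_count == 0);

    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t *px = pixels + i * 4;
        uint8_t gray = (uint8_t)((30u * px[PX_RED] +
                                  59u * px[PX_GREEN] +
                                  11u * px[PX_BLUE]) / 100u);
        px[PX_RED] = gray;
        px[PX_GREEN] = gray;
        px[PX_BLUE] = gray;
    }
}
```

It could be called on the `pixels` buffer as `gray_in_place((uint8_t *)pixels, (size_t)width * height)` in place of the nested loops.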
    
  • 2020-12-07 16:47

    I'm currently working on the same thing. I found that a PNG saved in Photoshop worked fine, but an image that originally came from the camera and was then imported into the app never worked. Don't ask me to explain it, but applying this function made those images work. Maybe it'll work for you too.

    // this does the trick to have tesseract accept the UIImage.
    UIImage * gs_convert_image (UIImage * src_img) {
        CGColorSpaceRef d_colorSpace = CGColorSpaceCreateDeviceRGB();
        /*
         * Note we specify 4 bytes per pixel here even though we ignore the
         * alpha value; you can't specify 3 bytes per-pixel.
         */
        size_t d_bytesPerRow = src_img.size.width * 4;
        unsigned char * imgData = (unsigned char*)malloc(src_img.size.height*d_bytesPerRow);
        CGContextRef context =  CGBitmapContextCreate(imgData, src_img.size.width,
                                                      src_img.size.height,
                                                      8, d_bytesPerRow,
                                                      d_colorSpace,
                                                      kCGImageAlphaNoneSkipFirst);
    
        UIGraphicsPushContext(context);
        // These next two lines 'flip' the drawing so it doesn't appear upside-down.
        CGContextTranslateCTM(context, 0.0, src_img.size.height);
        CGContextScaleCTM(context, 1.0, -1.0);
        // Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
        [src_img drawInRect:CGRectMake(0.0, 0.0, src_img.size.width, src_img.size.height)];
        UIGraphicsPopContext();
    
        /*
         * At this point, we have the raw ARGB pixel data in the imgData buffer, so
         * we can perform whatever image processing here.
         */
    
    
        // After we've processed the raw data, turn it back into a UIImage instance.
        CGImageRef new_img = CGBitmapContextCreateImage(context);
        UIImage * convertedImage = [[UIImage alloc] initWithCGImage:
                                     new_img];
    
        CGImageRelease(new_img);
        CGContextRelease(context);
        CGColorSpaceRelease(d_colorSpace);
        free(imgData);
        return convertedImage;
    }
    

    I've also done a lot of experimentation with preparing the image for Tesseract. Resizing, converting to grayscale, then adjusting brightness and contrast seems to work best.
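
For the brightness and contrast step, a minimal sketch in plain C might look like the following. It assumes an 8-bit grayscale buffer (one byte per pixel) and uses the common linear formula around mid-gray; the function names and the formula are my illustration, not the exact adjustment used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to 0..255 and round to the nearest integer. */
static uint8_t clamp_u8(double v) {
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Linear adjustment around mid-gray (128):
 *   out = (in - 128) * contrast + 128 + brightness
 * contrast 1.0 and brightness 0.0 leave the image unchanged. */
static void adjust_gray(uint8_t *gray, size_t count,
                        double contrast, double brightness) {
    assert(gray != NULL || count == 0);
    assert(contrast >= 0.0);

    for (size_t i = 0; i < count; i++) {
        gray[i] = clamp_u8((gray[i] - 128.0) * contrast + 128.0 + brightness);
    }
}
```

A contrast above 1.0 pushes dark text and light paper further apart; the best values probably depend on the photos, so they are worth tuning by eye.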

    I've also tried the GPUImage library: https://github.com/BradLarson/GPUImage. Its GPUImageAverageLuminanceThresholdFilter seems to give me a great-looking adjusted image, but Tesseract doesn't seem to work well with the result.
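
To make the idea behind an average-luminance threshold concrete, here is a plain-CPU sketch in C. It is not GPUImage's implementation (which runs on the GPU); it only shows the technique of binarizing against the mean brightness, with a hypothetical `multiplier` parameter to shift the cut-off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binarize an 8-bit grayscale buffer in place: pixels brighter than
 * (mean luminance * multiplier) become 255, all others become 0. */
static void threshold_at_mean(uint8_t *gray, size_t count, double multiplier) {
    assert(gray != NULL && count > 0);

    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += gray[i];
    }
    double threshold = ((double)sum / (double)count) * multiplier;

    for (size_t i = 0; i < count; i++) {
        gray[i] = gray[i] > threshold ? 255 : 0;
    }
}
```

Because the cut-off is a single global value, unevenly lit photos can lose whole regions of text, which may be worth checking when comparing against the un-thresholded grayscale image.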

    I've also added OpenCV to my project and plan to try out its image routines, possibly even some box detection to find the text area (I'm hoping this will speed up Tesseract).
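
As a very naive stand-in for that kind of text-area detection (this is not OpenCV, just a plain C illustration), one could take the tight bounding box around all dark pixels of a grayscale image and hand only that region to the OCR engine. `PixelRect`, `dark_bounds` and the `cutoff` parameter are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a rectangle in pixel coordinates. */
typedef struct {
    size_t x, y, width, height;
} PixelRect;

/* Find the tight box around all pixels darker than `cutoff` in a
 * row-major 8-bit grayscale image. Returns 1 and fills *out on success,
 * or 0 if no pixel is dark enough. */
static int dark_bounds(const uint8_t *gray, size_t width, size_t height,
                       uint8_t cutoff, PixelRect *out) {
    assert(gray != NULL && out != NULL);

    size_t min_x = width, min_y = height, max_x = 0, max_y = 0;
    int found = 0;

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (gray[y * width + x] < cutoff) {
                if (x < min_x) min_x = x;
                if (x > max_x) max_x = x;
                if (y < min_y) min_y = y;
                if (y > max_y) max_y = y;
                found = 1;
            }
        }
    }
    if (!found) return 0;

    out->x = min_x;
    out->y = min_y;
    out->width = max_x - min_x + 1;
    out->height = max_y - min_y + 1;
    return 1;
}
```

A single stray dark pixel is enough to inflate the box, so in practice some noise removal would have to come first; real box detection is considerably more involved than this.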
