OCR: Image to text?

风格不统一 提交于 2019-12-02 14:19:35
The iOSDev

Hi all Thanks for your replies, from all of that replies I am able to get this conclusion as below:

  1. I need to get the only one cropped image block with number plate contained in it.
  2. From that plate need to find out the portion of the number portion using the data I got using the method provided here.
  3. Then converting the image data to almost black and white using the RGB data found through the above method.
  4. Then the data is converted to the Image using the method provided here.

Above 4 steps are combined in to one method like this as below :

-(void)getRGBAsFromImage:(UIImage*)image
{
    NSInteger count = (image.size.width * image.size.height);
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now your rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0 ; ii < count ; ++ii)
    {
        CGFloat red   = (rawData[byteIndex]     * 1.0) ;
        CGFloat green = (rawData[byteIndex + 1] * 1.0) ;
        CGFloat blue  = (rawData[byteIndex + 2] * 1.0) ;
        CGFloat alpha = (rawData[byteIndex + 3] * 1.0) ;

        NSLog(@"red %f \t green %f \t blue %f \t alpha %f rawData [%d] %d",red,green,blue,alpha,ii,rawData[ii]);
        if(red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue)//all values are between 0 to 255
        {
            red = 255.0;
            green = 255.0;
            blue = 255.0;
            alpha = 255.0;
            // all value set to 255 to get white background.
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;

        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
                                                       rawData,
                                                       width,
                                                       height,
                                                       8, // bitsPerComponent
                                                       4*width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);

    CFRelease(colorSpace);

    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);

    UIImage *img = [UIImage imageWithCGImage:cgImage];

    //use the img for further use of ocr

    free(rawData);
}

Note:

The only drawback of this method is the time consumed and the RGB value to convert to white and other to black.

UPDATE :

    CGImageRef imageRef = [plate CGImage];
    CIContext *context = [CIContext contextWithOptions:nil]; // 1
    CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
    CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:@"inputImage", ciImage, @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f], @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
    CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
    CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
    UIImage *img = [UIImage imageWithCGImage:cgImage]; 

Just replace the above method's(getRGBAsFromImage:) code with this one and the result is same but the time taken is just 0.1 to 0.3 second only.

I was able to achieve near instant results using the demo photo provided as well as it generating the correct letters.

I pre-processed the image using GPUImage

// Pre-processing for OCR
GPUImageLuminanceThresholdFilter * adaptiveThreshold = [[GPUImageLuminanceThresholdFilter alloc] init];
[adaptiveThreshold setThreshold:0.3f];
[self setProcessedImage:[adaptiveThreshold imageByFilteringImage:_image]];

And then sending that processed image to TESS

- (NSArray *)processOcrAt:(UIImage *)image {
    [self setTesseractImage:image];

    _tesseract->Recognize(NULL);
    char* utf8Text = _tesseract->GetUTF8Text();

    return [self ocrProcessingFinished:[NSString stringWithUTF8String:utf8Text]];
}

- (NSArray *)ocrProcessingFinished:(NSString *)result {
    // Strip extra characters, whitespace/newlines
    NSString * results_noNewLine = [result stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    NSArray * results_noWhitespace = [results_noNewLine componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString * results_final = [results_noWhitespace componentsJoinedByString:@""];
    results_final = [results_final lowercaseString];

    // Separate out individual letters
    NSMutableArray * letters = [[NSMutableArray alloc] initWithCapacity:results_final.length];
    for (int i = 0; i < [results_final length]; i++) {
        NSString * newTile = [results_final substringWithRange:NSMakeRange(i, 1)];
        [letters addObject:newTile];
    }

    return [NSArray arrayWithArray:letters];
}

- (void)setTesseractImage:(UIImage *)image {
    free(_pixels);

    CGSize size = [image size];
    int width = size.width;
    int height = size.height;

    if (width <= 0 || height <= 0)
        return;

    // the pixels will be painted to this array
    _pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));
    // clear the pixels so any transparency is preserved
    memset(_pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context which will fill in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [image CGImage]);

    _tesseract->SetImage((const unsigned char *) _pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));
}

This left ' marks for the - but these are also easy to remove. Depending on the image set that you have you may have to fine tune it a bit but it should get you moving in the right direction.

Let me know if you have problems using it, it's from a project I'm using and I didn't want to have to strip everything out or create a project from scratch for it.

I daresay that tesseract will be overkill for your purpose. You do not need dictionary matching to improve recognition quality ( you do not have this dictionary , but maybe means to compute checksum on license number ), and you have font optimised for OCR. And best of all, you have markers (orange and blue color areas nearby are good) to find region in the image.

I my OCR apps I use human assisted area of interest retrieval ( just aiming help overlay over camera preview). Usually ones uses something like haar cascade to locate interesting features like faces. You may also calculate centroid of orange area, or just bounding box of orange pixels simply by traversing all the image and stoing leftmost / rightmost / topmost / bottommost pixels of suitable color

As for recognition itselff I would recommend to use invariant moments ( not sure whether implemented in tesseract, but you can easily port it from out java project: http://sourceforge.net/projects/javaocr/ )

I tried my demo app on monitor image and it recognized digits on the sport (is not trained for characters)

As for binarisation ( separating black from white ) I would recommend sauvola method as this gives best tolerance to luminance changes ( also implemented in our OCR project )

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!