Google Vision hexadecimal numbers recognition

Question

Google Vision OCR very often makes mistakes when recognizing hexadecimal numbers (the accuracy is about 60%). For example, when I try to recognize a scanned image containing the number "78 30 3D 61", Google OCR returns text like "78 30 30 61". I tried both the live demo and the .NET API client, with the same incorrect result.

Here is my C# code:

var image = await Google.Cloud.Vision.V1.Image.FromFileAsync("c:\\path\\to\\file.png");
var imageContext = new ImageContext();
imageContext.LanguageHints.Add("en");
imageContext.LanguageHints.Add("iw");
var recognizedText = await imageAnnotatorClientBuilder.DetectDocumentTextAsync(image, imageContext);

Image manipulations I have tried, with no improvement:

Thresholding the image at different levels
Inverting the image colors
Adjusting contrast, brightness, and sharpness

Is there any way to train Google Vision, or to specify that the image contains hexadecimal numbers (something like ImageContext, but for hexadecimal numbers)?

I've also shared an example image on Google Drive that shows the recognition mistakes, so you can try it in the live Google demo as well.

Answer 1:

In the image provided, the only hexadecimal digits I see are the ones labeled as Block 6 in the Cloud Vision API demo [1]. The hexadecimal system uses 16 symbols (0-9, A-F), so the letters A-F can be mislabeled when they are surrounded by numeric symbols. A possible explanation is that Vision API probably uses convolutional neural networks and takes context into account. In this case, the 'D' may be recognized as a '0' because it is surrounded by digits and Vision API does not expect a letter there.
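To make the ambiguity concrete, here is a minimal sketch that lists alternative readings for each recognized hex token so that doubtful bytes can be reviewed by hand. The class name HexOcrReview and the look-alike pairs (0/D and 8/B) are assumptions made for illustration; they are not taken from the Vision API or its documentation, and the code cannot tell which reading is the correct one.

```csharp
using System;
using System.Collections.Generic;

public static class HexOcrReview
{
    // Assumed look-alike pairs (hypothetical, extend as needed for your font).
    private static readonly Dictionary<char, char[]> LookAlikes = new Dictionary<char, char[]>
    {
        ['0'] = new[] { 'D' },
        ['D'] = new[] { '0' },
        ['8'] = new[] { 'B' },
        ['B'] = new[] { '8' },
    };

    // Returns every reading obtained by swapping exactly one character
    // of the token for one of its assumed look-alikes.
    public static List<string> AlternativeReadings(string token)
    {
        string upper = token.ToUpperInvariant();
        var results = new List<string>();
        for (int i = 0; i < upper.Length; i++)
        {
            if (!LookAlikes.TryGetValue(upper[i], out char[] alternatives))
            {
                continue;
            }
            foreach (char alternative in alternatives)
            {
                char[] chars = upper.ToCharArray();
                chars[i] = alternative;
                results.Add(new string(chars));
            }
        }
        return results;
    }

    public static void Main()
    {
        // The misread text from the question.
        foreach (string token in "78 30 30 61".Split(' '))
        {
            List<string> alternatives = AlternativeReadings(token);
            string note = alternatives.Count == 0
                ? "no look-alikes"
                : "could also be " + string.Join(", ", alternatives);
            Console.WriteLine($"{token}: {note}");
        }
    }
}
```

For the misread text from the question, each "30" is reported as possibly being "3D", which is where the real value was lost. This only narrows down which bytes to double-check; it does not improve the recognition itself.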

Vision API uses pre-trained models, and they cannot be changed. If you are only interested in the hexadecimal number I referred to above, I suggest cropping the image to that region and looking for a model specifically designed to recognize hexadecimal numbers.
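As a rough sketch of the cropping step, the helper below computes a padded crop rectangle from the corner points of a block. It assumes you have already copied the block's bounding-box vertices (for example, those of Block 6) into plain (X, Y) tuples; CropHelper, CropRectangle, and the padding parameter are hypothetical names, and the actual cropping is left to whichever image library you use.

```csharp
using System;
using System.Collections.Generic;

public static class CropHelper
{
    // Returns the axis-aligned rectangle that encloses all vertices,
    // grown by `padding` pixels on each side and clamped to the image bounds.
    public static (int Left, int Top, int Width, int Height) CropRectangle(
        IReadOnlyList<(int X, int Y)> vertices, int imageWidth, int imageHeight, int padding)
    {
        if (vertices.Count == 0)
        {
            throw new ArgumentException("At least one vertex is required.", nameof(vertices));
        }

        int minX = int.MaxValue, minY = int.MaxValue;
        int maxX = int.MinValue, maxY = int.MinValue;
        foreach (var (x, y) in vertices)
        {
            minX = Math.Min(minX, x);
            minY = Math.Min(minY, y);
            maxX = Math.Max(maxX, x);
            maxY = Math.Max(maxY, y);
        }

        int left = Math.Max(0, minX - padding);
        int top = Math.Max(0, minY - padding);
        int right = Math.Min(imageWidth, maxX + padding);
        int bottom = Math.Min(imageHeight, maxY + padding);
        return (left, top, right - left, bottom - top);
    }
}
```

A little padding around the block is an assumption on my part: it keeps the edge characters from being clipped, which would otherwise make them even harder to recognize.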

AutoML [2] lets you train your own custom machine learning models. Take a look at the AutoML Vision section of the documentation [3]. With this service, you can train models that match your specific requirements.

[1] - https://cloud.google.com/vision/docs/drag-and-drop

[2] - https://cloud.google.com/automl

[3] - https://cloud.google.com/vision/overview/docs#automl-vision

Source: https://stackoverflow.com/questions/65199724/google-vision-hexadecimal-numbers-recognition

Tags

google-cloud-vision