Google Vision hexadecimal numbers recognition

Submitted by 自作多情 on 2021-01-29 15:20:26

Question


Google Vision OCR very often makes mistakes when recognizing hexadecimal numbers (the accuracy is about 60%). For example, when I try to recognize a scanned image containing the number "78 30 3D 61", Google OCR returns text like "78 30 30 61". I tried both the live demo and the .NET API client, with the same incorrect result.

Here is my C# code:

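// Load the scanned image from disk.
var image = await Google.Cloud.Vision.V1.Image.FromFileAsync("c:\\path\\to\\file.png");
// Language hints: English ("en") and Hebrew ("iw").
var imageContext = new ImageContext();
imageContext.LanguageHints.Add("en");
imageContext.LanguageHints.Add("iw");
// imageAnnotatorClientBuilder is created elsewhere (not shown here).
var recognizedText = await imageAnnotatorClientBuilder.DetectDocumentTextAsync(image, imageContext);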

The image manipulations I have tried, without success:

  • Thresholding the image at different levels
  • Inverting the image colors
  • Adjusting contrast/brightness/sharpness

Is there any way to train Google Vision, or to specify that the image contains hexadecimal numbers (something like ImageContext, but for hexadecimal numbers)?

I've also shared an example image with recognition mistakes on Google Drive, so you can try it on the live Google demo as well.


Answer 1:


In the image provided, the only hexadecimal digits I see are the ones labeled as Block 6 in the Cloud Vision API demo [1]. The hexadecimal system uses 16 symbols (0-9, A-F), so the letters A-F may be mislabelled when they are surrounded by numeric symbols. A possible explanation is that the Vision API probably uses convolutional neural networks and takes context into account. In this case, the ‘D’ may be recognized as a ‘0’ because it is surrounded by digits and the Vision API does not expect a letter there.
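One way to see where the API was unsure is to look at the per-symbol confidence values that document text detection returns. Below is a minimal sketch, assuming the Google.Cloud.Vision.V1 package and the TextAnnotation returned by DetectDocumentTextAsync. The helper name LowConfidenceSymbols and the threshold are hypothetical, and a misread ‘D’ is not guaranteed to get a low score:

```csharp
using System.Collections.Generic;
using Google.Cloud.Vision.V1;

// Hypothetical helper, not part of the Vision client library.
public static class SymbolConfidence
{
    // Walks the page > block > paragraph > word > symbol hierarchy of a
    // document text detection response and collects every symbol whose
    // confidence is below the given threshold.
    public static List<(string Text, float Confidence)> LowConfidenceSymbols(
        TextAnnotation annotation, float threshold)
    {
        var result = new List<(string Text, float Confidence)>();
        foreach (var page in annotation.Pages)
        foreach (var block in page.Blocks)
        foreach (var paragraph in block.Paragraphs)
        foreach (var word in paragraph.Words)
        foreach (var symbol in word.Symbols)
        {
            if (symbol.Confidence < threshold)
            {
                result.Add((symbol.Text, symbol.Confidence));
            }
        }
        return result;
    }
}
```

For example, SymbolConfidence.LowConfidenceSymbols(recognizedText, 0.8f) would list the symbols worth checking by hand. The 0.8 cutoff is an arbitrary starting point, not a recommended value.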

The Vision API uses pretrained models, which cannot be changed. If you are only interested in the hexadecimal number mentioned above, I would suggest cropping the image and looking for a model specifically designed to recognize hexadecimal numbers.
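Since the pretrained model cannot be restricted to the hexadecimal alphabet, the output can at least be checked afterwards. Below is a minimal sketch in plain C#, assuming the text is expected to be whitespace-separated two-digit bytes. HexOcrCheck and its look-alike table are hypothetical and not part of the Vision API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical post-processing helper for OCR output that should contain
// only two-digit hexadecimal bytes such as "78 30 3D 61".
public static class HexOcrCheck
{
    // Letters that are not hex digits but resemble one.
    // This mapping is an assumption; adjust it to the font being scanned.
    private static readonly Dictionary<char, char> LookAlikes = new Dictionary<char, char>
    {
        ['O'] = '0', ['o'] = '0', ['I'] = '1', ['l'] = '1', ['S'] = '5'
    };

    private static readonly Regex HexByte = new Regex("^[0-9A-F]{2}$");

    // Replaces look-alike letters with digits, then upper-cases the token.
    public static string Normalize(string token)
    {
        var chars = token.Select(c => LookAlikes.TryGetValue(c, out var r) ? r : c);
        return new string(chars.ToArray()).ToUpperInvariant();
    }

    // Returns the normalized tokens that are still not valid hex bytes.
    public static List<string> InvalidTokens(string ocrText)
    {
        return ocrText
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Normalize)
            .Where(t => !HexByte.IsMatch(t))
            .ToList();
    }
}
```

Note that this only catches characters outside 0-9 and A-F. A ‘D’ read as ‘0’ still yields a valid byte ("30"), so it passes this check unnoticed.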

AutoML [2] lets you train custom machine learning models. See the "Sight" section there for the AutoML Vision documentation [3]. With this service you can train models that match your requirements.

[1] - https://cloud.google.com/vision/docs/drag-and-drop

[2] - https://cloud.google.com/automl

[3] - https://cloud.google.com/vision/overview/docs#automl-vision



Source: https://stackoverflow.com/questions/65199724/google-vision-hexadecimal-numbers-recognition
