Recognize MICR font using OCR engine?

伪装坚强ぢ 2021-01-22 18:46

I am using Microsoft OCR Library for reading text.

The Microsoft OCR library works perfectly. However, I want to read the following list of characters given in the link h

2 answers
  • 2021-01-22 18:55

    [Microsoft OCR crew here] We don't yet support training OCR to customize it for your use cases. However, we do actively keep an eye on Stack Overflow to see what developers need, so we can keep improving the OCR engine.

  • 2021-01-22 18:59

    I have been working with Microsoft OCR for a while now. Compared with Tesseract, it has very basic functionality.

    For example, Microsoft OCR returns words and lines, but the lines are nonsense. Two or three words are grouped together, seemingly at random, as a "line" that does not correspond to a real line of text, and these "lines" are returned in no particular order. In this respect it is worse than Tesseract. You have to take the coordinates of each word and order them yourself.
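    Ordering the words yourself could look roughly like the sketch below. It is not tied to the Microsoft OCR API: the `Word` class, its field names and the `tolerance` parameter are assumptions made for illustration, and the sketch assumes horizontal, non-rotated text. Words whose vertical centers are close together are treated as one line, and each line is then sorted from left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One recognized word with its bounding box (hypothetical structure)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def group_into_lines(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Rebuild reading order from unordered word boxes.

    A word joins an existing line when its vertical center lies within
    `tolerance` * word height of that line's average center.
    """
    lines: List[List[Word]] = []
    centers: List[float] = []  # average vertical center of each line

    # Visit the words from top to bottom.
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        center = word.y + word.height / 2
        for i, line_center in enumerate(centers):
            if abs(center - line_center) <= tolerance * word.height:
                lines[i].append(word)
                # Update the running mean of the line's center.
                centers[i] = line_center + (center - line_center) / len(lines[i])
                break
        else:
            # No existing line is close enough: start a new one.
            lines.append([word])
            centers.append(center)

    # Lines top to bottom, words inside each line left to right.
    ordered = sorted(zip(centers, lines), key=lambda pair: pair[0])
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        for _, line in ordered
    ]
```

    The tolerance is a guess that works for ordinary text; skewed scans or multi-column layouts would need something more elaborate.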

    Microsoft OCR does not return the bounding rectangles of individual characters, and there is no way to configure or train it. You can add languages with Windows Update for "Basic Typing" = OCR (see http://www.thewindowsclub.com/install-uninstall-languages-windows-10), but you cannot train your own language data.

    MSDN says that the following 25 languages are supported with different accuracy:

    • Excellent: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish and Swedish.
    • Very good: Chinese Simplified, Greek, Japanese, Russian and Turkish.
    • Good: Chinese Traditional and Korean.

    The recognition quality is very similar to Tesseract. It even has exactly the same problems as Tesseract: some isolated characters are not recognized (standalone symbols such as a single '$'), it has the same huge problem with asterisks, and it inserts spaces in the wrong places just as Tesseract does. So I ask myself whether Microsoft is using Tesseract under the hood.

    However, Microsoft OCR has one advantage over Tesseract: its image preprocessing is much better. It does not matter whether you have red text on a yellow background or white text on black. This is a weak point of Tesseract, which needs a good-quality black-and-white image as input.
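    To illustrate what such preprocessing for Tesseract might involve, here is a minimal sketch that turns colored text into black text on white. It is plain Python on nested lists so that it stays self-contained; in practice an imaging library would do this. The function names are made up, the midpoint threshold is a simplification, and the code assumes that text covers fewer pixels than the background.

```python
from typing import List, Tuple

Rgb = Tuple[int, int, int]


def luminance(pixel: Rgb) -> float:
    """Perceived brightness of an RGB pixel (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_black_on_white(image: List[List[Rgb]]) -> List[List[int]]:
    """Binarize an RGB image so that text becomes 0 and background 255."""
    grey = [[luminance(p) for p in row] for row in image]
    flat = [v for row in grey for v in row]

    # Simplification: threshold halfway between darkest and brightest pixel.
    threshold = (min(flat) + max(flat)) / 2

    # Assumption: the text covers fewer pixels than the background,
    # so the smaller of the two classes is taken to be the text.
    dark_count = sum(1 for v in flat if v < threshold)
    text_is_dark = dark_count <= len(flat) - dark_count

    return [
        [0 if (v < threshold) == text_is_dark else 255 for v in row]
        for row in grey
    ]
```

    Both red-on-yellow and white-on-black input end up as black on white here, because the polarity is decided by which class is the minority.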

    The same advice applies to both OCR libraries: if you have recognition problems, try enlarging the image. Even blurring the image may be very helpful, because it removes noise from the image.
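    A rough sketch of both steps on a grayscale image stored as nested lists. The helper names `enlarge` and `box_blur` are made up for this example, and a real project would normally use an imaging library with better interpolation instead of this nearest-neighbour and 3x3 mean-filter version.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def enlarge(image: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    result: Image = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row `factor` times (as independent copies).
        result.extend(list(wide_row) for _ in range(factor))
    return result


def box_blur(image: Image) -> Image:
    """3x3 mean filter; neighbours outside the image are simply ignored."""
    height, width = len(image), len(image[0])
    blurred: Image = []
    for y in range(height):
        out_row = []
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            out_row.append(sum(neighbours) // len(neighbours))
        blurred.append(out_row)
    return blurred
```

    A typical order would be `box_blur(enlarge(image, 2))`, with the result then passed to the OCR engine; how much enlarging or blurring helps depends on the input.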
