What is the ideal font for OCR?

灰色年华 2020-12-24 13:39

Does anybody have any experience with different fonts for OCR? I am generating an ID and then trying to scan it with tesseract. At the moment I am just trying different fonts by trial and error.

8 answers
  • 2020-12-24 13:42

    I have always had success simply using Times New Roman.

  • 2020-12-24 13:42

    I'm currently using Monospace. I tried a great many fonts, but this is the most accurate one for me.

  • 2020-12-24 13:48

    I've recently done extensive testing on this in an ECM called Laserfiche, which uses Nuance OmniPage, and I've found that monospace fonts perform poorly compared to proportionally spaced fonts. The old OCR fonts don't perform as well as more 'normal' looking fonts, especially for strings of numbers at smaller sizes such as 12 point.

    It's strange that someone else is having success with Calibri. It performed very poorly in my tests, routinely confusing similar-looking letters and numbers with each other. The best fonts (among those that come on a Windows computer with Office installed) were Consolas, Verdana, and Book Antiqua, all fonts in which letters and numbers look distinct. Consolas was the champion.
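To see which look-alike characters a given font actually gets confused, it can help to tally the mismatches rather than eyeball them. The following is only a minimal sketch: the function name `confusion_pairs` and the sample strings are made up for illustration and are not output from any real OCR run. It also assumes the engine substituted characters without dropping or inserting any, so both strings have the same length.

```python
from collections import Counter


def confusion_pairs(expected: str, recognized: str) -> Counter:
    """Count (expected_char, recognized_char) mismatches, position by position.

    Only meaningful when both strings have the same length, i.e. the OCR
    engine substituted characters but did not drop or insert any.
    """
    if len(expected) != len(recognized):
        raise ValueError("strings differ in length; align them first")
    return Counter((e, r) for e, r in zip(expected, recognized) if e != r)


# Hypothetical (expected, recognized) pairs, for illustration only.
samples = [
    ("A10B58", "AlOB58"),
    ("A10B58", "A10BS8"),
    ("A10B58", "A1OB58"),
]

totals = Counter()
for expected, recognized in samples:
    totals += confusion_pairs(expected, recognized)

# Most frequent confusions first.
for (e, r), n in totals.most_common():
    print(f"{e!r} read as {r!r}: {n}")
```

Running the same tally per font on real scans would show whether a font's errors cluster on a few pairs such as 0/O or 1/l.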

  • 2020-12-24 13:50

    After trying a lot of different fonts and OCR engines I tend to get the best results using Consolas. It is a monospaced typeface like OCR-A, but easier to read for humans. Consolas is included in several Microsoft products.

    There is also an open source font Inconsolata, which is influenced by Consolas. Inconsolata is a good replacement for Consolas, especially considering the licensing details.

    In my tests, the numbers and spaces in the Calibri font were not always recognized properly. OCR-A gave lots of reading errors. I did not give MICR a try, since it is not easily readable for most humans.

    Note: tesseract requires a lot of testing and fine-tuning before being reliable. In our case we switched to a commercially licensed OCR engine (ABBYY), especially since reliability was very important and we needed to support multiple (European) languages.
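As one example of the kind of fine-tuning meant here, a short ID on a single line can be read with a restricted character set. The sketch below only builds the command line; the helper name `tesseract_command` and the file name `id_card.png` are hypothetical. `--psm 7` (treat the image as a single text line) and the `tessedit_char_whitelist` variable are tesseract options, but whether the whitelist is honoured depends on the tesseract version and engine mode, so treat this as a starting point to test rather than a recommended configuration.

```python
import shlex
from typing import List


def tesseract_command(image_path: str, allowed_chars: str) -> List[str]:
    """Build a tesseract CLI invocation for one line of known characters."""
    return [
        "tesseract",
        image_path,
        "stdout",      # print the recognized text instead of writing a file
        "--psm", "7",  # page segmentation mode 7: a single line of text
        "-c", f"tessedit_char_whitelist={allowed_chars}",
    ]


# Hypothetical image of a digits-only ID.
cmd = tesseract_command("id_card.png", "0123456789")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` once tesseract is installed.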

    Update: 2017 Jan 31 - Changed 'based on Consolas' to 'influenced by Consolas' due to potential copyright issues.

  • 2020-12-24 13:52

    It really depends on the OCR engine considered.

    For gocr, FreeMono is the best; see the gocr documentation.

    For tesseract, DejaVu-Serif works well, see https://superuser.com/a/1543382/280936

    For ABBYY OCR, Verdana is good; see this comparison.

    See also this wrap-up: https://www.monperrus.net/martin/perfect-ocr-digital-data
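Since the best font depends on the engine, the comparison is worth scripting for your own setup. Here is a rough sketch of such a harness: it ranks fonts by character error rate, computed with the Levenshtein edit distance. The render-and-scan step is passed in as a function, because it differs per engine. `rank_fonts`, `fake_ocr`, and the font names `FontA` and `FontB` are placeholders invented for this example, not real measurements.

```python
from typing import Callable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_fonts(
    fonts: List[str],
    ids: List[str],
    render_and_ocr: Callable[[str, str], str],
) -> List[Tuple[str, float]]:
    """Return (font, character error rate) pairs, lowest error rate first."""
    total_chars = sum(len(s) for s in ids)
    scores = []
    for font in fonts:
        errors = sum(edit_distance(s, render_and_ocr(font, s)) for s in ids)
        scores.append((font, errors / total_chars))
    return sorted(scores, key=lambda pair: pair[1])


def fake_ocr(font: str, text: str) -> str:
    """Stand-in for rendering `text` in `font` and scanning it (made up)."""
    if font == "FontB":
        return text.replace("0", "O")  # pretend FontB zeros are read as 'O'
    return text


ranking = rank_fonts(["FontB", "FontA"], ["A10B58", "700341"], fake_ocr)
for font, rate in ranking:
    print(f"{font}: {rate:.2%} character error rate")
```

In practice `fake_ocr` would be replaced by a function that draws the ID in the given font, saves or prints and scans it, and returns whatever text the engine reports.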

  • 2020-12-24 13:56

    I'd probably use the same font that banks use for the routing numbers at the bottom of checks:

    http://morovia.com/font/micr.asp

    It was specifically designed to be unambiguously machine-readable.
