What is the ideal font for OCR?

Does anybody have any experience with different fonts for OCR? I am generating an ID and then trying to scan it with tesseract. At the moment I am just trying different fonts by trial and error.

8 Answers

野趣味

2020-12-24 13:42

I have always had success simply using Times New Roman.

无人及你

2020-12-24 13:42

I am currently using Monospace. I have tried a great many fonts, but this is the most accurate one for me.

孤独总比滥情好

2020-12-24 13:48

I've been doing extensive testing on this recently in an ECM called Laserfiche, which uses Nuance OmniPage, and I've found that monospace fonts perform poorly compared to proportionally spaced fonts. The old OCR fonts don't perform as well as more 'normal'-looking fonts, especially for strings of numbers at smaller font sizes such as 12 point.

It's strange that someone else is having success with Calibri. It performed very poorly in my tests, routinely confusing similar-looking letters and numbers with each other. The best fonts (among those that come on a Windows computer with Office installed) were Consolas, Verdana, and Book Antiqua, all fonts in which letters and numbers look distinct. Consolas was the champion.
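
One way to see which characters a given font leads the engine to confuse is to tally the substitutions between the expected string and the OCR output. The following is a minimal sketch using only Python's standard library. The function name `confusion_pairs` and the sample strings are made up for illustration and are not taken from the tests described above.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_pairs(expected: str, recognized: str) -> Counter:
    """Count (expected_char, recognized_char) substitutions between two strings."""
    pairs = Counter()
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements map cleanly character to character;
        # insertions and deletions are ignored in this sketch.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for exp_char, rec_char in zip(expected[i1:i2], recognized[j1:j2]):
                pairs[(exp_char, rec_char)] += 1
    return pairs


# Hypothetical OCR result for an ID printed in some font.
print(confusion_pairs("ID-10O5l8", "ID-1OO5I8").most_common())
```

Summing these counters over many generated IDs per font should show whether a font's errors come from a few look-alike pairs (such as 0/O or l/I) or are spread more widely.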

无人及你

2020-12-24 13:50

After trying a lot of different fonts and OCR engines I tend to get the best results using Consolas. It is a monospaced typeface like OCR-A, but easier to read for humans. Consolas is included in several Microsoft products.

There is also an open-source font, Inconsolata, which is influenced by Consolas. Inconsolata is a good replacement for Consolas, especially considering the licensing details.

In my tests, the numbers and spaces in the Calibri font were not always recognized properly. OCR-A gave lots of reading errors. I did not give MICR a try, since it is not easily readable for most humans.

Note: tesseract requires a lot of testing and fine-tuning before being reliable. In our case we switched to a commercially licensed OCR engine (ABBYY), especially since reliability was very important and we needed to support multiple (European) languages.
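
Because this kind of testing is repetitive, the font comparison can be automated. The sketch below assumes the third-party Pillow and pytesseract packages plus a local tesseract install. The function names, the 20-pixel margin, the 32-point size, and the `--psm 7` (single text line) setting are illustrative choices, not a tested recipe.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def similarity(expected: str, recognized: str) -> float:
    """Return a 0..1 similarity ratio between the expected text and the OCR output."""
    return SequenceMatcher(None, expected, recognized.strip(), autojunk=False).ratio()


def ocr_rendered_text(text: str, font_path: str, size: int = 32) -> str:
    """Render `text` in the given TrueType font and read it back with tesseract."""
    # Imported here so the scoring helpers work without the OCR stack installed.
    from PIL import Image, ImageDraw, ImageFont
    import pytesseract

    font = ImageFont.truetype(font_path, size)
    # Measure the text on a scratch canvas so the real canvas fits it with a margin.
    scratch = ImageDraw.Draw(Image.new("L", (1, 1)))
    left, top, right, bottom = scratch.textbbox((0, 0), text, font=font)
    margin = 20
    width = int(right - left) + 2 * margin
    height = int(bottom - top) + 2 * margin
    image = Image.new("L", (width, height), color=255)  # white background
    ImageDraw.Draw(image).text((margin - left, margin - top), text, font=font, fill=0)
    # --psm 7 asks tesseract to treat the image as a single line of text.
    return pytesseract.image_to_string(image, config="--psm 7")


def rank_fonts(
    text: str,
    font_paths: List[str],
    ocr: Callable[[str, str], str] = ocr_rendered_text,
) -> List[Tuple[str, float]]:
    """Score each font by how closely the OCR output matches `text`, best first."""
    scores = [(path, similarity(text, ocr(text, path))) for path in font_paths]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A call such as `rank_fonts("ID-4821-0073", ["Consolas.ttf", "Inconsolata.ttf"])` (hypothetical font paths) would return the fonts ordered by similarity score. A cleanly rendered image is easier than a real scan or photo, so scores from this setup are probably an optimistic upper bound.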

Update: 2017 Jan 31 - Changed 'based on Consolas' to 'influenced by Consolas' due to potential copyright issues.

长情又很酷

2020-12-24 13:52

It really depends on the OCR engine considered.

For gocr, FreeMono is the best, see gocr documentation.

For tesseract, DejaVu-Serif works well, see https://superuser.com/a/1543382/280936

For ABBYY OCR, Verdana is good, see this comparison

See also this wrap-up: https://www.monperrus.net/martin/perfect-ocr-digital-data

[愿得一人]

2020-12-24 13:56

I'd probably use the same font that banks use for the routing numbers at the bottom of checks:

http://morovia.com/font/micr.asp

It was specifically designed to be unambiguously machine-readable.
