Best way to recognize characters in screenshot?

为君一笑 提交于 2019-12-02 22:23:42

Since this is the first result on Google for tesseract recognize screenshot, let me do bit of necromancy and add a much simpler solution.

Tesseract expects images at around 300 dpi or more and standard dpi for Windows is 96. Which means you need to rescale the image to 300%. After that, the results improve dramatically.

100%


Result: Whal would you recommend for recognizing all characters from a screensnor 7

200%


Result: What would you recommend for recognizing all chamcters from a screenth ?

300%


Result: What would you recommend for recognizing all characters from a screenshot ?

Anything above 300% works just as well.

icyrock.com

I would be surprised if OCR would give so bad results on such a good quality input. Probably what you want to do is choose a font that has sharp edges, no anti-aliasing, bigger font size would also help.

Also, if acceptable, try the OCR font given in this SO question:

This should give you the best possible results - if this doesn't go 100%, then I don't know what will...

Don't know what you tried beside Tesseract, but if you did not, it might be worth trying some others. These seem to be updated recently (Tesseract was updated a year ago):

There are some online versions, too, such as:

that you can use to test a sample document. From this link:

it seems that you might need to go commercial to get what you want.

Hope this helps.

I know you already solved your problem, but in case this helps someone else: Two issues I found when dealing with screenshots is that OCR engines are sensitive to the following: (1) resolution incorrectly set in image file headers, and (2) transparency issues (what looks like white background is actually marked transparent). For some reason these problems tend to occur often in screenshot images.

Also, aside from Tesseract, another possibility is to try the API at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml based on the ABBYY OCR engine. (The advantage is that there's nothing to install/configure/etc to try it to make sure it will work on your images - just make an HTTP POST). Disclaimer: WiseTrend is my company's customer.

Do you have the option to change text anti-aliasing on the OS level? Playing around with those settings (or even trying to turn it off) might give you better result with existing OCRs too.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!