Question
I know ICR is basically used for handwritten (hand-printed) data recognition, but can we leverage ICR to extract distorted (poor-quality) machine-printed text by any chance?
If not, what is the best way to solve the following problem?
I have an unstructured document that may run to two or more pages. Within the document there are a few date fields that are handwritten. I want to convert this to a text file. I have tried some full-page OCR tools (OmniPage, ABBYY, etc.) that have ICR modules. They are good at full-page OCR, but when they encounter a handwritten date they output junk characters instead of using the ICR module there. I don't want to use form-processing tools like Parascript and A2iA, which are position-based and work only with structured documents.
Or can we use ICR to convert both machine-printed and handwritten text (either way, it would work for the handwritten dates in this case)?
My aim here is to get a text-file output from an unstructured document containing a few handwritten fields (such as dates and numbers).
Answer 1:
I have tried some fullpage ocr(omnipage and abbyy etc) tools which have ICR modules
That is incorrect, which explains the poor results. The retail versions of OmniPage and ABBYY FineReader are OCR-only packages, without ICR support.
I don't want go with form processing tools
You may have to, in some form, but there are a few variations of the approach. This will have to be a marriage of two technologies, either out of the box or self-created, and it will take more effort than just installing and running a tool.
Today, it is safe to assume that no unstructured-text ICR software can deliver high-quality results. Full-page or unstructured-text OCR produces high-quality results on machine text and garbage on handwriting. You are right that ICR implies zonal recognition, which allows providing data types and back-end dictionaries for improved recognition of handwriting.
For the simplest and fastest approach, which may also be the most economical and least labor-intensive, I would use an unstructured form-processing package such as ABBYY FlexiCapture (http://www.wisetrend.com/abbyy_flexicapture.shtml). It requires some non-programming setup to 'locate' zones. Zones may change position and this software still finds them, then applies the appropriate algorithm (OCR/ICR) to read each zone's content. It supports OCR, ICR, OMR (checkmarks), and BCR (barcodes), and also has built-in full-page OCR. I use this software in-house, resell it, and have over 14 years of experience fine-tuning it.
For a potentially more economical way, but one that may require manually marrying at least two technologies (two purchases instead of one, plus labor, so it may not be the most economical at the end of the day), I would use some kind of OCR SDK for machine text and some kind of ICR-capable SDK for the handwritten zones. Depending on how consistent the locations of those zones are, you may be able simply to supply coordinates. If they shift, you need deeper analysis to locate the zones before passing them to the ICR engine. The ICR-recognized text then needs to be returned and inserted into the appropriate places in the OCRed text.
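The zone bookkeeping and reinsertion step of that two-SDK approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `ocr_full_page` and `icr_zone` functions are hypothetical stand-ins for whichever OCR and ICR SDKs you license, and the zone coordinates and placeholder tokens are made up for the example. Only the merge logic (cropping by coordinates and splicing ICR output back into the OCR text) is the point.

```python
# Sketch of marrying a full-page OCR SDK with a zonal ICR SDK.
# The two recognizer functions below are hypothetical stand-ins;
# real SDK calls would replace their bodies.

from dataclasses import dataclass


@dataclass
class Zone:
    """A handwritten field: pixel coordinates plus the token that
    marks where its recognized text belongs in the page text."""
    x: int
    y: int
    w: int
    h: int
    placeholder: str


def ocr_full_page(image) -> str:
    # Stand-in for a machine-text OCR SDK call. Here we pretend the
    # OCR pass leaves placeholder tokens where handwriting was found.
    return "Invoice received on {DATE_1}, payment due {DATE_2}."


def icr_zone(image, zone: Zone) -> str:
    # Stand-in for an ICR SDK call on the cropped zone
    # (e.g. image.crop((zone.x, zone.y, zone.x + zone.w, zone.y + zone.h))).
    fake_results = {"{DATE_1}": "04/18/2013", "{DATE_2}": "05/18/2013"}
    return fake_results[zone.placeholder]


def recognize(image, zones: list[Zone]) -> str:
    """Run full-page OCR, then replace each handwritten zone's
    placeholder with the ICR result for that zone."""
    text = ocr_full_page(image)
    for zone in zones:
        text = text.replace(zone.placeholder, icr_zone(image, zone))
    return text


zones = [Zone(120, 340, 200, 40, "{DATE_1}"),
         Zone(120, 400, 200, 40, "{DATE_2}")]
print(recognize(None, zones))
```

If the zones shift between documents, the fixed coordinates above would be replaced by a zone-location step (keyword anchors, layout analysis, or a form-processing engine) before the ICR call.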
In my opinion, with a number of tools now able to do this out of the box, I would use one of them instead of writing it myself, because several major challenges need to be solved: zone identification, integration of the two technologies, and workflow. We did such an integration some years ago, when the current tools were not available.
Source: https://stackoverflow.com/questions/16078393/icr-for-machine-printed-text