For the past 3 months I've been trying to train Tesseract
to identify a collection of images I have. Due to a real lack
of proper documentation, and very hig
You can use jTessBoxEditor to edit the box files you generate. Bundled with it is a PowerShell script to automate box file and final .traineddata file generation.
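If you would rather script the surrounding steps yourself, here is a minimal sketch of the two commands that bracket the box-editing step. It assumes the Tesseract 3.x command-line training flow and the usual `lang.fontname.expN.tif` naming convention; neither is stated above, and the helper name `box_training_commands` is made up for illustration. It only builds the argument lists, so nothing runs until you pass them to something like `subprocess.run`.

```python
from typing import List


def box_training_commands(lang: str, font: str, exp: int = 0) -> List[List[str]]:
    """Build the tesseract invocations around the manual box-editing step.

    Assumes Tesseract 3.x and a training image named <lang>.<font>.exp<N>.tif
    in the current directory (an assumption, not something from the thread).
    """
    base = f"{lang}.{font}.exp{exp}"
    image = f"{base}.tif"
    return [
        # Step 1: produce <base>.box, which you then correct in jTessBoxEditor.
        ["tesseract", image, base, "batch.nochop", "makebox"],
        # Step 2: after the box file is corrected, produce the <base>.tr
        # feature file used by the later training tools.
        ["tesseract", image, base, "box.train"],
    ]


# Example: print the commands for a hypothetical "eng" / "ocra" training image.
for argv in box_training_commands("eng", "ocra"):
    print(" ".join(argv))
```

Run the first command, fix the box file in jTessBoxEditor, then run the second. The remaining steps toward the final .traineddata file are left out here on purpose, since they differ between Tesseract versions.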
I trained Tesseract 2.04 for the OCR-A Extended font after a month of effort.
It works very well, with accuracy above 90% at font size 14.
I suggest you don't give up on Tesseract.
Could you please explain the following points of your problem?
Based on your comment, all you need is to scan a relatively small number of documents with almost 100% accuracy, and your budget is about $200.
Well, the answer is simple then. You don't need a programming solution. Just buy a quality commercial OCR product, e.g. ABBYY FineReader (disclaimer: I work for ABBYY). Prices differ by region, but I guess it is somewhere around your budget.
A commercial desktop OCR product will give you almost 100% accuracy out of the box on typical languages. These products also have convenient manual verification tools for fixing the remaining errors. They typically support a wide variety of modern fonts, and if your font is unusual, they have a font training utility for that.
I do think that is the optimal solution for you.
UPDATE: Linux platform. Unfortunately, there is almost no choice of high-quality OCR products for Linux, sorry. The only one I know of is from ABBYY: http://ocr4linux.com/en:start but it has no UI, verification, or font training. Still, you can at least try it to see whether it gives you good enough accuracy as is, which may well be the case.