How to generate a tiff/box file from an image to train Tesseract in Windows

Submitted by 可紊 on 2020-02-01 19:57:27

Question


I'm trying to train Tesseract on Windows, and for that I need a tiff/box file pair. I'm trying to create it with jTessBoxEditor, but it doesn't accept images as input. I've also tried boxFactory, but it doesn't run properly. Does anyone know the best tool for creating the pair from images?

Thanks


Answer 1:


If you have jTessBoxEditor, then you also have the Tesseract binaries. Go to the tesseract-ocr subfolder of jTessBoxEditor and run the following command:

tesseract.exe D:\testocr\TestImage.tif D:\testocr\TestImage batch.nochop makebox

This should generate the file D:\testocr\TestImage.box. Then, in jTessBoxEditor, go to the Box Editor tab and open your image. The box file is loaded automatically, so you can check that everything is OK and correct any mistakes.
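If there are several images to process, the same call can be scripted. The following is only a minimal Python sketch, not part of jTessBoxEditor or Tesseract: the helper names build_makebox_command and run_makebox are hypothetical, and it assumes tesseract.exe is reachable from the current folder or the PATH. It builds the same argument list as the command above and derives the box file name that the run is expected to produce.

```python
import subprocess
from pathlib import PureWindowsPath


def build_makebox_command(image_path, tesseract_exe="tesseract.exe"):
    """Return (command, box_path) for the makebox call shown in the answer.

    Hypothetical helper: the output base is the image path without its
    extension, so the box file ends up next to the image under the same name.
    """
    image = PureWindowsPath(image_path)
    output_base = image.with_suffix("")        # D:\testocr\TestImage
    command = [
        tesseract_exe,
        str(image),
        str(output_base),
        "batch.nochop",
        "makebox",
    ]
    box_path = str(image.with_suffix(".box"))  # D:\testocr\TestImage.box
    return command, box_path


def run_makebox(image_path, tesseract_exe="tesseract.exe"):
    """Run the command and return the expected box file path.

    Assumption: tesseract_exe can be found by the operating system.
    """
    command, box_path = build_makebox_command(image_path, tesseract_exe)
    subprocess.run(command, check=True)        # raises if Tesseract reports an error
    return box_path
```

Calling run_makebox(r"D:\testocr\TestImage.tif") from the tesseract-ocr subfolder would then be equivalent to typing the command by hand.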




Answer 2:


I had the same kind of problem: jTessBoxEditor could not properly open my images so that I could work with their boxes. One essential requirement is that the .tif image and the .box file have identical names, apart from the extension. Otherwise, jTessBoxEditor cannot tell which box file goes with which image. So run the command suggested by darkpotpot above, make sure the two file names match as described, and then click the "open" button in the Box Editor tab of jTessBoxEditor; that should work.
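The naming rule can be checked before opening anything in jTessBoxEditor. This is a rough Python sketch under stated assumptions: unpaired_files is a hypothetical helper, base names are compared exactly (including letter case), and .tiff is treated like .tif, which the answer itself does not confirm.

```python
from pathlib import PurePath


def unpaired_files(filenames):
    """Report base names that lack a matching .tif or .box partner.

    Hypothetical helper: names are compared exactly, so "Page.tif" and
    "page.box" are reported as unpaired.
    """
    tif_names = set()
    box_names = set()
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in (".tif", ".tiff"):   # assumption: .tiff counts as an image too
            tif_names.add(path.stem)
        elif suffix == ".box":
            box_names.add(path.stem)
    return {
        "tif_without_box": sorted(tif_names - box_names),
        "box_without_tif": sorted(box_names - tif_names),
    }
```

In practice the list of names could come from os.listdir on the folder that holds the training images; two empty lists in the result mean every image has a box file of the same name.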



Source: https://stackoverflow.com/questions/31751402/how-to-generate-a-tiff-box-file-from-an-image-to-train-tesseract-in-windows
