Train Tesseract to label icons

Posted by 馋奶兔 on 2020-12-29 04:55:59

Question


I'm trying to create training data for Tesseract 4.0 to identify icons (like, comment, share, save) in screenshots. This is a sample screenshot:
[Image: sample screenshot]

I would like to fine-tune Tesseract to produce the following output:
Like 147
Comment 29
Saved 5
Actions
58
Actions
Profile Visits 24
Follows 2

I followed the step-by-step guide at https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/

I modified the box file as follows:
- Heart: Like
- Speech bubble: Comment
- Bookmark: Saved
- Arrow: Share

However, the resulting training data fails to read the icons the way I want. One of the errors I get is 'Like is not in unicharset'. Do I have to do something different when creating the unicharset for icons?


Answer 1:


I've figured it out. The box editor expects a single letter or number per box instead of a full word, so I used a Unicode character to represent each icon. The steps are as follows:

  1. Crop all the target icons you want Tesseract to detect and save them in a single image file, named (in my case) own.std.exp0.png.
  2. Create the box file using the command 'tesseract own.std.exp0.png own.std.exp0 makebox'.
  3. Open jTessBoxEditor and enter a Unicode character in the char column for each box. The list of available Unicode characters can be found in the Character Map program (https://sites.psu.edu/symbolcodes/windows/charmap/). Example: for the heart symbol I used U+2665. Note that some Unicode characters are not supported and show up as a blank square, so keep trying until you find one that works. My final edited box file looks like this: [Image: edited box file]
  4. Create the final training file, own.traineddata (this can be done as shown at https://medium.com/apegroup-texts/training-tesseract-for-labels-receipts-and-such-690f452e8f79 or by training with jTessBoxEditor).
  5. Copy own.traineddata to the Tesseract/tessdata directory and run Tesseract with lang='own+eng'. I used pytesseract and the output is as below: [Image: pytesseract output]
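
As an illustration of steps 3 and 5, here is a minimal Python sketch. It is not the code from the answer. It assumes the box file has one box per line in the form "<char> <left> <bottom> <right> <top> <page>", and that the icons appear in the box file in a known order. Only U+2665 for the heart comes from the answer; the other code points, the ICON_LABELS mapping, and the helper names relabel_box_lines, symbols_to_labels and read_screenshot are hypothetical and chosen for illustration. The sketch relabels each box with a single Unicode character (which is what avoids the 'Like is not in unicharset' error) and then maps the recognized symbols back to words after OCR.

```python
# Hypothetical icon mapping. Only U+2665 (heart) is taken from the answer;
# the other code points are placeholders and may need to be swapped for
# characters that your tools actually render.
ICON_LABELS = {
    "\u2665": "Like",     # heart (from the answer)
    "\u2601": "Comment",  # placeholder for the speech bubble
    "\u2691": "Saved",    # placeholder for the bookmark
    "\u27a4": "Share",    # placeholder for the arrow
}


def relabel_box_lines(box_text, symbols):
    """Replace the label of each box line with one single-character symbol.

    Assumes one box per line: "<char> <left> <bottom> <right> <top> <page>",
    with the boxes in the same order as `symbols`.
    """
    lines = [ln for ln in box_text.splitlines() if ln.strip()]
    if len(lines) != len(symbols):
        raise ValueError("need exactly one symbol per box line")
    out = []
    for line, symbol in zip(lines, symbols):
        if len(symbol) != 1:
            # Full words such as "Like" are what break the unicharset step.
            raise ValueError(f"label must be a single character: {symbol!r}")
        _, coords = line.split(" ", 1)  # drop the old label, keep coordinates
        out.append(f"{symbol} {coords}")
    return "\n".join(out) + "\n"


def symbols_to_labels(text, labels=ICON_LABELS):
    """Turn recognized icon characters back into words, line by line."""
    result = []
    for line in text.splitlines():
        for symbol, label in labels.items():
            line = line.replace(symbol, f" {label} ")
        result.append(" ".join(line.split()))  # collapse extra spaces
    return "\n".join(result)


def read_screenshot(path, lang="own+eng"):
    """Run OCR with the custom plus English data, then map icons to words."""
    # Imported lazily so the helpers above work without these packages.
    import pytesseract
    from PIL import Image

    raw = pytesseract.image_to_string(Image.open(path), lang=lang)
    return symbols_to_labels(raw)
```

With own.traineddata in place, calling read_screenshot on a screenshot file would return the recognized text with each icon character replaced by its word. The mapping back to words happens after OCR because the trained model can only emit the single characters it was trained on.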


Source: https://stackoverflow.com/questions/57995023/train-tesseract-to-label-icons
