Train Tesseract to label icons

囚心锁ツ 2021-02-11 09:52

I'm trying to create training data for Tesseract 4.0 so that it can identify icons (like, comment, share, save) in screenshots. This is a sample screenshot:

1 answer

猫巷女王i (original poster)

2021-02-11 10:54
I've figured it out. The box editor expects a single character (a letter or digit) per box, not a full word, so I used Unicode characters to stand in for my icons. The steps are as follows:
1. Crop all the target icons that you want Tesseract to detect and save them in a single image file, named (in my case) own.std.exp0.png.
2. Create the box file with the command 'tesseract own.std.exp0.png own.std.exp0 makebox'.
3. Open the box file in jTessBoxEditor and enter a Unicode character in the Char column for each box. The available characters can be browsed in the Windows Character Map program (https://sites.psu.edu/symbolcodes/windows/charmap/). Example: for the heart symbol I used U+2665. Note that some Unicode characters are not supported; they show up as a blank square, so keep trying until you find one that works. My final edited box file looks like this.
4. Create the final training file, own.traineddata (this can be done as shown here https://medium.com/apegroup-texts/training-tesseract-for-labels-receipts-and-such-690f452e8f79 or by training with jTessBoxEditor).
5. Copy own.traineddata to the Tesseract/tessdata directory and run Tesseract with lang='own+eng'. I used pytesseract and the output is as below:
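For anyone who prefers to script steps 3 and 5, here is a rough Python sketch rather than the exact code I ran. It assumes the usual box-file line layout of `char left bottom right top page`. The helper names (`relabel_box_lines`, `icons_in_text`, `run_ocr`) and the `ICON_NAMES` table are made up for illustration, and only the heart (U+2665) comes from the steps above; the envelope character is a placeholder, so substitute whatever characters actually render for you.

```python
# Hypothetical mapping from stand-in Unicode characters to icon names.
# Only U+2665 comes from the post; U+2709 is an assumed placeholder.
ICON_NAMES = {
    "\u2665": "like",     # BLACK HEART SUIT
    "\u2709": "comment",  # ENVELOPE (placeholder choice)
}


def relabel_box_lines(box_text, labels):
    """Replace the character column of each box line with the given labels.

    Assumes one box per line in the form "char left bottom right top page"
    and that the labels are listed in the same order as the boxes.
    """
    lines = [ln for ln in box_text.splitlines() if ln.strip()]
    if len(lines) != len(labels):
        raise ValueError("need exactly one label per box line")
    relabeled = []
    for line, label in zip(lines, labels):
        # Split from the right so only the five numeric fields are peeled off.
        _, *coords = line.rsplit(" ", 5)
        relabeled.append(" ".join([label, *coords]))
    return "\n".join(relabeled)


def icons_in_text(text):
    """Return the icon names for every stand-in character found in OCR text."""
    return [ICON_NAMES[ch] for ch in text if ch in ICON_NAMES]


def run_ocr(image_path, lang="own+eng"):
    """Run Tesseract through pytesseract with the custom language added.

    Imports are done lazily so the helpers above work without
    pytesseract or Pillow installed.
    """
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

Usage would be something like `icons_in_text(run_ocr("screenshot.png"))`, where `screenshot.png` is a placeholder path and own.traineddata is already in the tessdata directory. Relabeling in a script only saves typing in jTessBoxEditor; it is still worth opening the result in the editor to check that every character renders.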