How to Create a Traineddata File for Tesseract 4.1.0

Submitted by 空扰寡人 on 2019-12-01 00:52:53
Lukman Mhd

Creating .traineddata for Tesseract 4

{*Note: After installing Tesseract, open cmd and run the following commands.}

Step 1: Make box files for the images that we want to train on

Syntax:

tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox

Eg:

tesseract own.arial.exp0.jpg own.arial.exp0 batch.nochop makebox

{*Note: After making the box files, open them and correct any wrongly identified characters.}
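
Correcting a box file by hand gets tedious when the same character is misread many times. The sketch below, in Python, assumes the usual box-file layout of one symbol per line followed by left, bottom, right, top and page number. `fix_box_labels` and its `corrections` argument are hypothetical names for illustration, not part of Tesseract.

```python
def fix_box_labels(box_text, corrections):
    """Return box-file text with the label replaced on selected lines.

    corrections maps a 0-based line index to the correct character(s).
    Coordinates and the page number are left untouched.
    """
    fixed = []
    for index, line in enumerate(box_text.splitlines()):
        # Split from the right so the label keeps any unusual characters:
        # label, left, bottom, right, top, page
        parts = line.rsplit(" ", 5)
        if index in corrections and len(parts) == 6:
            parts[0] = corrections[index]
        fixed.append(" ".join(parts))
    return "\n".join(fixed) + "\n"
```

Read own.arial.exp0.box as UTF-8 text, pass it through the function, and write the result back before moving on to step 2.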

Step 2: Create the .tr file (combining the image file and the box file)

Syntax:

tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train

Eg:

tesseract own.arial.exp0.jpg own.arial.exp0 box.train
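
Steps 1 and 2 both derive their output name from the image name, so the two command lines can be built from the image file alone. This is a minimal Python sketch; `box_and_train_commands` is a hypothetical helper, and it only builds the argument lists without running anything.

```python
from pathlib import Path


def box_and_train_commands(image_name):
    """Build the step 1 and step 2 command lines for one training image."""
    # own.arial.exp0.jpg -> own.arial.exp0 (only the last extension is dropped)
    base = str(Path(image_name).with_suffix(""))
    makebox = ["tesseract", image_name, base, "batch.nochop", "makebox"]
    box_train = ["tesseract", image_name, base, "box.train"]
    return makebox, box_train
```

Each list can be passed to `subprocess.run(command, check=True)`. Run the makebox command first, correct the box file, and only then run the box.train command.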

Step 3: Extract the charset from the box files (the output of this command is the unicharset file)

Syntax:

unicharset_extractor [langname].[fontname].[expN].box 

Eg:

unicharset_extractor own.arial.exp0.box

Step 4: Create a font_properties file that describes the style of each training font.

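Syntax:

echo [fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)] > font_properties

Eg:

echo arial 0 0 0 0 0 > font_properties

{*Note: In Windows cmd, echo writes quotation marks into the file literally, so do not quote the text. Arial is a proportional sans-serif font, so every flag is 0.}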
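
Writing font_properties from a script avoids shell quoting differences between cmd and other shells. The Python sketch below assumes the flag order given in the syntax line of this step (italic, bold, monospace, serif, fraktur); both function names are hypothetical.

```python
def font_properties_line(fontname, italic=False, bold=False,
                         monospace=False, serif=False, fraktur=False):
    """Format one font_properties entry: name followed by five 0/1 flags."""
    flags = (italic, bold, monospace, serif, fraktur)
    return " ".join([fontname] + ["1" if flag else "0" for flag in flags])


def write_font_properties(path, lines):
    """Write one entry per line, with Unix newlines and no quotes."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, `font_properties_line("timesitalic", italic=True, serif=True)` gives `timesitalic 1 0 0 1 0`. The font name must match the [fontname] part of the training file names.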

Step 5: Train the shape prototypes with mftraining.

Syntax:

mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr

Eg:

mftraining -F font_properties -U unicharset -O own.unicharset own.arial.exp0.tr

Step 6: Generate the character normalization data with cntraining.

Syntax:

cntraining [langname].[fontname].[expN].tr

Eg:

cntraining own.arial.exp0.tr

{*Note: After steps 5 and 6, four new files exist: shapetable, inttemp and pffmtable from mftraining, and normproto from cntraining.}

Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto)

Syntax:

rename filename1 filename2

Eg:

    rename shapetable own.shapetable
    rename inttemp own.inttemp
    rename pffmtable own.pffmtable
    rename normproto own.normproto
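
The `rename` command is specific to Windows cmd. As a portable alternative, here is a small Python sketch that adds the language prefix to whichever of the four files are present; `add_lang_prefix` and `LEGACY_OUTPUTS` are hypothetical names.

```python
from pathlib import Path

# The four files produced by mftraining and cntraining.
LEGACY_OUTPUTS = ("shapetable", "inttemp", "pffmtable", "normproto")


def add_lang_prefix(workdir, langname):
    """Rename e.g. shapetable to own.shapetable; return the new file names."""
    renamed = []
    for name in LEGACY_OUTPUTS:
        source = Path(workdir) / name
        if source.exists():
            target = source.with_name(f"{langname}.{name}")
            # Path.replace overwrites an existing target from an earlier run.
            source.replace(target)
            renamed.append(target.name)
    return renamed
```

Calling `add_lang_prefix(".", "own")` in the training folder has the same effect as the four rename commands.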

Step 8: Create the .traineddata file (the trailing dot after [langname] is required)

Syntax:

combine_tessdata [langname].

Eg:

combine_tessdata own.
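
combine_tessdata collects the files that carry the [langname]. prefix, so it is worth checking that all five files from the earlier steps are present before running it. A minimal Python sketch, with `missing_components` as a hypothetical helper name:

```python
from pathlib import Path

# own.unicharset comes from the -O option in step 5, the rest from step 7.
COMPONENTS = ("unicharset", "shapetable", "inttemp", "pffmtable", "normproto")


def missing_components(workdir, langname):
    """Return the expected [langname].* files that are not in workdir."""
    expected = [f"{langname}.{part}" for part in COMPONENTS]
    return [name for name in expected if not (Path(workdir) / name).is_file()]
```

An empty list means every input from the previous steps was found.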

{*Note: I used only one image (exp0) to create the traineddata. To train on more than one image, repeat steps 1 and 2 for exp1, exp2 ... expN and pass all the resulting .box and .tr files to the commands in steps 3, 5 and 6.}
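
With several images, unicharset_extractor takes every .box file and mftraining and cntraining take every .tr file on one command line. The Python sketch below builds those three command lines, assuming the images are numbered exp0 to exp(count-1); `training_commands` is a hypothetical helper and it does not run anything.

```python
def training_commands(langname, fontname, count):
    """Build the step 3, 5 and 6 command lines for exp0 .. exp(count-1)."""
    bases = [f"{langname}.{fontname}.exp{i}" for i in range(count)]
    boxes = [base + ".box" for base in bases]
    trs = [base + ".tr" for base in bases]
    return [
        ["unicharset_extractor", *boxes],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{langname}.unicharset", *trs],
        ["cntraining", *trs],
    ]
```

Each list can be passed to `subprocess.run(command, check=True)` in order, once the .box and .tr files from steps 1 and 2 exist for every image.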
