I want to know what parameters the config file used by Tesseract OCR accepts, how to write a config file, etc.
I can't find any documentation about this on their site.
It's just a plain text file containing space-delimited key/value pairs for Tesseract config variables, one pair per line. For instance:
interactive_display_mode T
tessedit_display_outwords T
There are several standard config files, such as digits and hocr, in Tesseract's tessdata/configs folder.
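To make the format concrete, here is a minimal Python sketch that reads such a file into a dictionary. The function name `parse_config` is made up for illustration, and skipping lines that start with `#` is an assumption about how comments are handled, not something stated above.

```python
def parse_config(text):
    """Parse Tesseract-style config text into a {variable: value} dict.

    Assumes one space-delimited key/value pair per line; blank lines and
    lines starting with '#' are skipped (the '#' handling is an assumption).
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


if __name__ == "__main__":
    example = "interactive_display_mode T\ntessedit_display_outwords T\n"
    print(parse_config(example))
```

This is only meant for inspecting or comparing config files; Tesseract itself does the real parsing.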
Tesseract v3.04 now offers the command-line option --print-parameters, so you can call tesseract --print-parameters to get a list of the 678 (!) configurable parameters, their default values, and a short description of each:
Tesseract parameters:
editor_image_xpos 590 Editor image X Pos
editor_image_ypos 10 Editor image Y Pos
editor_image_menuheight 50 Add to image height for menu bar
editor_image_word_bb_color 7 Word bounding box colour
editor_image_blob_bb_color 4 Blob bounding box colour
editor_image_text_color 2 Correct text colour
...and many, many more
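If you want to search or filter that list, a small Python sketch along these lines may help. The function names are hypothetical, and the column layout is an assumption: the sketch splits on tabs when a line contains them and falls back to whitespace otherwise, which matches the listing as shown above. It also assumes the list is printed to stdout.

```python
import subprocess


def parse_parameters(text):
    """Turn --print-parameters output into {name: (default, description)}."""
    params = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("Tesseract parameters"):
            continue  # skip blank lines and the header line
        # Assumption: columns are tab-separated if tabs are present,
        # otherwise separated by runs of whitespace.
        parts = line.split("\t", 2) if "\t" in line else line.split(None, 2)
        if len(parts) < 2:
            continue
        description = parts[2] if len(parts) > 2 else ""
        params[parts[0]] = (parts[1], description)
    return params


def load_parameters(executable="tesseract"):
    """Run the tesseract binary and parse its parameter list (needs v3.04+)."""
    result = subprocess.run(
        [executable, "--print-parameters"],
        capture_output=True, text=True, check=True,
    )
    return parse_parameters(result.stdout)


if __name__ == "__main__":
    sample = "Tesseract parameters:\neditor_image_xpos 590 Editor image X Pos\n"
    print(parse_parameters(sample))
```

With `load_parameters()` you could then do something like `{k: v for k, v in params.items() if "digit" in k}` to narrow down the hundreds of entries.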
I found these instructions in the link below. They are about writing the config file and where to place it:
The config file is a simple text file without a BOM and with Unix end-of-line marks (on Windows you can use an advanced text editor such as Notepad++ to achieve this).
If you use the tesseract executable, this is the only way to change Tesseract parameters.
The config file should be located in your tessdata/configs directory. Have a look there for some examples.
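If you would rather generate the file from a script than fiddle with editor settings, the requirements above (no BOM, Unix line endings) can be met with a rough Python sketch like this. The function name and the file name `myconfig` are hypothetical; the tessdata/configs location depends on your installation.

```python
from pathlib import Path


def write_tesseract_config(path, params):
    """Write {variable: value} pairs as a Tesseract config file.

    The file is written as raw bytes so that no BOM is added and line
    endings stay as '\n', even on Windows.
    """
    lines = [f"{name} {value}" for name, value in params.items()]
    data = ("\n".join(lines) + "\n").encode("utf-8")  # plain utf-8, no BOM
    Path(path).write_bytes(data)


if __name__ == "__main__":
    # 'myconfig' is a placeholder; copy the result into tessdata/configs.
    write_tesseract_config("myconfig", {
        "interactive_display_mode": "T",
        "tessedit_display_outwords": "T",
    })
```

Once the file is in tessdata/configs, you pass its name as a trailing argument, e.g. `tesseract image.png out myconfig`, the same way the bundled digits and hocr configs are used.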
There is a list of all the variables, plus a description of each one, at http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version. Note that it's for Tesseract 3.02; things may be different in other versions.
Edit: Also adding a pastebin link in case the above link becomes dead.