How do I increase/decrease the strength of the dictionary in Tesseract 3?
In the FAQ it says I need to change the value of "NON_WERD" and "GARBAGE_STRING" but they
According to http://code.google.com/p/tesseract-ocr/wiki/FAQ, you change these variables:
enable_new_segsearch 1
language_model_penalty_non_freq_dict_word 0.2
language_model_penalty_non_dict_word 0.3
Increase their values to make Tesseract more biased to dictionary words.
Note: You must set enable_new_segsearch, otherwise they'll have no effect.
To turn off Tesseract's language-knowing abilities entirely, set each of these to false:
tess.setTessVariable("load_system_dawg", "false");
tess.setTessVariable("load_freq_dawg", "false");
tess.setTessVariable("load_punc_dawg", "false");
tess.setTessVariable("load_number_dawg", "false");
tess.setTessVariable("load_unambig_dawg", "false");
tess.setTessVariable("load_bigram_dawg", "false");
tess.setTessVariable("load_fixed_length_dawgs", "false");
Or, for finer control, disable just some of them. (I don't know of a place that explains well what they all do, but the names are fairly self-explanatory.) This is code from my current project, which uses Tess4J, but you can easily translate it to C++, a config file, or whatever else you need.
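For the config-file route, here is a minimal sketch that writes the dictionary-bias variables from the FAQ in the one "name value" pair per line layout shown above. The class name `TessConfigWriter`, its helper methods, and the default file name `dictbias` are made up for illustration and are not part of Tesseract or Tess4J; the variable names and values are the ones quoted earlier.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Tesseract or Tess4J): builds a config file
// containing the dictionary-bias variables quoted from the FAQ.
public class TessConfigWriter {

    // Renders one "name value" pair per line, preserving insertion order.
    public static String render(Map<String, String> vars) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Collects the three variables from the FAQ. Higher penalties bias
    // recognition more strongly toward dictionary words.
    public static Map<String, String> dictionaryBias(double nonFreqPenalty,
                                                     double nonDictPenalty) {
        Map<String, String> vars = new LinkedHashMap<>();
        // Per the note above, the penalties have no effect without this.
        vars.put("enable_new_segsearch", "1");
        vars.put("language_model_penalty_non_freq_dict_word",
                 String.valueOf(nonFreqPenalty));
        vars.put("language_model_penalty_non_dict_word",
                 String.valueOf(nonDictPenalty));
        return vars;
    }

    public static void main(String[] args) throws IOException {
        // Output path is an assumption; pass a different one as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "dictbias");
        String body = render(dictionaryBias(0.2, 0.3));
        Files.write(out, body.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the file written, something like `tesseract input.png output dictbias` should pick it up, assuming your Tesseract 3 build accepts a config file name as a trailing argument. The same `render` helper can emit the `load_*_dawg` settings by putting those names into the map with the value `false`.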