I am working on a Android application using real-time OCR. I using OpenCV and Tesseract Library. But the performance is very poor, even on my Galaxy SIII. There are any meth
You can have Tesseract only do the recognition pass 1, so that it skips passes 2 through 9, when it calls recog_all_words().
Change the following line in baseapi.cpp and rebuild your Tesseract library project:
if (tesseract_->recog_all_words(page_res_, monitor, NULL, NULL, 0)) {
Change it to:
if (tesseract_->recog_all_words(page_res_, monitor, NULL, NULL, 1)) {
Use multithreading, but be aware to create one instance per thread for TessBaseAPI. Don't share them between different threads. Create N threads (N >= number of cores), and java will make sure that you speed up at least the number of cores times.
What I do is creating N threads which create TessBaseAPI objects in their own context (in the run method) and wait for OCR requests in a loop until interrupted.
...
...
@Override
public void run() {
TessBaseAPI tessBaseApi = new TessBaseAPI();
tessBaseApi.init(Ocrrrer.DATA_PATH, "eng");
setTessVariable(tessBaseApi, "load_system_dawg", "0");
setTessVariable(tessBaseApi, "load_freq_dawg", "0");
setTessVariable(tessBaseApi, "load_unambig_dawg", "0");
setTessVariable(tessBaseApi, "load_punc_dawg", "0");
setTessVariable(tessBaseApi, "load_number_dawg", "0");
setTessVariable(tessBaseApi, "load_fixed_length_dawgs", "0");
setTessVariable(tessBaseApi, "load_bigram_dawg", "0");
setTessVariable(tessBaseApi, "wordrec_enable_assoc", "0");
setTessVariable(tessBaseApi, "tessedit_enable_bigram_correction", "0");
setTessVariable(tessBaseApi, "assume_fixed_pitch_char_segment", "1");
setTessVariable(tessBaseApi, TessBaseAPI.VAR_CHAR_WHITELIST, "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ<");
Log.d(TAG, "Training file loaded");
while (!interrupted()) {
reentrantLock.lock();
try {
Log.d(TAG, this.getName() + " wait for OCR");
jobToDo.await();
Log.d(TAG, this.getName() + " input arrived. Do OCR");
this.ocrResult = doOcr(tessBaseApi);
ocrDone.signalAll();
} catch (InterruptedException e) {
return;
} finally {
try {
reentrantLock.unlock();
} catch (Exception ex) {
}
}
}
}
...
...
You can see that the tessBaseApi object is local to the run method, hence absolutely not shared.
One thing to try is to binarize the image using adaptive thresholding (adaptiveThreshold in OpenCV).
Some things that might make it faster are: