Tesseract running error

I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I'm tryin

11 Answers

無奈伤痛

2020-11-29 21:11

I'm using Windows. I tried all the solutions above and none of them worked.

Finally, I installed Tesseract-OCR on the D drive (where I run my Python script from) instead of the C drive, and it worked.

So, if you are using Windows, try running your Python script from the same drive as your Tesseract-OCR installation.

盖世英雄少女心

2020-11-29 21:15
The simplest way is to install the needed package:
```
sudo apt-get install tesseract-ocr-eng  #for english
sudo apt-get install tesseract-ocr-tam  #for tamil
sudo apt-get install tesseract-ocr-deu  #for deutsch (German)
```
As you can see, the same pattern works for other languages (e.g. tesseract-ocr-fra).
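
To confirm which language packs actually ended up on disk after installing, a small sketch like the one below lists the `*.traineddata` files in a tessdata directory. The function name `installed_languages` is hypothetical, and the directory you pass in is an assumption: it depends on your distribution and Tesseract version.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a <code>.traineddata file.

    Hypothetical helper: it only inspects file names in the given directory
    and does not verify that Tesseract can actually load the files.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory simply means no languages are available there.
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))
```

For example, `installed_languages("/usr/share/tesseract-ocr/tessdata")` would return the codes found under that path, if it exists on your system. Running `tesseract --list-langs` gives the engine's own view of the same information.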
后悔当初

2020-11-29 21:20

C# developer working on Windows here. What worked for me was simply downloading the file eng.traineddata from the following URL:

https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata

and copy it to the following directory in my Console Application project:

[Project Directory]\bin\Debug\tessdata

I did manually create the tessdata folder above.
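
The manual copy can also be scripted. The sketch below is in Python purely for illustration (the answer itself concerns a C# project) and assumes eng.traineddata has already been downloaded. The function name `install_traineddata` and both of its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata_file, output_dir):
    """Copy a .traineddata file into <output_dir>/tessdata.

    Hypothetical helper: creates the tessdata folder if it does not exist
    and returns the path of the copied file.
    """
    source = Path(traineddata_file)
    if source.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {source}")
    target_dir = Path(output_dir) / "tessdata"
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the file's metadata and returns the destination path.
    return Path(shutil.copy2(source, target_dir))
```

Here `output_dir` would be the build output folder, for instance the `bin\Debug` directory of the project mentioned above.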

后悔当初

2020-11-29 21:24

I'm using Visual Studio 2017 Community Edition.
I solved this problem by making a directory called tessdata in the Debug directory of my project. Then I put the eng.traineddata file into said directory.

余生分开走

2020-11-29 21:26
```
tesseract  --tessdata-dir <tessdata-folder> <image-path> stdout --oem 2 -l <lng>
```
In my case, these are the mistakes I made, or the attempts that did not succeed:
- I cloned the github repo and copied files from there to
  - /usr/local/share/tessdata/
  - /usr/share/tesseract-ocr/tessdata/
  - /usr/share/tessdata/
- Used TESSDATA_PREFIX with above paths
- sudo apt-get install tesseract-ocr-eng
The first two attempts did not work because the files from the git clone did not work, for reasons I do not know. I am not sure why the third attempt worked for me.

Finally,
1. I downloaded the eng.traineddata file using wget
2. Copied it to some directory
3. Used --tessdata-dir with directory name
The takeaway for me is to learn the tool well and make use of its options, rather than relying on the package manager's installation and default directories.
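
When calling Tesseract from a script, the same command line can be assembled programmatically. The following is a minimal sketch, assuming the `tesseract` executable is on the PATH. The helper names `build_tesseract_command` and `run_ocr` are hypothetical; the flags are the ones used in the command at the top of this answer.

```python
import subprocess


def build_tesseract_command(image_path, tessdata_dir, lang="eng", oem=None):
    """Build the argument list for a tesseract call that prints text to stdout.

    Hypothetical helper mirroring:
    tesseract --tessdata-dir <dir> <image> stdout --oem <n> -l <lang>
    """
    command = ["tesseract", "--tessdata-dir", str(tessdata_dir),
               str(image_path), "stdout"]
    if oem is not None:
        command += ["--oem", str(oem)]
    command += ["-l", lang]
    return command


def run_ocr(image_path, tessdata_dir, lang="eng", oem=None):
    """Run tesseract and return the recognized text.

    Raises subprocess.CalledProcessError if tesseract exits with an error,
    for example when the language file is not found in tessdata_dir.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, tessdata_dir, lang, oem),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the directory explicitly on every call avoids depending on TESSDATA_PREFIX or on whichever default path the installed build happens to use.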

忘了有多久

2020-11-29 21:31

You can call the Tesseract API from C++ code:

```
#include <cstdio>
#include <tesseract/baseapi.h>
#include <tesseract/ocrclass.h>  // ETEXT_DESC

using namespace tesseract;

class TessAPI : public TessBaseAPI {
    public:
    void PrintRects(int len);
};

...
TessAPI *api = new TessAPI();
// NULL datapath: use the default tessdata location; Init returns 0 on success
int res = api->Init(NULL, "rus");
api->SetAccuracyVSpeed(AVS_MOST_ACCURATE);
api->SetImage(data, w0, h0, bpp, stride);
api->SetRectangle(x0,y0,w0,h0);

char *text;
ETEXT_DESC monitor;
api->RecognizeForChopTest(&monitor);
text = api->GetUTF8Text();
printf("text: %s\n", text);
// count and progress are integer fields, so print them with %d, not %s
printf("m.count: %d\n", monitor.count);
printf("m.progress: %d\n", monitor.progress);
delete[] text;  // GetUTF8Text returns a buffer the caller must free

api->RecognizeForChopTest(&monitor);
text = api->GetUTF8Text();
printf("text: %s\n", text);
delete[] text;
...
api->End();
```

And build this code:

```
g++ -g -I. -I/usr/local/include -o _test test.cpp -ltesseract_api -lfreeimageplus
```

(I need FreeImage to load the picture.)
