Tesseract running error

情书的邮戳 2020-11-29 21:08

I have a problem running the tesseract-ocr engine on Linux. I've downloaded the RUS language data and put it in the tessdata directory (/usr/local/share/tessdata). When I'm tryin

11 Answers
  • 2020-11-29 21:11

    I'm using Windows. I tried all the solutions above and none of them worked.

    Finally, I installed Tesseract-OCR on the D drive (where I run my Python script from) instead of the C drive, and it worked.

    So, if you are using Windows, run your Python script from the same drive as your Tesseract-OCR installation.

  • 2020-11-29 21:15

    The simplest way is to install the package for the language you need:

    sudo apt-get install tesseract-ocr-eng  # for English
    sudo apt-get install tesseract-ocr-tam  # for Tamil
    sudo apt-get install tesseract-ocr-deu  # for Deutsch (German)
    

    As you can see, the same pattern works for other languages (e.g. tesseract-ocr-fra).
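
    After installing a language package, it can help to confirm that Tesseract actually sees it. The sketch below is only an illustration: it assumes the tesseract binary is on PATH and that `tesseract --list-langs` prints one header line followed by one language code per line. The helper names are hypothetical.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the first non-empty line is a header such as
    # "List of available languages (3):", so it is skipped.
    return lines[1:]


def installed_languages() -> list[str]:
    """Ask the installed tesseract binary which languages it can load."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Depending on the version, the list may arrive on stdout or on stderr.
    return parse_list_langs(result.stdout or result.stderr)
```

    If "rus" is missing from the result of installed_languages(), the rus.traineddata file is not in a directory that Tesseract searches.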

  • 2020-11-29 21:20

    C# developer working on Windows here. What worked for me was simply downloading the eng.traineddata file from the following URL:

    https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata

    and copying it to the following directory in my Console Application project:

    [Project Directory]\bin\Debug\tessdata

    I had to create the tessdata folder above manually.
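
    One thing to watch if you script the download: the URL above is a GitHub "blob" page, so fetching it with a command-line tool may save an HTML page rather than the model file. A minimal sketch of the usual workaround, assuming GitHub's convention that replacing /blob/ with /raw/ yields the file itself (blob_to_raw_url is a hypothetical helper name):

```python
def blob_to_raw_url(url: str) -> str:
    """Turn a github.com '/blob/' page URL into the matching '/raw/' URL."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Replace only the first '/blob/' so the rest of the path is untouched.
    return url.replace("/blob/", "/raw/", 1)


# prints https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
print(blob_to_raw_url(
    "https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata"
))
```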

  • 2020-11-29 21:24

    I'm using Visual Studio 2017 Community Edition.
    I solved this problem by making a directory called tessdata in the Debug directory of my project. Then I put the eng.traineddata file into said directory.
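
    A quick way to check that the folder really contains what Tesseract expects is to look for <lang>.traineddata files in it. This is only a sketch; missing_traineddata is a hypothetical helper, and the folder path is whatever tessdata directory your project uses.

```python
from pathlib import Path


def missing_traineddata(tessdata_dir: str, langs: list[str]) -> list[str]:
    """Return the language codes whose <lang>.traineddata file is absent."""
    folder = Path(tessdata_dir)
    return [
        lang for lang in langs
        if not (folder / f"{lang}.traineddata").is_file()
    ]
```

    For example, missing_traineddata("bin/Debug/tessdata", ["eng"]) returns an empty list once eng.traineddata is in place.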

  • 2020-11-29 21:26
    tesseract  --tessdata-dir <tessdata-folder> <image-path> stdout --oem 2 -l <lng>
    

    In my case, these were the mistakes I made, or the attempts that did not succeed:

    • I cloned the GitHub repo and copied files from there to
      • /usr/local/share/tessdata/
      • /usr/share/tesseract-ocr/tessdata/
      • /usr/share/tessdata/
    • Used TESSDATA_PREFIX with above paths
    • sudo apt-get install tesseract-ocr-eng

    The first two attempts did not work because the files from the git clone did not work, for reasons that I do not know. I am not sure why attempt #3 worked for me.

    Finally,

    1. I downloaded the eng.traineddata file using wget
    2. Copied it to some directory
    3. Used --tessdata-dir with directory name

    The takeaway for me is to learn the tool well and make use of it, rather than relying on package manager installation and directories.
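
    For those calling Tesseract from a script, the same command line can be assembled in Python. This is a rough sketch, not a definitive wrapper: the function names are hypothetical, the argument order simply mirrors the command at the top of this answer, and it assumes the tesseract binary is on PATH.

```python
import subprocess


def build_tesseract_command(tessdata_dir, image_path, lang, oem=2):
    """Build the argument list for an explicit --tessdata-dir invocation."""
    return [
        "tesseract",
        "--tessdata-dir", str(tessdata_dir),  # folder holding <lang>.traineddata
        str(image_path),
        "stdout",                             # write recognized text to stdout
        "--oem", str(oem),
        "-l", lang,
    ]


def run_ocr(tessdata_dir, image_path, lang, oem=2):
    """Run tesseract and return the recognized text (raises if it fails)."""
    cmd = build_tesseract_command(tessdata_dir, image_path, lang, oem)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

    Passing the arguments as a list avoids shell quoting problems when the directory or image path contains spaces.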

  • 2020-11-29 21:31

    You can call the Tesseract API functions from C++ code:

    #include <cstdio>                // printf
    #include <tesseract/baseapi.h>
    #include <tesseract/ocrclass.h>  // ETEXT_DESC
    
    using namespace tesseract;
    
    class TessAPI : public TessBaseAPI {
        public:
        void PrintRects(int len);
    };
    
    ...
    TessAPI *api = new TessAPI();
    // Init returns 0 on success and -1 on failure
    // (for example when rus.traineddata cannot be found).
    int res = api->Init(NULL, "rus");
    api->SetAccuracyVSpeed(AVS_MOST_ACCURATE);
    api->SetImage(data, w0, h0, bpp, stride);
    api->SetRectangle(x0,y0,w0,h0);
    
    char *text;
    ETEXT_DESC monitor;
    api->RecognizeForChopTest(&monitor);
    text = api->GetUTF8Text();
    printf("text: %s\n", text);
    // count and progress are integers, so print them with %d, not %s
    printf("m.count: %d\n", monitor.count);
    printf("m.progress: %d\n", monitor.progress);
    delete[] text;  // GetUTF8Text returns a buffer the caller must delete[]
    
    api->RecognizeForChopTest(&monitor);
    text = api->GetUTF8Text();
    printf("text: %s\n", text);
    delete[] text;
    ...
    api->End();
    delete api;
    

    And build this code:

    g++ -g -I. -I/usr/local/include -o _test test.cpp -ltesseract_api -lfreeimageplus
    

    (I need FreeImage for loading the picture.)
