Tesseract OCR not working for 64 bit machine

Question

I am working on an application in which I am using Tesseract for OCR.

My code works fine on a 32-bit Windows system. But when I run the same code on a 64-bit machine using the 32-bit .dll files, it runs but does not give accurate results.

So I switched to the 64-bit .dll files on the 64-bit machine. Now when I run the same program, I get the following error in the console (Eclipse Kepler).

Exception in thread "AWT-EventQueue-0" java.lang.UnsatisfiedLinkError: %1 is not a valid Win32 application.
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.Native.open(Native.java:1759)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
at com.sun.jna.Library$Handler.<init>(Library.java:147)
at com.sun.jna.Native.loadLibrary(Native.java:412)
at com.sun.jna.Native.loadLibrary(Native.java:391)
at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:38)
at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:293)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:227)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:176)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:159)

I have downloaded the 64-bit .dll files (https://github.com/charlesw/tesseract/tree/master/src/lib/TesseractOcr/x64), which should be compatible with a 64-bit system, but I am still getting the same error. I am using GhostScript 8.71 on the 64-bit machine. I have installed it in both Program Files and Program Files (x86), and I have set the environment variables accordingly. It is still not working.

Please help me find a solution!

Answer 1:

I don't see what this has to do with Ghostscript.

Answer 2:

Tess4J currently ships with support for a 32-bit JVM only

This is the creator, nguyenq, responding to a similar issue on a sourceforge forum.

Similarly, the tutorial points out that only 32-bit DLLs are included in the distribution:

To run with a 64-bit JVM, you'll need to use 64-bit Tesseract and Leptonica DLLs.

One solution: tell your IDE to use a 32-bit JVM instead.

The downside is that you may be mixing 32-bit and 64-bit environments, which could get awkward in a complicated app or environment. (I don't think it's too bad, but it might be a pain in your IDE.)
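To confirm which JVM the IDE actually launches, you can print the bitness from inside the app. This is a minimal sketch: the class name JvmBitness is made up for illustration, and sun.arch.data.model is a HotSpot-specific property that may be missing on other JVMs, so the code falls back to os.arch (which reports the architecture of the JVM, not of the operating system).

```java
final class JvmBitness {

    /**
     * Returns 64 or 32 for the given property values, or -1 if neither
     * property gives a usable answer.
     */
    static int detect(String dataModel, String osArch) {
        // "sun.arch.data.model" is "32" or "64" on HotSpot-based JVMs.
        if ("64".equals(dataModel) || "32".equals(dataModel)) {
            return Integer.parseInt(dataModel);
        }
        if (osArch == null) {
            return -1;
        }
        // Typical values: "x86" (32-bit), "amd64" / "x86_64" (64-bit).
        return osArch.contains("64") ? 64 : 32;
    }

    /** Detects the bitness of the JVM that is running this code. */
    static int detect() {
        return detect(System.getProperty("sun.arch.data.model"),
                      System.getProperty("os.arch"));
    }

    public static void main(String[] args) {
        System.out.println("JVM bitness: " + detect());
    }
}
```

If this prints 64 while the Tesseract and Leptonica DLLs on the library path are the 32-bit ones from the Tess4J download (or the other way around), that mismatch is the likely cause of the UnsatisfiedLinkError.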

In an update found here, it seems you can find DLLs for 64-bit Java here, as part of the Tesseract wrapper for .NET (oddly enough). However, I haven't tried those 64-bit DLLs yet, and the SourceForge link says they depend on the Visual C++ Redistributable for VS2012 or the Visual C++ Redistributable for VS2013, which is an annoying extra dependency.
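Before swapping DLLs around, it can help to check which architecture a given .dll was actually built for. "%1 is not a valid Win32 application" is the message Windows typically gives when a process tries to load an image built for a different architecture. The sketch below reads the machine field from the PE header of a file. The class name DllBitness is made up for illustration, and only the x86 (0x014C) and x64 (0x8664) machine codes are recognized.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DllBitness {

    /** Returns 32, 64, or -1 if the bytes are not a recognizable x86/x64 PE image. */
    static int fromBytes(byte[] image) {
        // Every PE file starts with a DOS header whose first two bytes are "MZ".
        if (image.length < 0x40 || image[0] != 'M' || image[1] != 'Z') {
            return -1;
        }
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        // Offset 0x3C of the DOS header holds the file offset of the PE header.
        int peOffset = buf.getInt(0x3C);
        if (peOffset < 0 || peOffset > image.length - 6) {
            return -1;
        }
        // The PE header starts with the signature "PE\0\0".
        if (buf.getInt(peOffset) != 0x00004550) {
            return -1;
        }
        // The 2-byte machine field follows the signature.
        int machine = buf.getShort(peOffset + 4) & 0xFFFF;
        if (machine == 0x014C) {
            return 32; // IMAGE_FILE_MACHINE_I386
        }
        if (machine == 0x8664) {
            return 64; // IMAGE_FILE_MACHINE_AMD64
        }
        return -1;
    }

    /** Reads the whole file into memory, which is fine for DLL-sized files. */
    static int fromFile(Path dll) throws IOException {
        return fromBytes(Files.readAllBytes(dll));
    }

    public static void main(String[] args) throws IOException {
        // Pass the paths of the DLLs to inspect as command-line arguments.
        for (String arg : args) {
            System.out.println(arg + ": " + fromFile(Paths.get(arg)));
        }
    }
}
```

Run it against the Tesseract and Leptonica DLLs that end up on your library path. The number it prints for each should match the bitness of the JVM that loads them.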

I'll update this post if I figure out a cleaner solution.

UPDATE

Note that I did this working with Amazon Web Services instances.

I was able to get Tess4J to work on a 64-bit Ubuntu 14.04. It was actually very simple once I gave up on my Red Hat distro and went to Ubuntu.

sudo apt-get install tesseract-ocr will get tesseract set up completely. You can check by typing tesseract -v. I also needed GhostScript because I was working with PDFs; sudo apt-get install ghostscript got that set up too. Verify with gs -v.
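If you want the app itself to fail early when either tool is missing, the same two checks can be run from Java. This is only a sketch: the class name ToolCheck is made up, and it assumes that tesseract -v and gs -v exit with status 0 when the tools are installed and on the PATH.

```java
import java.io.IOException;
import java.io.InputStream;

final class ToolCheck {

    /** Returns true if the command could be started and exited with status 0. */
    static boolean runs(String... command) {
        try {
            // Merge stderr into stdout so a single stream needs draining.
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            try (InputStream out = p.getInputStream()) {
                byte[] sink = new byte[4096];
                while (out.read(sink) != -1) {
                    // Discard the output; only the exit status matters here.
                }
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            // Typically means the executable was not found.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tesseract available: " + runs("tesseract", "-v"));
        System.out.println("ghostscript available: " + runs("gs", "-v"));
    }
}
```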

Now in your Java app, all you need to include are the JARs from Tess4J's download on your classpath: jna-4.1.0.jar, jai_imageio.jar, tess4j.jar, and ghost4j-0.5.1.jar if you are working with PDFs.

In your Java app, you need to set the data path so your Tesseract instance knows where the Tesseract data files are installed. Even though I had the environment variable set, it never worked for me on its own. I needed to explicitly set the data path like so:

Tesseract tessInstance = Tesseract.getInstance();
tessInstance.setDatapath(System.getenv("TESSDATA_PREFIX"));
ImageIO.scanForPlugins(); // make sure it knows about GhostScript, to work with PDFs
String result = tessInstance.doOCR(myFile);

Be sure that setDatapath() points to the parent folder of the tessdata folder of your Tesseract installation (on my Ubuntu this was /usr/share/tesseract-ocr/).
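Since a wrong data path is easy to get wrong, it may be worth validating the folder before handing it to setDatapath(). The following sketch uses only the JDK. The class name TessdataLocator and its resolve() helper are made up for illustration, and the fallback path is simply the one from my Ubuntu box, so treat both as assumptions.

```java
import java.io.File;

final class TessdataLocator {

    /**
     * Returns the first candidate directory that contains a "tessdata"
     * subfolder (i.e. the value to pass to setDatapath()), or null if
     * neither candidate qualifies.
     */
    static File resolve(String envValue, String fallback) {
        String[] candidates = { envValue, fallback };
        for (String candidate : candidates) {
            if (candidate == null || candidate.isEmpty()) {
                continue;
            }
            File dir = new File(candidate);
            // The data path must be the PARENT of tessdata, not tessdata itself.
            if (new File(dir, "tessdata").isDirectory()) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File parent = resolve(System.getenv("TESSDATA_PREFIX"),
                              "/usr/share/tesseract-ocr/");
        System.out.println(parent == null
                ? "no tessdata folder found"
                : "datapath: " + parent.getAbsolutePath());
    }
}
```

When resolve() returns a directory, pass its path to tessInstance.setDatapath(). When it returns null, you get a clear message about the missing folder instead of an OCR failure later on.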

That was all I needed. No need to worry about DLLs on the classpath.

tl;dr:

Use up-to-date Ubuntu

sudo apt-get install tesseract-ocr

sudo apt-get install ghostscript if working with PDFs

include the proper Tess4J JARs (jna-4.1.0.jar, jai_imageio.jar, tess4j.jar, and ghost4j-0.5.1.jar if you are working with PDFs)

call tess.setDatapath() to point to your Tesseract installation (/usr/share/tesseract-ocr/ for my Ubuntu 14.04)

call ImageIO.scanForPlugins() if using GhostScript

That's it. You are good to go: call tess.doOCR(myFile).

Source: https://stackoverflow.com/questions/24184422/tesseract-ocr-not-working-for-64-bit-machine

Tags

java

32bit-64bit

tesseract