Java Tess4J doOCR() not working, exception “Invalid memory access”

I'm working on a dynamic web project in Eclipse. I made a TesseractOCR class that contains:

import java.io.File;

import net.sourceforge.tess4j.Tesseract;

public class TesseractOCR {

    public TesseractOCR()
    {
    }

    public String doOCR(String file)
    {
        // Pick the native library folder that matches the JVM's bitness (32- or 64-bit)
        System.setProperty("jna.library.path", "32".equals(System.getProperty("sun.arch.data.model")) ? "lib/win32-x86" : "lib/win32-x86-64");

        File imageFile = new File(file);  // OCR the file path passed in by the servlet
        Tesseract instance = Tesseract.getInstance();  // JNA Interface Mapping
        // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping
        instance.setLanguage("heb+eng");
        // File tessDataFolder = LoadLibs.extractTessResources("tessdata"); // Maven build bundles English data
        // instance.setDatapath(tessDataFolder.getAbsolutePath());
        String sub = "";
        try {
            String result = instance.doOCR(imageFile);
            // Keep the text between the "אבחנות" (diagnoses) and "הפניות" (referrals) headings
            int indx1 = result.indexOf("אבחנות");
            int indx2 = result.indexOf("הפניות");
            int start = indx1 + "אבחנות".length();
            if (indx1 >= 0 && indx2 - 1 >= start) {
                sub = result.substring(start, indx2 - 1);
            }
            System.out.println(sub);
        } catch (Exception e) {
            // Note: a java.lang.Error (such as JNA's "Invalid memory access") is not caught here
            System.err.println(e.getMessage());
        }

        return sub;
    }
}

There is also a servlet whose doPost() method looks like this:

protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {

    System.setProperty("jna.library.path", "32".equals(System.getProperty("sun.arch.data.model")) ? "lib/win32-x86" : "lib/win32-x86-64");

    response.setContentType("text/html;charset=UTF-8");

    // Create path components to save the file
    final String path = "C:\\Users\\Sherein Dabbah\\Desktop\\med"; //request.getParameter("destination");
    final Part filePart = request.getPart("file");
    final String fileName = filePart.getSubmittedFileName();

    OutputStream out = null;
    InputStream filecontent = null;
    PrintWriter writer = response.getWriter();

    // Compare by content: == "" only checks whether both refer to the same String object
    if (fileName == null || fileName.isEmpty()) {
        writer.println("You either did not specify a file to upload or are "
                + "trying to upload a file to a protected or nonexistent "
                + "location.");
        return;
    }

    String fullName = path + File.separator + fileName;

    try {
        File newFile = new File(fullName);
        out = new FileOutputStream(newFile);
        filecontent = filePart.getInputStream();

        // Copy the uploaded bytes to disk in 1 KB chunks
        int read = 0;
        final byte[] bytes = new byte[1024];

        while ((read = filecontent.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }

        writer.println("New file " + fileName + " created at " + path);
        // LOGGER: a java.util.logging.Logger field of the servlet (not shown)
        LOGGER.log(Level.INFO, "File {0} being uploaded to {1}",
                new Object[]{fileName, path});

    } catch (FileNotFoundException fne) {
        writer.println("You either did not specify a file to upload or are "
                + "trying to upload a file to a protected or nonexistent "
                + "location.");
        writer.println("<br/> ERROR: " + fne.getMessage());

        LOGGER.log(Level.SEVERE, "Problems during file upload. Error: {0}",
                new Object[]{fne.getMessage()});
    } finally {
        if (out != null) {
            out.close();
        }
        if (filecontent != null) {
            filecontent.close();
        }
        if (writer != null) {
            writer.close();
        }
    }

    // Run OCR on the file that was just saved
    String s = new TesseractOCR().doOCR(fullName);
    System.out.println(s);
}

I get the following exception:

Sep 06, 2015 10:36:46 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [servlets.UploadServlet] in context with path [/up] threw exception [Servlet execution threw an exception] with root cause
java.lang.Error: Invalid memory access
    at com.sun.jna.Native.invokePointer(Native Method)
    at com.sun.jna.Function.invokePointer(Function.java:470)
    at com.sun.jna.Function.invoke(Function.java:404)
    at com.sun.jna.Function.invoke(Function.java:315)
    at com.sun.jna.Library$Handler.invoke(Library.java:212)
    at com.sun.proxy.$Proxy4.TessBaseAPIGetUTF8Text(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
    at classes.TesseractOCR.doOCR(TesseractOCR.java:28)
    at servlets.UploadServlet.doPost(UploadServlet.java:111)
    at...

It fails at this line in the TesseractOCR class:

String result = instance.doOCR(imageFile);
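The stack trace starts with java.lang.Error, not an Exception. Because Error is not a subclass of Exception, the catch (Exception e) block in doOCR() never sees it, so it propagates through doPost() up to Tomcat, which logs it as a servlet failure. The small standalone sketch below (class and method names are hypothetical) shows which handler catches each kind of throwable. It is only an illustration: catching this particular Error is generally not a fix, since it signals a fault inside native code.

```java
public class ErrorVsException {

    // Hypothetical helper: reports which handler caught whatever the action threw.
    public static String whoCatches(Runnable action) {
        try {
            try {
                action.run();
                return "nothing thrown";
            } catch (Exception e) {
                // Checked and unchecked exceptions land here
                return "catch (Exception)";
            }
        } catch (Error err) {
            // java.lang.Error (e.g. JNA's "Invalid memory access") skips catch (Exception)
            return "escaped to outer catch (Error)";
        }
    }
}
```

For example, whoCatches(() -> { throw new Error("Invalid memory access"); }) returns "escaped to outer catch (Error)", while an IllegalStateException is caught by the inner catch (Exception).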

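You probably need to call setDatapath to tell Tess4J where to find the tessdata folder that holds the .traineddata files. With setLanguage("heb+eng"), that folder must contain both heb.traineddata and eng.traineddata.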

Also, you may no longer need to set the jna.library.path property, as newer Tess4J versions can automatically extract and load the native libraries.
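Before calling doOCR(), it can help to confirm that the tessdata folder really contains a .traineddata file for every language named in setLanguage. For "heb+eng" that means heb.traineddata and eng.traineddata, and the comment in the question's code says the Maven build bundles English data only. The following is a minimal JDK-only sketch; the class and method names are hypothetical, and it checks the tessdata folder itself. Depending on the Tess4J/Tesseract version, setDatapath expects either that folder or its parent, so check the documentation for the version in use.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class TessdataCheck {

    // Hypothetical helper: returns each language in a Tesseract language string
    // such as "heb+eng" whose <lang>.traineddata file is missing from tessdataDir.
    public static List<String> missingTrainedData(File tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        // Tesseract separates multiple languages with '+'
        for (String lang : languages.split("\\+")) {
            File trainedData = new File(tessdataDir, lang + ".traineddata");
            if (!trainedData.isFile()) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

For example, TessdataCheck.missingTrainedData(new File("C:/Program Files (x86)/Tesseract-OCR/tessdata"), "heb+eng") returns the languages that still need data files; if the list is not empty, fix the data path or add the missing files before running OCR.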

The choice of language also matters here. I was processing an image with lang=hin+eng and got the same error described in this question.

Since the image contained only a little English text, I changed it to lang=hin and got the expected result.

public static void main(String[] args) {
    // getTesseractInstance(datapath, language) is the answerer's own helper (not shown);
    // presumably it returns a Tesseract configured with that tessdata path and language
    Tesseract in = new ReadImageText().getTesseractInstance("C:/Program Files (x86)/Tesseract-OCR/tessdata/", "hin");
    try {
        String resultText = in.doOCR(new File("C:/EA/app-result/im/01-001/34/0.png"));
        log.info("resultText {}", resultText);  // log: presumably an SLF4J logger field
    } catch (TesseractException e) {
        e.printStackTrace();
    }
}
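The answer above switched to lang=hin because the image held little English text. When the kind of document is known in advance, that decision can be made explicit. The sketch below is a hypothetical heuristic, not part of Tess4J: it counts Latin letters against letters of the document's main script in some sample text (for example, text from a similar document) and appends "+eng" only when Latin letters reach a chosen share. The threshold is an arbitrary assumption to tune, and the sketch only mirrors the workaround; it does not explain the underlying crash.

```java
public class LanguageChoice {

    // Hypothetical heuristic: append "+eng" to mainLang only when Latin letters make up
    // at least minLatinShare of the letters counted (Latin plus mainScript letters).
    public static String chooseLanguages(String sample, Character.UnicodeScript mainScript,
                                         String mainLang, double minLatinShare) {
        int latin = 0;
        int main = 0;
        for (int i = 0; i < sample.length(); ) {
            int cp = sample.codePointAt(i);
            i += Character.charCount(cp);
            // Skip digits, spaces, punctuation and combining marks such as vowel signs
            if (!Character.isLetter(cp)) {
                continue;
            }
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.LATIN) {
                latin++;
            } else if (script == mainScript) {
                main++;
            }
        }
        int total = latin + main;
        if (total > 0 && (double) latin / total >= minLatinShare) {
            return mainLang + "+eng";
        }
        return mainLang;
    }
}
```

For the question's Hebrew documents, the call would look like instance.setLanguage(LanguageChoice.chooseLanguages(sampleText, Character.UnicodeScript.HEBREW, "heb", 0.2)), where sampleText is a hypothetical representative string.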

Source: https://stackoverflow.com/questions/32421492/java-tess4j-doocr-not-working-exception-invalid-memory-access

Tags: java, eclipse, servlets, tesseract, tess4j