Created a Java application that uses Tesseract to convert a given image or PDF to a string. When I run it on my machine as a JUnit unit test, it runs great.
Resources I used: Windows 10 (also tried on Windows Server 2016), Java, Maven
Status: Working well on my local machine as well as on a VM
1. Download Tess4J-3.4.8 from http://tess4j.sourceforge.net/ and add its location to your PATH environment variable under Advanced System Settings.
2. Add the dependencies from Maven:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.1</version>
</dependency>
<dependency>
    <groupId>org.ghost4j</groupId>
    <artifactId>ghost4j</artifactId>
    <version>1.0.1</version>
</dependency>
<dependency>
    <groupId>net.sourceforge.lept4j</groupId>
    <artifactId>lept4j</artifactId>
    <version>1.7.0</version>
</dependency>
3. Get libtesseract302.dll from http://api.256file.com/libtesseract302.dll/en-download-56466.html and copy it to the "C:\Windows\System32" folder.
Do not forget to set your PATH environment variable under Advanced System Settings.
4. Download and install the Visual C++ 2015 Redistributable or the VC++ 2017 Redistributable (I installed both),
as described here: https://programmer.help/blogs/net.sourceforge.tess4j.tesseractexception-java.lang.nullpointerexception.html
Then restart your PC.
5. To be on the safe side, also keep some JAR files locally if you don't already have them (please see image).
Do not forget to set your environment variable path for the JARs under Advanced System Settings.
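Steps 1 and 3 both rely on Windows finding native libraries through PATH. The following stdlib-only diagnostic is a sketch that checks whether a file such as libtesseract302.dll actually sits in one of the PATH directories as the JVM sees them. The class name DllPathCheck and its methods are hypothetical, not part of Tess4J.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic: lists PATH directories that contain a given native library file.
final class DllPathCheck {

    // Splits a PATH-style value (";" on Windows, ":" elsewhere) and returns every
    // directory entry that contains a regular file named fileName.
    static List<File> findOnPath(String pathValue, String fileName) {
        List<File> hits = new ArrayList<>();
        if (pathValue == null) {
            return hits;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue; // skip empty entries, e.g. from doubled separators
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Defaults to the DLL from step 3; pass another file name as the first argument.
        String dll = args.length > 0 ? args[0] : "libtesseract302.dll";
        List<File> hits = findOnPath(System.getenv("PATH"), dll);
        if (hits.isEmpty()) {
            System.out.println(dll + " was not found in any PATH directory");
        } else {
            for (File f : hits) {
                System.out.println("Found: " + f.getAbsolutePath());
            }
        }
    }
}
```

Run it from a freshly opened console or IDE: a process only sees the environment it was started with, so PATH changes made in Advanced System Settings afterwards are not visible to an already running IDE or JVM.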
As @Piotr R mentioned, the error was that ghostscriptException.getCause() is null. The reason was that the path set on the File object passed to Tesseract was not a valid one. Tesseract's definition of "valid" is a bit different from yours: it considers only a local path valid, so a file located on AWS S3 throws an error even if it is public. The solution was to save the file locally and delete it after Tesseract is done.
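The save-locally-then-delete workaround can be sketched with the JDK alone. This assumes the S3 object is public and reachable through a plain HTTPS URL (a private object would need the AWS SDK instead). LocalCopy, FileTask and the suffix parameter are hypothetical names; the suffix keeps the original extension (e.g. .pdf) in case the OCR code looks at the file name. The Tess4J call appears only as a comment so the sketch compiles without the library.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: run some work against a temporary local copy of a remote file.
final class LocalCopy {

    // Work to run on the local file; may throw checked exceptions (e.g. Tess4J's TesseractException).
    @FunctionalInterface
    interface FileTask<T> {
        T apply(File localFile) throws Exception;
    }

    // Downloads `source` into a temp file, runs `task` on it, and always deletes the temp file.
    static <T> T withLocalCopy(URL source, String suffix, FileTask<T> task) throws Exception {
        Path tmp = Files.createTempFile("ocr-input-", suffix);
        try {
            try (InputStream in = source.openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return task.apply(tmp.toFile());
        } finally {
            Files.deleteIfExists(tmp); // cleanup happens even if OCR throws
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java LocalCopy <public-file-url> [suffix]");
            return;
        }
        URL source = URI.create(args[0]).toURL();
        String suffix = args.length > 1 ? args[1] : ".pdf";
        String result = withLocalCopy(source, suffix, file -> {
            // With Tess4J on the classpath, OCR would run here, e.g.:
            //   Tesseract tesseract = new Tesseract();
            //   tesseract.setDatapath("<path to tessdata>");
            //   return tesseract.doOCR(file);
            return "local copy: " + file.getAbsolutePath() + " (" + file.length() + " bytes)";
        });
        System.out.println(result);
    }
}
```

Putting the deletion in a finally block means the temporary copy is removed whether Tesseract succeeds or throws.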
My guess is that there is a GhostscriptException which is not logged properly, and this is causing the NullPointerException:
https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107
106 } catch (GhostscriptException e) {
107 logger.error(e.getCause().toString(), e);
108 } finally {
On line 107, e.getCause() is (probably) null, and calling toString() on null throws a NullPointerException.
(Per the specs, getCause() can return null: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause(), and GhostscriptException also allows the cause to be null: http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java)
To verify this answer without recompiling the whole of tess4j, you could start your program in debug mode and put a breakpoint at line 107. This will show you the real exception.
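A null-safe version of the logging on line 107 could fall back to the exception itself when there is no cause. This is a sketch, not the actual tess4j fix: describe is a hypothetical helper, and a plain Exception stands in for GhostscriptException so the example runs without ghost4j. Inside the catch block it would be used as logger.error(describe(e), e).

```java
// Hypothetical helper showing null-safe handling of Throwable.getCause().
final class CauseSafeLogging {

    // Returns the cause's description if there is one, otherwise the exception's own.
    static String describe(Throwable t) {
        Throwable cause = t.getCause();
        return (cause != null) ? cause.toString() : t.toString();
    }

    public static void main(String[] args) {
        Exception noCause = new Exception("Ghostscript failed");
        // noCause.getCause().toString() would throw NullPointerException here.
        System.out.println(describe(noCause));   // prints "java.lang.Exception: Ghostscript failed"

        Exception withCause = new Exception("wrapper", new IllegalStateException("bad path"));
        System.out.println(describe(withCause)); // prints "java.lang.IllegalStateException: bad path"
    }
}
```

String.valueOf(e.getCause()) would also avoid the NullPointerException, but it would log the literal string "null" instead of anything useful.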