I have tried tess4j as a standalone Java program and it worked properly, giving the text output.
Now I am trying to create a Spring MVC web project, adding the dependencies
I faced a similar problem using tess4j in a Dynamic Web Project, but thanks to a comment by @nguyenq I got it working.
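tess4j relies mostly on TIFF handling through ImageIO for OCR, and the TIFF reader/writer it needs is not part of the default ImageIO, so jai_imageio.jar is required. All I did was add the line `ImageIO.scanForPlugins()` before calling the wrapper class that performs `doOCR`. In a web container, the plugins in the webapp's lib folder are not necessarily registered with ImageIO on their own; `scanForPlugins()` rescans the classpath so they get picked up.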
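If you are not sure whether a TIFF plugin is registered at the point where OCR runs, a small check like this can help. It is only a sketch: `TiffPluginCheck` is a made-up class that counts the ImageIO readers registered for the format name "tiff" before and after calling `ImageIO.scanForPlugins()`. Keep in mind that Java 9 and later ship a built-in TIFF plugin, so on a newer JVM the count may be non-zero even without jai_imageio.jar.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic class, not part of tess4j
public class TiffPluginCheck {

    // Counts the ImageIO readers currently registered for the "tiff" format name
    public static int countTiffReaders() {
        int count = 0;
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("tiff");
        while (readers.hasNext()) {
            readers.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("TIFF readers before scan: " + countTiffReaders());
        // Rescan the classpath so plugin JARs (e.g. jai_imageio.jar) get registered
        ImageIO.scanForPlugins();
        System.out.println("TIFF readers after scan:  " + countTiffReaders());
    }
}
```

Calling `countTiffReaders()` from inside the web app, rather than from a standalone `main`, shows what the container's class loader actually sees.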
I had the following JARs in my lib folder:
- tess4j.jar
- jai_imageio.jar
- ghost4j-0.3.1.jar
- jna.jar
- junit-4.10.jar
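When the same code works standalone but not in the web project, it can be worth confirming that these JARs actually ended up on the web application's classpath (for example in WEB-INF/lib). The following is a minimal sketch: `ClasspathCheck` is a hypothetical class, and the class names checked in `main` are my assumption of one representative class per JAR. junit is left out since it is only needed for tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking that required classes are loadable
public class ClasspathCheck {

    // Returns the class names from the argument list that cannot be loaded
    public static List<String> missingClasses(String... classNames) {
        // In a web container the context class loader is normally the webapp's loader
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only look the class up, don't run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed representative classes, one per JAR from the list above
        List<String> missing = missingClasses(
                "net.sourceforge.tess4j.Tesseract",                           // tess4j.jar
                "com.sun.media.imageioimpl.plugins.tiff.TIFFImageReaderSpi", // jai_imageio.jar
                "org.ghost4j.Ghostscript",                                    // ghost4j-0.3.1.jar
                "com.sun.jna.Native");                                        // jna.jar
        System.out.println(missing.isEmpty() ? "All classes found" : "Missing: " + missing);
    }
}
```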
Here's the sample code:
```java
// TessractOCR is my wrapper class; extractTextFromImage is static
ImageIO.scanForPlugins(); // register the JAI Image I/O plugins before OCR
String extractedString = TessractOCR.extractTextFromImage(binarizrImage);
```
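And the wrapper method:
```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;

public class TessractOCR {

    public static String extractTextFromImage(BufferedImage image) {
        String result = null;
        try {
            // Save the image as a PNG so Tesseract can read it from disk
            File outputfile = new File("saved.png");
            ImageIO.write(image, "png", outputfile);
            Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
            // Parent folder of the "tessdata" directory
            instance.setDatapath("E:\\OCR-data\\Tess4J-1.2-src\\Tess4J");
            result = instance.doOCR(outputfile);
            System.out.println(result);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
        return result;
    }
}
```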
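One thing to watch in a web application: every call writes to the same relative file `saved.png`, so concurrent requests can overwrite each other's image, and the file ends up in whatever the server's working directory happens to be. A possible alternative is to write each image to its own temporary file and delete it afterwards. This is a sketch under assumptions: `TempImageOcr`, `FileOcr` and `ocrViaTempFile` are hypothetical names, and the OCR step is passed in as a callback so the sketch does not depend on tess4j.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class TempImageOcr {

    // Hypothetical callback type standing in for the OCR call (e.g. Tesseract's doOCR(File))
    public interface FileOcr {
        String apply(File imageFile) throws Exception;
    }

    // Writes the image to a unique temporary PNG, runs the OCR callback on it, then deletes the file
    public static String ocrViaTempFile(BufferedImage image, FileOcr ocr) throws Exception {
        File tmp = File.createTempFile("ocr-", ".png");
        try {
            if (!ImageIO.write(image, "png", tmp)) {
                throw new IOException("No PNG writer available");
            }
            return ocr.apply(tmp);
        } finally {
            // Clean up even if the OCR step throws
            tmp.delete();
        }
    }
}
```

With tess4j 1.x the callback could be something like `f -> Tesseract.getInstance().doOCR(f)`; the callback type is declared to throw `Exception`, so the checked `TesseractException` is allowed.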
It works 100% :)
Below is the working code, shared for everyone:
```java
import java.io.File;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;

// Usage, e.g.: doOCR(new File("D:\\docfolder\\9011121584.pdf"))
public static String doOCR(File pdfInvoice) {
    long startTime = System.currentTimeMillis();
    Tesseract instance = Tesseract.getInstance(); // tess4j 1.x API
    try {
        // Register the JAI Image I/O plugins before running OCR
        ImageIO.scanForPlugins();
        // OCR the file passed in as the argument
        String result = instance.doOCR(pdfInvoice);
        long totalTime = System.currentTimeMillis() - startTime;
        System.out.println("Total time taken for OCR: " + (totalTime / 1000) + " s");
        return result;
    } catch (Exception e) {
        System.err.println(e.getMessage());
        return "";
    }
}
```
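A small aside on the timing in the method above: `totalTime / 1000` is integer division on milliseconds, so 1999 ms is printed as 1. If finer timing is wanted, a helper along these lines could wrap the OCR call. `OcrTiming` and `timed` are hypothetical names; `System.nanoTime()` is used because it is intended for measuring elapsed time.

```java
import java.util.concurrent.Callable;

// Hypothetical timing helper, not part of tess4j
public class OcrTiming {

    // Runs the task and prints the elapsed time in seconds with millisecond precision
    public static <T> T timed(String label, Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            System.out.printf("%s: %.3f s%n", label, seconds);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task; with tess4j this would be e.g. () -> instance.doOCR(pdfInvoice)
        String result = timed("Total time taken for OCR", () -> "dummy text");
        System.out.println(result);
    }
}
```

`timed("Total time taken for OCR", () -> instance.doOCR(pdfInvoice))` would then replace the manual start/end bookkeeping, since `Callable` may throw checked exceptions.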