Does anybody have a suggestion for a Java library that performs automatic cropping and deskewing of images (like those retrieved from a flatbed scanner)?
I wrote a simple port of a very good deskewer. It works best if the image contains some text.
Deskewing
Take a look at Tess4j (Java JNA wrapper for Tesseract).
You can combine ImageDeskew.getSkewAngle() with ImageHelper.rotateImage(BufferedImage image, double angle).
There is an example of how to use it in the test folder of the Tess4J project, in Tesseract1Test.java:
    public void testDoOCR_SkewedImage() throws Exception {
        logger.info("doOCR on a skewed PNG image");
        File imageFile = new File(this.testResourcesDataPath, "eurotext_deskew.png");
        BufferedImage bi = ImageIO.read(imageFile);
        ImageDeskew id = new ImageDeskew(bi);
        double imageSkewAngle = id.getSkewAngle(); // determine skew angle
        // only rotate when the detected skew exceeds the threshold in either direction
        if (imageSkewAngle > MINIMUM_DESKEW_THRESHOLD || imageSkewAngle < -MINIMUM_DESKEW_THRESHOLD) {
            bi = ImageHelper.rotateImage(bi, -imageSkewAngle); // deskew image
        }
        String expResult = "The (quick) [brown] {fox} jumps!\nOver the $43,456.78 <lazy> #90 dog";
        String result = instance.doOCR(bi);
        logger.info(result);
        assertEquals(expResult, result.substring(0, expResult.length()));
    }
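If you only need the rotation step and already know the skew angle, something like the following should work with nothing but java.awt. This is a rough sketch, not Tess4J's implementation: the class name RotateSketch, the white background fill and the decision to keep the original canvas size are all my own assumptions. The sign convention may also differ from what ImageDeskew reports, so try it on a sample image first.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Tess4J.
final class RotateSketch {

    /**
     * Rotates src about its centre by angleDegrees and returns a new image
     * of the same size. Positive angles turn the content clockwise on screen
     * (image coordinates have y pointing down). Areas uncovered by the
     * rotation are filled with white; content rotated off the canvas is lost.
     */
    static BufferedImage rotate(BufferedImage src, double angleDegrees) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // assumed background colour for a scanned page
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // rotate around the image centre, then paint the source
            g.rotate(Math.toRadians(angleDegrees), w / 2.0, h / 2.0);
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

To undo a measured skew you would call RotateSketch.rotate(image, -skewAngle), mirroring the -imageSkewAngle in the test above.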
I'd imagine that someone has built a library on top of the Java Advanced Imaging API for doing this. You could try Googling for "Java Advanced Imaging deskew".
I've written a simple image deskew app, with source included. It's available at:
http://www.recognition-software.com/image/deskew/
ImageMagick can do that; you can use the ImageMagick Java bindings. The auto-crop operator is probably what you're looking for. Automatic deskewing is a much harder problem and involves some significant image processing; I'm not sure if ImageMagick can handle that. If you can figure out the skewing parameters using something else, ImageMagick can definitely unskew it for you.
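For the cropping half, the basic idea behind an auto-crop (finding the bounding box of everything that is not background) is easy to try in plain Java before reaching for ImageMagick. The sketch below is not ImageMagick's algorithm or its Java binding; the class name AutoCropSketch, the caller-supplied background colour and the per-channel tolerance are assumptions I made for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not ImageMagick's trim operator.
final class AutoCropSketch {

    /**
     * Crops src to the bounding box of all pixels that differ from
     * backgroundRgb by more than tolerance on any colour channel.
     * Returns src unchanged if every pixel counts as background.
     * The result shares pixel data with src (see getSubimage).
     */
    static BufferedImage trim(BufferedImage src, int backgroundRgb, int tolerance) {
        int minX = src.getWidth();
        int minY = src.getHeight();
        int maxX = -1;
        int maxY = -1;
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                if (differs(src.getRGB(x, y), backgroundRgb, tolerance)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return src; // nothing but background found
        }
        return src.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    // Compares the red, green and blue channels; alpha is ignored.
    private static boolean differs(int a, int b, int tolerance) {
        for (int shift = 0; shift <= 16; shift += 8) {
            int ca = (a >> shift) & 0xFF;
            int cb = (b >> shift) & 0xFF;
            if (Math.abs(ca - cb) > tolerance) {
                return true;
            }
        }
        return false;
    }
}
```

On real scans the background is rarely uniform, so a tolerance well above zero (and deskewing before cropping) is likely needed; a single speck of dust near the edge will also widen the box.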