I wanted to convert a PDF document into an image. I was using Ghost4J.
Problem: Ghost4J needs gsdll32.dll file at runtime, and I do not
Using PDFBox is a good way to avoid native bindings. Try the PDFImageWriter class from PDFBox; I did the same thing with it in a few lines and it worked perfectly. You have to load the PDF as a PDDocument and pass it to the writer.
new PDFImageWriter().writeImage(doc, "png", null, 1, Integer.MAX_VALUE, "picture");
For all pages (page numbers are 1-based).
new PDFImageWriter().writeImage(doc, "png", null, 1, 1, "picture");
For the first page only.
See: PDFImageWriter Javadoc
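To make the start/end page arguments concrete, here is a minimal sketch of how a 1-based, inclusive page range with an Integer.MAX_VALUE upper bound can be resolved to zero-based page indices. The PageRange class and its resolve method are hypothetical helpers written for illustration; they are not part of PDFBox, and the 1-based numbering is an assumption consistent with the START_PAGE=1 and END_PAGE=Integer.MAX_VALUE defaults used elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: resolves a 1-based, inclusive
// page range to zero-based page indices, clamped to the document length.
class PageRange {
    static List<Integer> resolve(int startPage, int endPage, int pageCount) {
        if (startPage < 1) {
            throw new IllegalArgumentException("startPage must be >= 1");
        }
        List<Integer> indices = new ArrayList<>();
        // endPage may be Integer.MAX_VALUE, so clamp it before looping
        int last = Math.min(endPage, pageCount);
        for (int page = startPage; page <= last; page++) {
            indices.add(page - 1);
        }
        return indices;
    }

    public static void main(String[] args) {
        // All pages of a 3-page document
        System.out.println(resolve(1, Integer.MAX_VALUE, 3)); // prints [0, 1, 2]
        // First page only
        System.out.println(resolve(1, 1, 3)); // prints [0]
    }
}
```

Clamping the upper bound first is what makes Integer.MAX_VALUE safe to pass as "until the last page".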
You have probably tried to convert a corrupted PDF file. I get the same errors when the PDF file contains JPX-encoded streams.
You can easily convert a PDF into images using PDFBox. The renderImageWithDPI method of the PDFRenderer class (PDFBox 2.x) renders a page to a BufferedImage.
PDDocument doc = PDDocument.load(new File("filepath/sample.pdf"));
PDFRenderer pdfRenderer = new PDFRenderer(doc);
for (int page = 0; page < doc.getNumberOfPages(); page++) {
    // Page indices are zero-based; render each page at 300 DPI in RGB
    BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
    String fileName = "image-" + page + ".png";
    ImageIOUtil.writeImage(bim, fileName, 300);
}
doc.close();
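To get a feel for what the 300 DPI argument means for the output size: PDF page dimensions are expressed in points (1/72 inch), so the rendered size in pixels is roughly the size in points multiplied by dpi / 72. The DpiMath class below is a hypothetical helper for illustration only, and the exact rounding PDFBox applies may differ slightly.

```java
// Hypothetical helper, not part of PDFBox: estimates the pixel size of a
// rendered page from its size in PDF points (1 point = 1/72 inch).
class DpiMath {
    static final double POINTS_PER_INCH = 72.0;

    static int pixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches)
        int width = pixels(612, 300);
        int height = pixels(792, 300);
        System.out.println(width + " x " + height); // prints 2550 x 3300
    }
}
```

A Letter page at 300 DPI is therefore about 8.4 million pixels, which is worth keeping in mind when rendering many pages in RGB.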
You can try the non-sequential parser (PDFBox 1.8.x) to avoid errors with some PDF files, for example files with incremental updates:
PDDocument doc = PDDocument.loadNonSeq(new File("/document.pdf"), null);
try {
    PDDocument document = PDDocument.load(PdfInfo.getPDFWAY());
    if (document.isEncrypted()) {
        document.decrypt(PdfInfo.getPASSWORD());
    }
    // Map the configured color name to a BufferedImage type
    if ("bilevel".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_BINARY);
    } else if ("indexed".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_INDEXED);
    } else if ("gray".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_GRAY);
    } else if ("rgb".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_RGB);
    } else if ("rgba".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_ARGB);
    } else {
        // Unknown color name
        System.exit(2);
    }
    // Write the selected page range as images
    PDFImageWriter imageWriter = new PDFImageWriter();
    boolean success = imageWriter.writeImage(document, PdfInfo.getIMAGE_FORMAT(), PdfInfo.getPASSWORD(),
            PdfInfo.getSTART_PAGE(), PdfInfo.getEND_PAGE(), PdfInfo.getOUTPUT_PREFIX(),
            PdfInfo.getIMAGETYPE(), PdfInfo.getRESOLUTION());
    if (!success) {
        System.exit(1);
    }
    document.close();
} catch (IOException | CryptographyException | InvalidPasswordException ex) {
    Logger.getLogger(PdfToImae.class.getName()).log(Level.SEVERE, null, ex);
}
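The color-name chain above could also be written as a small pure function, which is easier to test and avoids calling System.exit in the middle of the conversion. This is only a sketch: ColorTypes and toImageType are hypothetical names, and throwing IllegalArgumentException for an unknown name is a design choice of this example rather than what the code above does.

```java
import java.awt.image.BufferedImage;
import java.util.Locale;

// Hypothetical helper: maps a color name to a BufferedImage type constant.
class ColorTypes {
    static int toImageType(String color) {
        // Case-insensitive match, like equalsIgnoreCase in the original chain
        switch (color.toLowerCase(Locale.ROOT)) {
            case "bilevel":
                return BufferedImage.TYPE_BYTE_BINARY;
            case "indexed":
                return BufferedImage.TYPE_BYTE_INDEXED;
            case "gray":
                return BufferedImage.TYPE_BYTE_GRAY;
            case "rgb":
                return BufferedImage.TYPE_INT_RGB;
            case "rgba":
                return BufferedImage.TYPE_INT_ARGB;
            default:
                throw new IllegalArgumentException("Unknown color: " + color);
        }
    }

    public static void main(String[] args) {
        System.out.println(toImageType("rgb") == BufferedImage.TYPE_INT_RGB); // prints true
    }
}
```

The caller can then decide how to react to an unknown name, for example by logging it and exiting with status 2 as before.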
public class PdfInfo {
    private static String PDFWAY;
    private static String OUTPUT_PREFIX;
    private static String PASSWORD;
    private static int START_PAGE = 1;
    private static int END_PAGE = Integer.MAX_VALUE;
    private static String IMAGE_FORMAT = "jpg";
    private static String COLOR = "rgb";
    private static int RESOLUTION = 256;
    private static int IMAGETYPE = 24;
    private static String filename;
    private static String filePath = "";
    // Static getters and setters used above (getPDFWAY(), setIMAGETYPE(int), ...) are not shown
}
For the error:
org.apache.pdfbox.util.PDFStreamEngine processOperator INFO: unsupported/disabled operation
You need to include the fontbox-1.7.1 jar on the classpath in addition to the Apache PDFBox jar. This fixes the issue, because PDFBox uses FontBox internally.
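If you are not sure whether the FontBox jar is actually being picked up at runtime, a quick check like the following can help. It is a minimal sketch: ClasspathCheck and isAvailable are hypothetical names, and org.apache.fontbox.cmap.CMap is used here only as a representative FontBox class.

```java
// Hypothetical helper: reports whether a class can be found on the classpath.
class ClasspathCheck {
    static boolean isAvailable(String className) {
        try {
            // Look the class up without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("FontBox on classpath: "
                + isAvailable("org.apache.fontbox.cmap.CMap"));
    }
}
```

Run it with the same classpath as your application; if it reports false, the jar is missing or not on the classpath the JVM actually uses.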