Text is missing when converting pdf file into image in java using pdfbox

一世执手 submitted on 2019-12-23 12:46:26

Question


I want to convert a PDF page to an image file, but the text is missing from the image when I convert the page using Java.

The file I want to convert is 46_2.pdf; after conversion, the result looks like 46_2.png.

Code:

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class ConvertPDFPageToImageWithoutText {
    public static void main(String[] args) {
        try {
            String oldPath = "C:/PDFCopy/46_2.pdf";
            File oldFile = new File(oldPath);
            if (oldFile.exists()) {
                PDDocument document = PDDocument.load(oldPath);
                try {
                    // PDFBox 1.x API: getAllPages() returns a raw List of PDPage objects
                    @SuppressWarnings("unchecked")
                    List<PDPage> list = document.getDocumentCatalog().getAllPages();

                    int pageNumber = 1;
                    for (PDPage page : list) {
                        BufferedImage image = page.convertToImage();
                        // Number the output files so later pages do not overwrite earlier ones
                        File outputfile = new File("C:/PDFCopy/image_" + pageNumber++ + ".png");
                        ImageIO.write(image, "png", outputfile);
                    }
                } finally {
                    // Close the document once, after all pages have been rendered
                    document.close();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Answer 1:


Since you're using PDFBox, try using PDFImageWriter.writeImage instead of PDPage.convertToImage. This post seems relevant to what you are trying to do.




Answer 2:


I had the same problem. I found an article (unfortunately I can't remember where, because I've read hundreds of them) whose author complained that such problems appeared in PDFBox after they updated Java to version 7.21. So I'm using 7.17, and it works for me :)
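
Since this answer hinges on which Java update is installed, it can help to confirm what the JVM actually running the conversion reports. The following is a minimal sketch, assuming that "7.21" and "7.17" above mean Java 7 updates 21 and 17 (the answer does not spell this out). The class name JavaVersionCheck and both helper methods are hypothetical and are not part of PDFBox.

```java
public class JavaVersionCheck {

    /** Returns the major Java version: 7 for "1.7.0_21", 11 for "11.0.2". */
    public static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Versions up to Java 8 are reported as "1.x"; later ones start with the major number.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Returns the update number after the underscore ("1.7.0_21" gives 21), or -1 if there is none. */
    public static int updateNumber(String version) {
        int idx = version.indexOf('_');
        if (idx < 0) {
            return -1;
        }
        String rest = version.substring(idx + 1);
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Integer.parseInt(rest.substring(0, end));
    }

    public static void main(String[] args) {
        // java.version is a standard system property set by the running JVM
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("major = " + majorVersion(version) + ", update = " + updateNumber(version));
    }
}
```

Running this with the same JVM that runs the conversion shows whether the runtime matches the update mentioned above, before any time is spent downgrading.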




Answer 3:


Use the latest version of PDFBox (I am using 2.0.9) and add the JAI Image I/O dependency from here. This is sample code that runs on Java 7.

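import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.commons.io.FilenameUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Method of a class that is assumed to declare a "logger" field (e.g. log4j or SLF4J)
public void pdfToImageConvertorUsingPdfBox(String inputPdfPath) throws Exception {
    File sourceFile = new File(inputPdfPath);
    String formatName = "png";
    if (sourceFile.exists()) {
        // try-with-resources closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(sourceFile)) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            int count = document.getNumberOfPages();

            for (int i = 0; i < count; i++) {
                // Render page i (zero-based) at 200 DPI as an RGB image
                BufferedImage image = pdfRenderer.renderImageWithDPI(i, 200, ImageType.RGB);
                String output = FilenameUtils.removeExtension(inputPdfPath) + "_" + (i + 1) + "." + formatName;
                ImageIO.write(image, formatName, new File(output));
            }
        }
    } else {
        logger.error(sourceFile.getName() + ": file does not exist");
    }
}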
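
The code above relies on FilenameUtils from Apache Commons IO to build one output name per page. If that dependency is not on the classpath, the same naming scheme can be sketched with the JDK alone. The class PageImageNames and its outputFor method are hypothetical helpers written for illustration; they are not part of PDFBox or of the original answer.

```java
import java.io.File;

public class PageImageNames {

    /** Builds "<pdf name without extension>_<page>.<format>" in the same directory as the PDF. */
    public static File outputFor(File pdf, int pageIndex, String formatName) {
        String name = pdf.getName();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot)
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // pageIndex is zero-based, as in the rendering loop; file names count from 1
        return new File(pdf.getParentFile(), base + "_" + (pageIndex + 1) + "." + formatName);
    }

    public static void main(String[] args) {
        File pdf = new File("C:/PDFCopy/46_2.pdf");
        for (int i = 0; i < 3; i++) {
            System.out.println(outputFor(pdf, i, "png"));
        }
    }
}
```

Giving every page its own file name avoids overwriting a single output file when the PDF has more than one page.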


Source: https://stackoverflow.com/questions/21059403/text-is-missing-when-converting-pdf-file-into-image-in-java-using-pdfbox
