Question
Is there a way to extract the image bytes from a PDImageXObject, for different image types, without loading them into a BufferedImage? A 15 MB TIFF file takes up 200 MB of memory once it is loaded into a BufferedImage, which I would love to avoid.
I have found an example for JPG files, but I have no idea what it's doing or whether the equivalent is possible for other file types: PNG, GIF, TIFF, etc.
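import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.IOUtils; // assuming PDFBox's IOUtils; Commons IO's copy() is used the same way
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// I don't really understand this, but it works for JPEGs
private static final List<String> PDF_JPEG_STOP_FILTERS = Arrays.asList(
        COSName.DCT_DECODE.getName(),
        COSName.DCT_DECODE_ABBREVIATION.getName());

public void extractImage(PDImageXObject pdImage, OutputStream baos) throws IOException {
    if ("jpg".equals(pdImage.getSuffix())) {
        // Copy the stored JPEG bytes straight to the output
        try (InputStream is = pdImage.createInputStream(PDF_JPEG_STOP_FILTERS)) {
            IOUtils.copy(is, baos);
        }
    } else {
        // Every other type is decoded fully and re-encoded as JPEG
        BufferedImage image = pdImage.getImage();
        // image.raster.data is huge
        ImageIO.write(image, "jpg", baos);
    }
}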
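To put the 15 MB versus 200 MB gap in numbers: a BufferedImage holds the decoded, uncompressed raster, so its size depends on the pixel dimensions rather than on the file size. Below is a rough sketch of that arithmetic. The class name RasterSize, the method decodedBytes, and the 8000 x 8000 RGB dimensions are made up for illustration and are not taken from my actual file; the real figure also depends on the pixel layout the decoder picks (for example, int-packed pixels use 4 bytes each).

```java
final class RasterSize {
    private RasterSize() {}

    /** Bytes needed for an uncompressed raster whose rows are padded to whole bytes. */
    static long decodedBytes(int width, int height, int components, int bitsPerComponent) {
        long bitsPerRow = (long) width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round each row up to a whole byte
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        // Hypothetical dimensions: 8000 x 8000 pixels, 3 components, 8 bits each
        long bytes = decodedBytes(8000, 8000, 3, 8);
        System.out.println(bytes + " bytes, about " + bytes / (1024 * 1024) + " MiB");
        // prints "192000000 bytes, about 183 MiB"
    }
}
```

An image of roughly that size would land in the 200 MB range on its own, however well the file on disk is compressed.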
Source: https://stackoverflow.com/questions/60107248/how-to-extract-image-bytes-out-of-pdf-efficiently