I am processing some large pdf files, (up to 100MB and about 2000 pages), with pdfbox. Some of the pages contain a QR code, I want to split those files into smaller ones wit
Thx! Tilman you are right, the PDFSplit command generates smaller files. I checked the PDFSplit code out and found that it removes the page links to avoid not needed resources.
Code extracted from Splitter.class :
private void processAnnotations(PDPage imported) throws IOException
{
List annotations = imported.getAnnotations();
for (PDAnnotation annotation : annotations)
{
if (annotation instanceof PDAnnotationLink)
{
PDAnnotationLink link = (PDAnnotationLink)annotation;
PDDestination destination = link.getDestination();
if (destination == null && link.getAction() != null)
{
PDAction action = link.getAction();
if (action instanceof PDActionGoTo)
{
destination = ((PDActionGoTo)action).getDestination();
}
}
if (destination instanceof PDPageDestination)
{
// TODO preserve links to pages within the splitted result
((PDPageDestination) destination).setPage(null);
}
}
else
{
// TODO preserve links to pages within the splitted result
annotation.setPage(null);
}
}
}
So eventually my code looks like this:
PDDocument documentoPdf =
PDDocument.loadNonSeq(new File("docs_compuestos/50.pdf"), new RandomAccessFile(new File("./tmp/t"), "rw"));
int numPages = documentoPdf.getNumberOfPages();
List pages = documentoPdf.getDocumentCatalog().getAllPages();
int previusQR = 0;
for(int i =0; i
Thank you very much!!