Splitting a large PDF file with PDFBox produces large result files

[愿得一人] 2021-01-15 04:41

I am processing some large PDF files (up to 100 MB and about 2000 pages) with PDFBox. Some of the pages contain a QR code, and I want to split those files into smaller ones wit

1 answer
  • 2021-01-15 05:19

    Thanks! Tilman, you are right: the PDFSplit command generates smaller files. I checked the PDFSplit code and found that it removes the page links, so that resources which are not needed do not end up in the output.

    Code extracted from Splitter.class:

    private void processAnnotations(PDPage imported) throws IOException
    {
        List<PDAnnotation> annotations = imported.getAnnotations();
        for (PDAnnotation annotation : annotations)
        {
            if (annotation instanceof PDAnnotationLink)
            {
                PDAnnotationLink link = (PDAnnotationLink) annotation;
                PDDestination destination = link.getDestination();
                if (destination == null && link.getAction() != null)
                {
                    PDAction action = link.getAction();
                    if (action instanceof PDActionGoTo)
                    {
                        destination = ((PDActionGoTo) action).getDestination();
                    }
                }
                if (destination instanceof PDPageDestination)
                {
                    // TODO preserve links to pages within the splitted result
                    ((PDPageDestination) destination).setPage(null);
                }
            }
            else
            {
                // TODO preserve links to pages within the splitted result
                annotation.setPage(null);
            }
        }
    }
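
To illustrate why clearing those page references shrinks the output, here is a toy model in plain Java. It does not use PDFBox: `Node`, `buildImportedPage` and `reachable` are hypothetical names, and the sketch assumes that saving a document writes every object reachable from its pages. Under that assumption, a link whose destination is a page of the source document pulls that page, its parent page tree, and therefore every other source page into the result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy object graph, NOT PDFBox: it only illustrates reachability. */
class ReachabilityDemo {

    /** Hypothetical stand-in for a PDF object that may reference other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Builds an imported page whose link annotation targets a page of the source tree. */
    static Node buildImportedPage(int sourcePageCount) {
        Node pageTree = new Node("source page tree");
        for (int i = 0; i < sourcePageCount; i++) {
            Node page = new Node("source page " + i);
            page.refs.add(new Node("content " + i));
            page.refs.add(pageTree);   // every page points back to its parent
            pageTree.refs.add(page);
        }
        Node imported = new Node("imported page");
        imported.refs.add(new Node("imported content"));
        Node link = new Node("link annotation");
        link.refs.add(pageTree.refs.get(sourcePageCount - 1)); // destination: a source page
        imported.refs.add(link);
        return imported;
    }

    /** Counts the objects a writer would emit when it starts from root. */
    static int reachable(Node root) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (seen.add(n)) {
                for (Node r : n.refs) {
                    todo.push(r);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Node imported = buildImportedPage(2000);
        System.out.println("with link target:    " + reachable(imported)); // prints 4004
        imported.refs.get(1).refs.clear(); // plays the role of setPage(null)
        System.out.println("after clearing link: " + reachable(imported)); // prints 3
    }
}
```

In this model a single stale link makes one imported page carry all 2000 source pages, which matches the symptom of every split file being almost as large as the original.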
    

    So eventually my code looks like this:

    // PDFBox 1.8 API. hintMap and readQRWithQRCodeMultiReader are not shown here.
    PDDocument documentoPdf =
            PDDocument.loadNonSeq(new File("docs_compuestos/50.pdf"), new RandomAccessFile(new File("./tmp/t"), "rw"));

    int numPages = documentoPdf.getNumberOfPages();
    List pages = documentoPdf.getDocumentCatalog().getAllPages();

    int previusQR = 0;
    for (int i = 0; i < numPages; i++) {
        PDPage firstPage = (PDPage) pages.get(i);
        // Start with null so that a page without a QR code does not trigger a split.
        String qrText = null;

        // Render the page at 200 dpi so that the QR code can be decoded.
        BufferedImage firstPageImage = firstPage.convertToImage(BufferedImage.TYPE_USHORT_565_RGB, 200);
        firstPage = null;

        try {
            qrText = readQRWithQRCodeMultiReader(firstPageImage, hintMap);
        } catch (NotFoundException e) {
            e.printStackTrace();
        } finally {
            firstPageImage = null;
        }

        // A QR code on any page but the first closes the current group of pages.
        if (i != 0 && qrText != null) {
            PDDocument outputDocument = new PDDocument();
            outputDocument.setDocumentInformation(documentoPdf.getDocumentInformation());
            outputDocument.getDocumentCatalog().setViewerPreferences(
                    documentoPdf.getDocumentCatalog().getViewerPreferences());

            for (int j = previusQR; j < i; j++) {
                PDPage sourcePage = (PDPage) pages.get(j);
                PDPage importedPage = outputDocument.importPage(sourcePage);

                importedPage.setCropBox(sourcePage.findCropBox());
                importedPage.setMediaBox(sourcePage.findMediaBox());
                // only the resources of the page will be copied
                importedPage.setResources(sourcePage.getResources());
                importedPage.setRotation(sourcePage.findRotation());

                // Clear the page references held by links and annotations.
                processAnnotations(importedPage);
            }

            File f = new File("./splitting_files/" + previusQR + ".pdf");
            previusQR = i;

            outputDocument.save(f);
            outputDocument.close();
        }
    }
    // Note: the pages from the last QR code to the end are not written by this loop.
    documentoPdf.close();
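
The splitting rule in the loop above can be separated from the PDF handling. The following is a minimal sketch in plain Java with hypothetical names (`splitRanges`, `qrPages`); it assumes that every page with a QR code starts a new output file. Unlike the loop above, which only saves a file when a later QR code is found, it also emits the final range from the last QR code to the end of the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QrSplitRanges {

    /**
     * Returns half-open page ranges {start, end} for the output files.
     * qrPages holds the ascending 0-based indices of pages with a QR code
     * (a hypothetical input, e.g. collected in a first pass over the document).
     */
    static List<int[]> splitRanges(List<Integer> qrPages, int numPages) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int qr : qrPages) {
            if (qr > start) {   // a QR code on page 0 does not close a range (the i != 0 check)
                ranges.add(new int[] {start, qr});
                start = qr;
            }
        }
        if (start < numPages) { // trailing pages after the last QR code
            ranges.add(new int[] {start, numPages});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // QR codes on pages 0, 3 and 7 of a 10-page document.
        for (int[] r : splitRanges(Arrays.asList(0, 3, 7), 10)) {
            System.out.println(r[0] + ".pdf: pages " + r[0] + " to " + (r[1] - 1));
        }
        // prints:
        // 0.pdf: pages 0 to 2
        // 3.pdf: pages 3 to 6
        // 7.pdf: pages 7 to 9
    }
}
```

Each range could then be fed to the inner `for (int j = ...)` loop, with the range start used as the file name just as `previusQR` is used above.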
    

    Thank you very much!!
