Split a huge 40,000-page PDF into single pages with iTextSharp: OutOfMemoryException

清酒与你 2021-01-31 21:40

I am getting huge PDF files with lots of data. The current PDF is 350 MB and has about 40000 pages. It would of course have been nice to get smaller PDFs, but this is what I hav

5 answers
  • 2021-01-31 21:56

    This is a shot in the dark, and I haven't tested this code. It is an extract from the book 'iText in Action', where it is given as an example of how to deal with large PDF files. The code is in Java but should be fairly easy to convert to C#.

    This is the approach that loads everything into memory:

    PdfReader reader;
    long before;
    // Full read: the whole file is parsed into memory up front.
    before = getMemoryUse();
    reader = new PdfReader("HelloWorldToRead.pdf", null);
    System.out.println("Memory used by the full read: "
        + (getMemoryUse() - before));
    

    This is the memory-saving way, where the document should be loaded bit by bit as required:

    // Partial read: objects are fetched from the file only when needed.
    before = getMemoryUse();
    reader = new PdfReader(
        new RandomAccessFileOrArray("HelloWorldToRead.pdf"), null);
    System.out.println("Memory used by the partial read: "
        + (getMemoryUse() - before));
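
    The snippets above call getMemoryUse() without showing it. A minimal sketch of such a helper, assuming it simply reports the heap currently in use, might look like the following. The class name MemoryUse and the 16 MB test allocation are illustrative assumptions, not taken from the book, and the figure it returns is only approximate because the JVM is free to ignore the garbage collection request.

```java
// Hypothetical helper: the extract above uses getMemoryUse() but does not define it.
final class MemoryUse {
    private MemoryUse() {}

    /** Returns the approximate number of heap bytes currently in use. */
    static long getMemoryUse() {
        Runtime runtime = Runtime.getRuntime();
        // System.gc() is only a request, so the result is an estimate, not an exact count.
        System.gc();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        long before = getMemoryUse();
        // Stand-in for loading a document: allocate 16 MB on the heap.
        byte[] block = new byte[16 * 1024 * 1024];
        long after = getMemoryUse();
        System.out.println("Approximate bytes used by the allocation: " + (after - before));
        // Reading block here keeps it reachable until after the second measurement.
        System.out.println("Block length: " + block.length);
    }
}
```

    With a helper like this, comparing the two PdfReader constructors is a matter of taking one measurement before and one after each call, as the extract does.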
    
  • 2021-01-31 21:59

    PDF Toolkit (pdftk) is quite useful for these types of tasks. I haven't tried it with such a huge file yet, though.

  • 2021-01-31 22:05

    From what I have read, when instantiating the PdfReader you should use the constructor that takes a RandomAccessFileOrArray object. Disclaimer: I have not tried this out myself.

    iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(new iTextSharp.text.pdf.RandomAccessFileOrArray(@"C:\PDFFile.pdf"), null);
    
  • 2021-01-31 22:05

    You might be able to use Ghostscript directly. http://svn.ghostscript.com/ghostscript/tags/ghostscript-9.02/doc/Use.htm#One_page_per_file

    For reading the recipient data, PDFTextStream might be a good choice.

  • 2021-01-31 22:08

    "Could it work better using some other library than iTextSharp?"

    Please try Aspose.Pdf for .NET, which lets you split a PDF into single pages or into different sets of pages in various ways, using either files or memory streams. The API is simple to learn and use, and it works with large PDF files that have a large number of pages.

    Disclosure: I work as developer evangelist at Aspose.
