I am getting huge PDF files with lots of data. The current PDF is 350 MB and has about 40000 pages. It would of course have been nice to get smaller PDFs, but this is what I hav
This is a shot in the dark, and I haven't tested this code. It's an extract from the book 'iText in Action', where it is given as an example of how to deal with large PDF files. The code is in Java but should be fairly easy to convert to C#.
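This is the approach that loads everything into memory:
// getMemoryUse() is a helper from the book's example; it is not shown in this extract.
PdfReader reader;
long before;
before = getMemoryUse();
// Full read: the whole document is parsed into memory up front.
reader = new PdfReader(
    "HelloWorldToRead.pdf", null);
System.out.println("Memory used by the full read: "
    + (getMemoryUse() - before));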
This is the memory-saving way, where the document should be loaded bit by bit as required:
before = getMemoryUse();
// Partial read: PdfReader is handed a RandomAccessFileOrArray instead of a file name.
reader = new PdfReader(
    new RandomAccessFileOrArray("HelloWorldToRead.pdf"), null);
System.out.println("Memory used by the partial read: "
    + (getMemoryUse() - before));
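The extract calls getMemoryUse() without showing it. As a minimal sketch, assuming the helper simply reports the heap currently in use, it could look like the following. The class name MemoryProbe is hypothetical, and the book's actual implementation may differ.

```java
// Hypothetical stand-in for the getMemoryUse() helper used in the extract above;
// the book's own version may be implemented differently.
class MemoryProbe {
    /** Returns an approximation of the heap currently in use, in bytes. */
    static long getMemoryUse() {
        Runtime rt = Runtime.getRuntime();
        // Ask for a collection first so leftover garbage is less likely to be
        // counted. System.gc() is only a hint; the JVM may ignore it.
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

The figures this produces are rough, so treat the difference between the two reads as an indication rather than an exact measurement.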
PDF Toolkit (pdftk) is quite useful for this type of task. I haven't tried it with such a huge file yet, though.
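As an untested sketch of how that could be driven from code, the helper below builds one `pdftk <input> cat <range> output <file>` command per block of pages. The class name, the chunk size, and the output file names are all assumptions made for illustration, and whether pdftk copes with a 350 MB input is exactly the part I have not verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PdftkChunks {
    /**
     * Builds one pdftk "cat" command per block of pagesPerChunk pages.
     * Output names (chunk-FIRST-LAST.pdf) are a hypothetical naming scheme.
     */
    static List<List<String>> chunkCommands(String inputPdf, int totalPages, int pagesPerChunk) {
        if (totalPages < 1 || pagesPerChunk < 1) {
            throw new IllegalArgumentException("page counts must be positive");
        }
        List<List<String>> commands = new ArrayList<>();
        for (int first = 1; first <= totalPages; first += pagesPerChunk) {
            int last = Math.min(first + pagesPerChunk - 1, totalPages);
            // A single page is written as "5", a block as "1-1000".
            String range = (first == last) ? String.valueOf(first) : first + "-" + last;
            String output = String.format("chunk-%05d-%05d.pdf", first, last);
            commands.add(Arrays.asList("pdftk", inputPdf, "cat", range, "output", output));
        }
        return commands;
    }
}
```

Each command can be passed to `new ProcessBuilder(command)` to run it. Note that every invocation opens the full input file again, so this trades run time for smaller outputs.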
From what I have read, when instantiating the PdfReader you should use the constructor that takes a RandomAccessFileOrArray object. Disclaimer: I have not tried this out myself.
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(new iTextSharp.text.pdf.RandomAccessFileOrArray(@"C:\PDFFile.pdf"), null);
You might be able to use Ghostscript directly. http://svn.ghostscript.com/ghostscript/tags/ghostscript-9.02/doc/Use.htm#One_page_per_file
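The linked section describes putting a `%d` template in the output file name so that each page goes to its own file. A minimal sketch of assembling such a command is below. The class name, the executable name you pass in (for example `gs`, or `gswin64c` on Windows), and the file names are assumptions, and I have not run this against a 40000-page file.

```java
import java.util.ArrayList;
import java.util.List;

class GhostscriptSplit {
    /**
     * Builds a Ghostscript command that writes each page of inputPdf to its
     * own PDF. outputPattern must contain a %d-style template, for example
     * "page-%05d.pdf" (a hypothetical name).
     */
    static List<String> onePagePerFile(String gsExecutable, String inputPdf, String outputPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dBATCH");            // exit when the input has been processed
        cmd.add("-dNOPAUSE");          // do not prompt between pages
        cmd.add("-sDEVICE=pdfwrite");  // produce PDF output
        cmd.add("-sOutputFile=" + outputPattern);
        cmd.add(inputPdf);
        return cmd;
    }
}
```

To run it, pass the list to `new ProcessBuilder(...)`, for example `new ProcessBuilder(GhostscriptSplit.onePagePerFile("gs", "big.pdf", "page-%05d.pdf")).inheritIO().start().waitFor()`. Keep in mind that pdfwrite regenerates each page rather than copying it byte for byte.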
For reading the recipient data, PDFTextStream might be a good choice.
Would a library other than iTextSharp work better?
Please try Aspose.Pdf for .NET, which can split a PDF into single pages or into different sets of pages in various ways, using either files or memory streams. The API is simple to learn and use, and it works with large PDF files that have a large number of pages.
Disclosure: I work as a developer evangelist at Aspose.