Split a huge 40000-page PDF into single pages with iTextSharp: OutOfMemoryException

I am getting huge PDF files with lots of data. The current PDF is 350 MB and has about 40000 pages. It would of course have been nice to get smaller PDFs, but this is what I hav

5 Answers

野趣味

2021-01-31 21:56
This is a total shot in the dark, and I haven't tested this code. It's an extract from the book 'iText in Action', where it is given as an example of how to deal with large PDF files. The code is in Java but should be fairly easy to convert to C#.

This is the approach that loads everything into memory:
```
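// getMemoryUse() is a helper from the book's example; it is not shown in this extract.
long before = getMemoryUse();
// Passing a file name makes PdfReader read the whole document into memory.
PdfReader reader = new PdfReader("HelloWorldToRead.pdf", null);
System.out.println("Memory used by the full read: "
        + (getMemoryUse() - before));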
```
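The snippet above calls `getMemoryUse()`, which the extract does not define. A minimal sketch of what such a helper might look like, assuming it simply reports the heap currently in use; the class name `MemoryProbe` is hypothetical and this is not the book's implementation:

```java
// Hypothetical stand-in for the getMemoryUse() helper; the original is not shown in the extract.
final class MemoryProbe {
    private MemoryProbe() {}

    /** Returns the heap currently in use, in bytes, after suggesting a garbage collection. */
    static long getMemoryUse() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint; the JVM is free to ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = getMemoryUse();
        byte[] block = new byte[16 * 1024 * 1024]; // allocate 16 MiB so there is something to measure
        long after = getMemoryUse();
        System.out.println("Approximate bytes in use by the allocation: " + (after - before));
        System.out.println("Block length: " + block.length); // keeps 'block' reachable until here
    }
}
```

Numbers obtained this way are rough, because garbage collection timing is not under the program's control, but they are usually good enough to compare a full read with a partial one.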
This is the memory-saving way, where the document should be loaded bit by bit as required:
```
before = getMemoryUse();
// Wrapping the file in a RandomAccessFileOrArray lets PdfReader load objects on demand.
reader = new PdfReader(
        new RandomAccessFileOrArray("HelloWorldToRead.pdf"), null);
System.out.println("Memory used by the partial read: "
        + (getMemoryUse() - before));
```
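To illustrate what "loaded bit by bit" means in general, here is a small sketch that uses only the JDK's `java.io.RandomAccessFile` to read a slice of a file without touching the rest. It is not how iText implements partial reading; the class `PartialReadDemo` and the method `readSlice` are hypothetical names made up for this example:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class PartialReadDemo {
    private PartialReadDemo() {}

    /** Reads exactly 'length' bytes starting at 'offset' without loading the rest of the file. */
    static byte[] readSlice(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] buffer = new byte[length];
            raf.readFully(buffer); // throws EOFException if the file is too short
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file so the example runs anywhere.
        File tmp = File.createTempFile("partial-read", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));

        byte[] slice = readSlice(tmp, 10, 4);
        System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints "ABCD"
    }
}
```

Memory use here depends on the size of the slice rather than the size of the file, which is the property the partial-read constructor is meant to provide for a 350 MB PDF.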
野性不改

2021-01-31 21:59

PDF Toolkit (pdftk) is quite useful for this type of task. I haven't tried it with such a huge file yet, though.

星月不相逢

2021-01-31 22:05
From what I have read, when instantiating the PdfReader you should use the constructor that takes a RandomAccessFileOrArray object. Disclaimer: I have not tried this out myself.
```
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(
    new iTextSharp.text.pdf.RandomAccessFileOrArray(@"C:\PDFFile.pdf"), null);
```
再見小時候

2021-01-31 22:05

You might be able to use Ghostscript directly. http://svn.ghostscript.com/ghostscript/tags/ghostscript-9.02/doc/Use.htm#One_page_per_file
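As a rough sketch of the Ghostscript route, the Java snippet below only assembles a command line that asks the `pdfwrite` device to write one output file per page through a `%d` template in `-sOutputFile`. The executable name `gs`, the file names, and the class `GhostscriptSplit` are assumptions for illustration, and the approach is untested on a 40000-page file:

```java
import java.util.ArrayList;
import java.util.List;

final class GhostscriptSplit {
    private GhostscriptSplit() {}

    /** Builds a Ghostscript command that writes each page of inputPdf to its own PDF file. */
    static List<String> buildCommand(String gsExecutable, String inputPdf, String outputPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dBATCH");            // exit after processing instead of waiting for input
        cmd.add("-dNOPAUSE");          // do not pause between pages
        cmd.add("-sDEVICE=pdfwrite");  // produce PDF output
        cmd.add("-sOutputFile=" + outputPattern); // %d in the pattern is replaced by the page number
        cmd.add(inputPdf);
        return cmd;
    }

    public static void main(String[] args) {
        // Assumed names: "gs" on the PATH, input.pdf in the working directory.
        List<String> cmd = buildCommand("gs", "input.pdf", "page-%05d.pdf");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires Ghostscript to be installed):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Because Ghostscript runs as a separate process, its memory use does not count against the heap of the calling application.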

For reading the recipient data, pdftextstream might be a good choice.

天涯浪人

2021-01-31 22:08

Could it work better using some other library than itextsharp?

Please try Aspose.Pdf for .NET, which lets you split a PDF into single pages or into different sets of pages in various ways, using either files or memory streams. The API is simple to learn and use, and it works with large PDF files that have a large number of pages.

Disclosure: I work as a developer evangelist at Aspose.
