Question
I am trying to stream a large amount of formatted data (say 200 MB) to PDF with a minimal memory footprint (say 20 MB per client/thread). PDF's page description syntax is derived from Adobe PostScript, and writing it directly is complex. I have been using the following APIs to stream content to PDF:
- Jasper Reports
- iText
The problem I am facing with Jasper Reports is that it needs all the input data to be loaded into memory, and only the output side is streamed (via OutputStream). Jasper Reports does have a function that accepts an InputStream of data, but behind the scenes it loads the whole InputStream into memory, effectively exhausting the memory.
The problem with iText is that it is commercial. I am now looking to write my own Java API to stream formatted data, including tables and images, directly to PDF. I have consulted the following books to understand the PDF structure:
- PDF Structure by Adobe
- PDF Explained (O'Reilly)
The above books cover only basic PDF formatting, such as text and 2D graphics. How do I draw tables, icons, and all the other formatting that I can generate with HTML/CSS into the PDF?
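For context on the table part of the question: PDF has no table primitive, so a table ends up being drawn as stroked rectangles plus positioned text in a page's content stream. The following is a minimal sketch of that idea, not a definitive implementation. The class name `TableRowSketch`, the `row` helper, the 10 pt font size, and the cell padding are all hypothetical, and the sketch assumes a font has been registered under the resource name `/F1` in the page's resources. It does no text measuring or wrapping.

```java
import java.util.Locale;

final class TableRowSketch {

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /**
     * Builds content-stream operators for one table row.
     * (x, y) is the lower-left corner of the row in PDF user space.
     * Assumption: the page resources map /F1 to a font.
     */
    static String row(double x, double y, double[] colWidths, double rowHeight, String[] cells) {
        StringBuilder sb = new StringBuilder();
        double cx = x;
        for (int i = 0; i < cells.length; i++) {
            // "x y w h re" adds a rectangle to the path, "S" strokes it (the cell border).
            sb.append(String.format(Locale.ROOT, "%.2f %.2f %.2f %.2f re S\n",
                    cx, y, colWidths[i], rowHeight));
            // BT/ET delimit a text object; Tf selects the font, Td positions, Tj shows text.
            sb.append(String.format(Locale.ROOT, "BT /F1 10 Tf %.2f %.2f Td (%s) Tj ET\n",
                    cx + 4, y + 6, escape(cells[i])));
            cx += colWidths[i];
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the operators for a two-column row; these would go into a page content stream.
        System.out.print(row(50, 700, new double[] {100, 200}, 20,
                new String[] {"Id", "Name (full)"}));
    }
}
```

Because each row is emitted as a short, independent run of operators, rows can be generated one at a time from the input and never need to be held together in memory.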
I need some pointers on understanding the PDF structure in depth. Or, is there already a Java API which supports direct streaming of input content to PDF without holding the entire data in memory?
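To make "direct streaming" concrete, here is a rough sketch of how a PDF file can be written front to back while keeping only small bookkeeping in memory. Each object is written to the OutputStream as soon as it is produced, only its byte offset is remembered, and the cross-reference table and trailer are written at the end. The class `StreamingPdfSketch` and its methods are hypothetical names for illustration. The sketch assumes uncompressed content streams, the standard Helvetica font, a fixed A4-sized MediaBox, and a flat page tree. It is not a complete PDF writer.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class StreamingPdfSketch {
    private final OutputStream out;
    private long position = 0;                                  // bytes written so far
    private final List<Long> offsets = new ArrayList<>();       // offsets.get(i) = offset of object i + 1
    private final List<Integer> pageIds = new ArrayList<>();    // object numbers of the pages
    private final int pagesId;
    private final int fontId;

    StreamingPdfSketch(OutputStream out) {
        this.out = out;
        write("%PDF-1.4\n");
        pagesId = reserve();          // the page tree root is written last, once all pages are known
        fontId = reserve();
        writeObject(fontId, "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>");
    }

    private void write(String s) {
        try {
            byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
            out.write(bytes);
            position += bytes.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Allocates an object number without writing the object yet. */
    private int reserve() {
        offsets.add(-1L);
        return offsets.size();
    }

    /** Writes an indirect object immediately and records only its byte offset. */
    private void writeObject(int id, String body) {
        offsets.set(id - 1, position);
        write(id + " 0 obj\n" + body + "\nendobj\n");
    }

    /** Writes one page; the content string is not retained after this call returns. */
    void addPage(String content) {
        int contentId = reserve();
        int length = content.getBytes(StandardCharsets.ISO_8859_1).length;
        writeObject(contentId, "<< /Length " + length + " >>\nstream\n" + content + "\nendstream");
        int pageId = reserve();
        writeObject(pageId, "<< /Type /Page /Parent " + pagesId + " 0 R"
                + " /MediaBox [0 0 595 842] /Contents " + contentId + " 0 R >>");
        pageIds.add(pageId);
    }

    /** Writes the page tree, catalog, cross-reference table, and trailer. */
    void finish() {
        StringBuilder kids = new StringBuilder();
        for (int id : pageIds) kids.append(id).append(" 0 R ");
        writeObject(pagesId, "<< /Type /Pages /Count " + pageIds.size() + " /Kids [ " + kids + "]"
                + " /Resources << /Font << /F1 " + fontId + " 0 R >> >> >>");
        int catalogId = reserve();
        writeObject(catalogId, "<< /Type /Catalog /Pages " + pagesId + " 0 R >>");
        long xrefPosition = position;
        StringBuilder xref = new StringBuilder("xref\n0 " + (offsets.size() + 1) + "\n");
        xref.append("0000000000 65535 f \n");                   // each entry is exactly 20 bytes
        for (long offset : offsets) {
            xref.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        write(xref.toString());
        write("trailer\n<< /Size " + (offsets.size() + 1) + " /Root " + catalogId + " 0 R >>\n"
                + "startxref\n" + xrefPosition + "\n%%EOF\n");
        try {
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pages are produced one at a time and written immediately.
        try (OutputStream file = new BufferedOutputStream(new FileOutputStream("out.pdf"))) {
            StreamingPdfSketch pdf = new StreamingPdfSketch(file);
            for (int i = 1; i <= 3; i++) {
                pdf.addPage("BT /F1 12 Tf 50 800 Td (Page " + i + ") Tj ET");
            }
            pdf.finish();
        }
    }
}
```

The memory that grows with document size here is the offset list and the page id list, a few bytes per object, so the footprint is driven by the largest single page rather than by the whole document. A very long flat `/Kids` array is a simplification; a real writer would likely use a balanced page tree.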
Note: Headless browsers (PhantomJS, wkhtmltopdf), Apache FOP, and Apache PDFBox all render PDFs by loading the entire data into memory.
Source: https://stackoverflow.com/questions/51491522/how-to-directly-stream-large-content-to-pdf-with-minimal-memory-footprint