How to directly stream large content to PDF with minimal memory footprint?

我的未来我决定 submitted on 2019-12-22 09:30:52

Question


I am trying to stream a large amount of formatted data (say 200 MB) to PDF with a minimal memory footprint (say 20 MB per client/thread). PDF's page description syntax is derived from Adobe PostScript, and writing it directly is complex. So far I have used the following APIs to generate PDFs:

  • Jasper Reports
  • iText

The problem I am facing with Jasper Reports is that it needs all the input data to be held in memory; only the output side is streamed (OutputStream). Jasper Reports does have a function that accepts an InputStream of data, but behind the scenes it loads the whole InputStream into memory, which effectively exhausts the available memory.

The problem with iText is that it is commercial. I am now looking to write my own Java API to stream formatted data, including tables and images, directly to PDF. I have referred to the following books to understand the PDF structure:

  • PDF Structure by Adobe
  • PDF Explained (O'Reilly)

These books cover only basic PDF formatting, such as text and 2D graphics. How do I draw tables, icons, and the other formatting that I can produce with HTML/CSS in a PDF?
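For context on the table part of this question: PDF has no table primitive, so a table is drawn as stroked rectangles or lines plus text placed at computed coordinates. The following is only a rough sketch of that idea, not a layout engine. The class name `TableContentSketch`, the fixed cell sizes, and the font resource name `/F1` are assumptions made for illustration; the operators `re`, `S`, `BT`, `Tf`, `Td`, `Tj`, and `ET` are standard PDF content-stream operators.

```java
import java.util.Locale;

/** Hypothetical helper: turns a grid of strings into PDF content-stream operators. */
final class TableContentSketch {

    /**
     * Builds operators for a grid of equal-sized cells.
     * x/yTop are the top-left corner of the table in PDF user space
     * (origin at the bottom-left of the page, units of 1/72 inch).
     * Assumes the page resources map /F1 to a font.
     */
    static String table(String[][] cells, double x, double yTop,
                        double colWidth, double rowHeight) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < cells.length; r++) {
            for (int c = 0; c < cells[r].length; c++) {
                double cellX = x + c * colWidth;
                double cellY = yTop - (r + 1) * rowHeight; // lower edge of the cell
                // Cell border: append a rectangle path (re), then stroke it (S).
                sb.append(String.format(Locale.ROOT, "%.2f %.2f %.2f %.2f re S\n",
                        cellX, cellY, colWidth, rowHeight));
                // Cell text: a small text object with 4 units of padding.
                sb.append(String.format(Locale.ROOT, "BT /F1 10 Tf %.2f %.2f Td (%s) Tj ET\n",
                        cellX + 4, cellY + 4, escape(cells[r][c])));
            }
        }
        return sb.toString();
    }

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }
}
```

The returned string would go inside a page's content stream. Column widths that fit the text, wrapping, and page breaks are exactly the layout work that HTML/CSS engines do and that a hand-written generator would have to implement itself.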

I need some pointers on understanding the PDF structure in depth. Or, is there already a Java API which supports direct streaming of input content to PDF without holding the entire data in memory?
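One relevant property of the file format: a classic PDF keeps its cross-reference table at the end of the file, so objects can be written to an OutputStream as they are produced, and only their byte offsets need to stay in memory. The sketch below illustrates that under simplifying assumptions: text-only pages, the standard Helvetica font, a flat page tree, and A4 page size. `StreamingPdfWriter` and its methods are hypothetical names, not an existing library API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal writer: emits one page at a time, remembers only offsets. */
final class StreamingPdfWriter implements AutoCloseable {
    private static final int CATALOG = 1, PAGES = 2, FONT = 3; // reserved object numbers
    private final OutputStream out;
    private final List<Long> offsets = new ArrayList<>();   // offsets.get(i) belongs to object i + 1
    private final List<Integer> pageIds = new ArrayList<>();
    private long position = 0;                               // bytes written so far

    StreamingPdfWriter(OutputStream out) throws IOException {
        this.out = out;
        write("%PDF-1.4\n");
        for (int i = 0; i < 3; i++) offsets.add(0L);         // filled in by close()
    }

    private void write(String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b);
        position += b.length;
    }

    private int beginObject() throws IOException {
        offsets.add(position);
        int id = offsets.size();
        write(id + " 0 obj\n");
        return id;
    }

    private void beginReserved(int id) throws IOException {
        offsets.set(id - 1, position);
        write(id + " 0 obj\n");
    }

    /** Writes one page of text lines; nothing about the page is kept except its object number. */
    void addPage(List<String> lines) throws IOException {
        StringBuilder content = new StringBuilder("BT /F1 12 Tf 14 TL 50 780 Td\n");
        for (String line : lines) {
            String escaped = line.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
            content.append('(').append(escaped).append(") Tj T*\n");
        }
        content.append("ET");
        String body = content.toString();
        int contentId = beginObject();
        write("<< /Length " + body.getBytes(StandardCharsets.ISO_8859_1).length + " >>\nstream\n");
        write(body);
        write("\nendstream\nendobj\n");
        int pageId = beginObject();
        write("<< /Type /Page /Parent " + PAGES + " 0 R /MediaBox [0 0 595 842] "
                + "/Resources << /Font << /F1 " + FONT + " 0 R >> >> "
                + "/Contents " + contentId + " 0 R >>\nendobj\n");
        pageIds.add(pageId);
        out.flush();
    }

    /** Writes the font, page tree, catalog, cross-reference table, and trailer. */
    @Override
    public void close() throws IOException {
        beginReserved(FONT);
        write("<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n");
        StringBuilder kids = new StringBuilder();
        for (int id : pageIds) kids.append(id).append(" 0 R ");
        beginReserved(PAGES);
        write("<< /Type /Pages /Count " + pageIds.size() + " /Kids [ " + kids + "] >>\nendobj\n");
        beginReserved(CATALOG);
        write("<< /Type /Catalog /Pages " + PAGES + " 0 R >>\nendobj\n");
        long xrefPosition = position;
        int size = offsets.size() + 1;                       // +1 for the free object 0
        write("xref\n0 " + size + "\n0000000000 65535 f \n");
        for (long offset : offsets) write(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        write("trailer\n<< /Size " + size + " /Root " + CATALOG + " 0 R >>\nstartxref\n"
                + xrefPosition + "\n%%EOF\n");
        out.flush();
    }
}
```

A caller would wrap the target stream (a file or an HTTP response, for example) in a try-with-resources block and call `addPage` for each chunk of input as it is read. Memory use then grows with the number of objects (one offset each) rather than with the size of the content. Images, compression, and embedded fonts are left out of this sketch.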

Note: Headless browsers (PhantomJS, wkhtmltopdf), Apache FOP, and Apache PDFBox all render the PDF by loading the entire data into memory.

Source: https://stackoverflow.com/questions/51491522/how-to-directly-stream-large-content-to-pdf-with-minimal-memory-footprint
