I've been using htmldoc for a while, but I've run into some fairly serious limitations. I need the end solution to work on a Linux box. I'll be calling this library/utili
You might want to check out 'Document Conversion Service' by Peernet (at http://www.peernet.com/conversion-software/batch-document-converter/). This runs as a service on a Windows Desktop or Windows Server machine. It opens HTML documents in a web browser, then prints them through a print driver to create PDF documents, so that the PDF document produced looks exactly as if you had printed the HTML document from the browser.
Update 2019-05
The whole process has thankfully been packed into a docker image by TheCodingMachine: https://github.com/thecodingmachine/gotenberg
This makes maintaining and using Chrome-based PDF generation in production environments really smooth and hassle-free.
There is a new headless mode since Chrome 59. As all the other solutions really struggle with newer (or not so new anymore) CSS features like flexbox, this was in my case the only solution to produce a proper PDF output.
To create a pdf from a local html file just use the following command:
chrome --headless --disable-gpu --print-to-pdf file:///path/to/myfile.html
On macOS, substitute chrome with /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
The only downside I've noticed so far is that (currently) you cannot pass the HTML via stdin, but creating a temporary file is not much of an issue.
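The temporary-file workaround can be sketched roughly as follows in Python. This is only an illustration, not a tested recipe: it assumes the Chrome binary is reachable as `chrome` on your PATH, and it uses the `--print-to-pdf=<path>` form of the flag to choose the output file. The helper names `build_chrome_command` and `html_to_pdf` are made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List

# Assumption: Chrome is on PATH as "chrome"; adjust for your platform.
CHROME_BINARY = "chrome"


def build_chrome_command(html_path: Path, pdf_path: Path,
                         chrome: str = CHROME_BINARY) -> List[str]:
    """Return the argument list for a headless Chrome print-to-PDF run."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        # Assumption: the flag accepts an explicit output path in this form.
        f"--print-to-pdf={pdf_path}",
        # Chrome needs a file:// URL, so make the path absolute first.
        html_path.resolve().as_uri(),
    ]


def html_to_pdf(html: str, pdf_path: Path, chrome: str = CHROME_BINARY) -> None:
    """Write the HTML string to a temporary file and let Chrome print it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        html_path = Path(tmp_dir) / "input.html"
        html_path.write_text(html, encoding="utf-8")
        command = build_chrome_command(html_path, pdf_path.resolve(), chrome)
        # check=True raises CalledProcessError if Chrome exits with an error.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    html_to_pdf("<h1>Hello</h1>", Path("hello.pdf"))
```

Keeping the command construction in its own function makes it easy to swap in the macOS binary path or extra flags without touching the temporary-file handling.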
For more information see https://developers.google.com/web/updates/2017/04/headless-chrome#create_a_pdf_dom
Update: As it turns out, the chrome guys will most likely provide some kind of node module for this task, which would eventually deprecate the headless mode (https://bugs.chromium.org/p/chromium/issues/detail?id=719921).
The best bet would be to use the node based approach using the puppeteer module as documented under https://developers.google.com/web/updates/2017/04/headless-chrome#node and print the page via the Page.printToPDF command, which enables some additional configuration, too.
Of course, you can also connect to the debug console websocket from environments other than Node (e.g. a PHP script).
An alternative solution that hasn't been answered here is to use an API.
The advantage is that you externalize the resources needed for the job and rely on an up-to-date service that implements recent features (no need to update your code or install bug fixes).
For instance, with PDFShift, you can do that with a single POST request at:
POST https://api.pdfshift.io/v2/convert/
Pass the "source" parameter (either a URL or raw HTML), and you'll get back the PDF as binary data. (Disclaimer: I work at PDFShift).
Here's a code sample in Python:
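import requests

# Send the conversion request; the API key is passed as the basic-auth username.
response = requests.post(
    'https://api.pdfshift.io/v2/convert/',
    auth=('user_api_key', ''),
    json={"source": "https://en.wikipedia.org/wiki/PDF", "landscape": False, "use_print": False}
)
# Raise an exception on any HTTP error status.
response.raise_for_status()

# The response body is the PDF itself, so write it out in binary mode.
with open('wikipedia.pdf', 'wb') as f:
    f.write(response.content)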
And your PDF will be located at ./wikipedia.pdf
Here is a nice easy-to-install version of headless Chrome:
https://www.npmjs.com/package/chrome-headless-render-pdf
Unlike "standard" headless chrome, this does not show the annoying auto-generated headers and footers!
Alternatively, unoconv (which uses LibreOffice behind the scenes) can make PDFs from HTML:
unoconv -f pdf mypage.html
You can install it on most Linux flavours via the package manager, e.g. apt-get install unoconv
That's nice and easy for simple files. If you need JavaScript or CSS support, then use headless Chrome.
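If you have a whole directory of simple HTML files, the unoconv call above can be wrapped in a few lines of Python. This is a rough sketch: the function names are invented for the example, and it assumes unoconv writes each PDF next to its input file with a .pdf extension, which you should verify on your own setup.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_unoconv_command(html_path: Path) -> List[str]:
    """Return the same command as above: unoconv -f pdf <file>."""
    return ["unoconv", "-f", "pdf", str(html_path)]


def convert_directory(directory: Path) -> List[Path]:
    """Convert every *.html file in a directory and return the expected PDF paths."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv was not found on PATH")
    expected_pdfs = []
    for html_path in sorted(directory.glob("*.html")):
        # check=True raises CalledProcessError if the conversion fails.
        subprocess.run(build_unoconv_command(html_path), check=True)
        # Assumption: the output lands beside the input as <name>.pdf.
        expected_pdfs.append(html_path.with_suffix(".pdf"))
    return expected_pdfs


if __name__ == "__main__":
    for pdf in convert_directory(Path(".")):
        print(pdf)
```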
I did a bit of googling for you and came up with two options. There may be more; my search strategy was to try "webkit command-line pdf" and "gecko command-line pdf", basically looking for command-line programs that embed one of the two popular open-source rendering engines. Here's what I found:
Firefox command-line printer - outputs to pdf and png
wkpdf - while this is for mac, it's probably pretty portable.
You should have a look at http://phantomjs.org/
Conversion can be done by a small script rasterize.js and then issuing
phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf