How can I automate HTML-to-PDF conversions?

攒了一身酷 2020-12-23 19:29

I've been using htmldoc for a while, but I've run into some fairly serious limitations. I need the end solution to work on a Linux box. I'll be calling this library/utili

14 answers
  • 2020-12-23 20:05

    You might want to check out 'Document Conversion Service' by Peernet (at http://www.peernet.com/conversion-software/batch-document-converter/). This runs as a service on a Windows Desktop or Windows Server machine. It opens HTML documents in a web browser, then prints them through a print driver to create PDF documents, so that the PDF document produced looks exactly as if you had printed the HTML document from the browser.

  • 2020-12-23 20:07

    Update 2019-05

    The whole process has thankfully been packaged into a Docker image by TheCodingMachine: https://github.com/thecodingmachine/gotenberg

    This makes maintaining and using Chrome-based PDF generation in production environments smooth and hassle-free.


    Chrome has had a headless mode since version 59. Since all the other solutions struggle with newer (or not so new anymore) CSS features like flexbox, this was in my case the only way to produce proper PDF output.

    To create a PDF from a local HTML file, use the following command:

    chrome --headless --disable-gpu --print-to-pdf file:///path/to/myfile.html

    On macOS, substitute chrome with /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome.

    The only downside I have noticed so far is that (currently) you cannot pass the HTML via stdin, but creating a temporary file is not much of an issue.
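
    The temporary-file workaround can be scripted. Below is a minimal Python sketch, not a definitive implementation: the function names are hypothetical, the binary name chrome follows the command above, and the --print-to-pdf=<path> form (used to choose the output file) is an assumption that goes beyond the command quoted in this answer.

```python
import subprocess
import tempfile
from pathlib import Path


def build_chrome_pdf_command(html_path, pdf_path, chrome_binary="chrome"):
    """Return the argv list for a headless Chrome print-to-PDF run.

    Assumption: Chrome accepts an output path as --print-to-pdf=<path>.
    """
    # Chrome expects a file:// URI for local files, so make the path absolute.
    html_uri = Path(html_path).resolve().as_uri()
    return [
        chrome_binary,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path}",
        html_uri,
    ]


def html_string_to_pdf(html, pdf_path, chrome_binary="chrome"):
    """Write the HTML to a temporary file, then let Chrome print it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        html_file = Path(tmp_dir) / "input.html"
        html_file.write_text(html, encoding="utf-8")
        command = build_chrome_pdf_command(html_file, pdf_path, chrome_binary)
        # check=True raises CalledProcessError if Chrome exits with an error.
        subprocess.run(command, check=True)
```

    Calling html_string_to_pdf("<h1>Hello</h1>", "hello.pdf") would then stand in for piping the HTML through stdin, provided a Chrome binary with that name is on the PATH.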

    For more information see https://developers.google.com/web/updates/2017/04/headless-chrome#create_a_pdf_dom

    Update: As it turns out, the Chrome team will most likely provide some kind of Node module for this task, which would eventually deprecate the headless mode (https://bugs.chromium.org/p/chromium/issues/detail?id=719921).

    The best bet is the Node-based approach using the puppeteer module, as documented at https://developers.google.com/web/updates/2017/04/headless-chrome#node, printing the page via the Page.printToPDF command, which also allows some additional configuration.

    Of course, you can also connect to the debug console websocket from environments other than Node (e.g. a PHP script).
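
    To illustrate what talking to that websocket involves, here is a hedged Python sketch of the message handling only. It assumes the DevTools protocol's JSON framing, in which Page.printToPDF takes parameters such as landscape and printBackground and replies with a base64-encoded data field. The helper names are hypothetical, and opening the websocket connection itself is left out.

```python
import base64
import json


def build_print_to_pdf_request(message_id=1, landscape=False, print_background=True):
    """Serialize a Page.printToPDF command for the DevTools websocket."""
    return json.dumps({
        "id": message_id,
        "method": "Page.printToPDF",
        "params": {
            "landscape": landscape,
            "printBackground": print_background,
        },
    })


def extract_pdf_bytes(raw_response):
    """Decode the base64 PDF payload from a Page.printToPDF reply."""
    message = json.loads(raw_response)
    if "error" in message:
        # Protocol-level errors arrive as an "error" object instead of "result".
        raise RuntimeError(message["error"].get("message", "unknown DevTools error"))
    return base64.b64decode(message["result"]["data"])
```

    The string from build_print_to_pdf_request would be sent over the websocket, and the reply with the matching id passed to extract_pdf_bytes before writing the bytes to disk.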

  • 2020-12-23 20:09

    An alternative that hasn't been mentioned here is to use an API.

    The advantage is that you externalize the resources needed for the job and rely on an up-to-date service that implements recent features (no need to update your code or install bugfixes).

    For instance, with PDFShift, you can do that with a single POST request:

    POST https://api.pdfshift.io/v2/convert/

    Pass the "source" (either a URL or raw HTML), and you'll get back the PDF as binary data. (Disclaimer: I work at PDFShift.)

    Here's a code sample in Python:

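    import requests

    # Basic auth: the API key is the username, the password stays empty.
    response = requests.post(
        'https://api.pdfshift.io/v2/convert/',
        auth=('user_api_key', ''),
        json={"source": "https://en.wikipedia.org/wiki/PDF", "landscape": False, "use_print": False}
    )

    # Raise an exception on any 4xx/5xx response.
    response.raise_for_status()

    # The response body is the binary PDF.
    with open('wikipedia.pdf', 'wb') as f:
        f.write(response.content)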
    

    And your PDF will be located at ./wikipedia.pdf

  • 2020-12-23 20:13

    Here is a nice easy-to-install version of headless Chrome:

    https://www.npmjs.com/package/chrome-headless-render-pdf

    Unlike "standard" headless Chrome, this does not show the annoying auto-generated headers and footers!

    Alternatively, unoconv (which uses LibreOffice behind the scenes) can make PDFs from HTML:

    unoconv -f pdf mypage.html

    You can install it on most Linux flavours via the package manager, e.g. apt-get install unoconv

    That's nice and easy for simple files. If you need JavaScript or CSS support, use headless Chrome.
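
    For automating the unoconv route over many files, a small Python wrapper may be enough. This is only a sketch: the function names are hypothetical, and it assumes unoconv writes each PDF next to its input file with a .pdf extension.

```python
import subprocess
from pathlib import Path


def build_unoconv_command(html_path):
    """Return the argv list matching `unoconv -f pdf <file>`."""
    return ["unoconv", "-f", "pdf", str(html_path)]


def convert_directory(directory):
    """Convert every .html file in a directory and list the expected PDFs."""
    expected_pdfs = []
    for html_file in sorted(Path(directory).glob("*.html")):
        # check=True raises CalledProcessError if unoconv fails on a file.
        subprocess.run(build_unoconv_command(html_file), check=True)
        # Assumption: the output lands beside the input, named <stem>.pdf.
        expected_pdfs.append(html_file.with_suffix(".pdf"))
    return expected_pdfs
```

    Passing a list of arguments instead of a shell string avoids quoting problems with file names that contain spaces.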

  • 2020-12-23 20:15

    I did a bit of googling for you and came up with two options. There may be more; my strategy was to search for "webkit command-line pdf" and "gecko command-line pdf", looking for command-line programs that embed the two popular open-source rendering engines. Here's what I found:

    Firefox command-line printer: outputs to PDF and PNG

    wkpdf: this is for Mac, but it's probably pretty portable.

  • 2020-12-23 20:17

    You should have a look at http://phantomjs.org/

    Conversion can be done with the small rasterize.js script by running:

    phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf
    