Get HTML with current styles (maybe inlined) of a page that finished rendering and finished running scripts

半阙折子戏 2021-02-08 22:33

I need to get the HTML with current styles (maybe inlined) of a page that finished rendering and finished running scripts, using a server side application which will be given ju

1 answer
  •  一整个雨季
    2021-02-08 23:38

    PhantomJS is a headless (GUI-less) WebKit browser with a JavaScript API. It runs on all major platforms, as I requested in my question.

    It runs JavaScript scripts that control the GUI-less browser, and it has a powerful API and plenty of examples.

    In my spare time over the last 2-3 days I wrote the solution to my question, and it covers all requirements beautifully. I haven't found a webpage for which it wouldn't work.


    Usage, command line:

    phantomjs save_as_html.js http://stackoverflow.com/q/12215844/584490 saved.html
    


    JavaScript is allowed to run for n seconds after everything else has loaded, so it should work even for web pages generated entirely by JavaScript.
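The wait-then-capture idea can be sketched roughly as follows. This is not the answer's actual save_as_html.js. It is a minimal illustration that only reads `page.content` after a fixed delay and does not inline any resources. The helper name `captureAfterDelay`, the injectable `schedule` parameter, and the 5000 ms delay are assumptions made for this sketch. `page.open`, `page.content`, `fs.write`, `system.args` and `phantom.exit` are PhantomJS APIs.

```javascript
// Hypothetical helper: open `url`, wait `delayMs` after the load event,
// then hand the current DOM serialization to `onDone(err, html)`.
// `page` is any object with PhantomJS's page.open(url, cb) and page.content.
// `schedule` defaults to setTimeout; it is injectable only to ease testing.
function captureAfterDelay(page, url, delayMs, onDone, schedule) {
  schedule = schedule || setTimeout;
  page.open(url, function (status) {
    if (status !== 'success') {
      onDone(new Error('Failed to load ' + url), null);
      return;
    }
    // Give the page's own scripts extra time to finish changing the DOM.
    schedule(function () {
      onDone(null, page.content);
    }, delayMs);
  });
}

// Wiring for PhantomJS; skipped in any other JavaScript runtime.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var fs = require('fs');
  var webPage = require('webpage').create();
  // args[0] is the script name, args[1] the URL, args[2] the output file.
  // 5000 ms is an assumed value for "n seconds".
  captureAfterDelay(webPage, system.args[1], 5000, function (err, html) {
    if (err) {
      console.log(err.message);
      phantom.exit(1);
      return;
    }
    fs.write(system.args[2], html, 'w');
    phantom.exit(0);
  });
}
```

A fixed delay is the simplest policy. Pages that keep loading content for longer than the delay would be captured in an intermediate state.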


    Notes:

    • Where possible, XHR loading of resources is preferred over HTML5 canvas rendering, because it keeps files smaller and avoids quality loss (reusing the original files is better than re-encoding them).

    • <link> and <img> tags are kept in place, and data: URIs are used inside the href and src attributes respectively, instead of URLs. The same is true for background-image, which is read using getComputedStyle() on all DOM nodes.
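The two building blocks behind these notes can be sketched as below: turning downloaded bytes into a data: URI, and pulling the URL out of a computed background-image value. The function names `bytesToBase64`, `toDataUri` and `extractCssUrl` are hypothetical and are not taken from the original script. The XHR download itself is left out, and the sketch assumes the bytes and MIME type are already at hand.

```javascript
var B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode an array of byte values (0-255) as standard, padded base64.
function bytesToBase64(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i += 3) {
    var has1 = i + 1 < bytes.length;
    var has2 = i + 2 < bytes.length;
    var b0 = bytes[i];
    var b1 = has1 ? bytes[i + 1] : 0;
    var b2 = has2 ? bytes[i + 2] : 0;
    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? B64_ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += has2 ? B64_ALPHABET.charAt(b2 & 63) : '=';
  }
  return out;
}

// Build a data: URI suitable for an href, src or url(...) value.
function toDataUri(mimeType, bytes) {
  return 'data:' + mimeType + ';base64,' + bytesToBase64(bytes);
}

// Extract the first URL from a computed style value such as
// url("http://example.com/bg.png"). Returns null for "none".
// Only the first url(...) of a multi-background value is returned.
function extractCssUrl(cssValue) {
  var match = /url\(\s*(['"]?)(.*?)\1\s*\)/.exec(cssValue || '');
  return match ? match[2] : null;
}
```

In a page context the CSS value would come from something like `getComputedStyle(node).backgroundImage`. A result that already starts with `data:` needs no further work.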
