Get browser-rendered HTML + JavaScript

无人及你 2020-12-01 19:40

I need a command-line tool (or JavaScript/PHP, but I think the command line is the only way) to render a URL and get its rendered content, but the important thing is that I need to render ...

2 answers
  • 2020-12-01 20:25
    • Selenium: a very complete solution with bindings in many languages
    • puppeteer: headless Chrome API, usable in Node.js or as a command-line tool
    • HTTrack: command-line tool
    • Apache Nutch & webmagic: open-source Java web crawlers
    • pholcus: "distributed & high concurrency" web crawler written in Go
    • Xvfb: a display server implementing the X11 display server protocol without showing any screen output. I have used it successfully with Travis CI and Protractor, for example. Alternative: Xdummy
    • PhantomJS (first suggested by nvuono): can also export the rendered page as non-HTML (PDF, PNG...). PhantomJS development is suspended until further notice. Closely related: SlimerJS, CasperJS

    There are also many Python web scraping libraries:

    • Scrapy
    • pyspider
    • ghost.py
    • splinter
  • 2020-12-01 20:26

    Try PhantomJS from www.phantomjs.org; you can easily modify the included rasterize.js to export the rendered HTML. It is based on WebKit and fully evaluates your target site's JavaScript, and it lets you adjust timeouts or execute your own code first if you wish. I personally use it to save hard-copy HTML versions of fully rendered knockout.js templates.

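    It executes JavaScript, so I just did something like this and saved the console output to a file:

    var page = require('webpage').create();
    page.open(require('system').args[1], function (status) {  // URL is the first script argument
        var markup = page.evaluate(function () { return document.documentElement.innerHTML; });
        console.log(markup);
        phantom.exit();
    });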
    