Getting the final HTML, with JavaScript rendered, as a String in Java

借酒劲吻你 2020-12-03 04:04

I want to fetch data from an HTML page (scrape it), but the reviews on it are generated by JavaScript. With a normal Java URL fetch I only get the original HTML source, without the content that the JavaScript renders.

3 answers
  • 2020-12-03 04:24

    You can use HtmlUnit, a Java-based "GUI-less browser". It loads the page the way a web browser does, runs its JavaScript, and gives you the final rendered output. JavaScript execution is on by default, but you can disable it.

    UPDATE: You asked for an example. You don't have to do anything extra:

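    WebClient webClient = new WebClient();
    HtmlPage myPage = webClient.getPage(myUrl);
    // asXml() serializes the DOM as it stands after the page's JavaScript has run
    String finalHtml = myPage.asXml();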
    

    UPDATE 2: You can get the page inside an iframe as follows:

    HtmlPage myFrame = (HtmlPage) myPage.getFrameByName(myIframeName).getEnclosedPage();
    

    Please read the HtmlUnit documentation. There is very little you can't do when it comes to getting page content with HtmlUnit.

  • 2020-12-03 04:32

    Use phantomjs: http://phantomjs.org

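    var page = require('webpage').create();
    page.open("http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp", function (status) {
        // Wait 10 seconds so the page's scripts can finish rendering
        setTimeout(function () {
            // Where you want to save the screenshot
            page.render("screenshot.png");
            // page.content holds the final HTML after JavaScript has run
            console.log(page.content);
            // You can access its content using jQuery (assuming the page loads it).
            // page.evaluate can only return serializable values, so return HTML strings.
            var fbcomments = page.evaluate(function () {
                return $(".fb-comments iframe").contents().find(".postContainer")
                    .map(function () { return this.outerHTML; }).get();
            });
            phantom.exit();
        }, 10000);
    });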
    

    You have to run PhantomJS with the --web-security=no option to allow cross-domain interaction (e.g. for the Facebook iframe).

    To communicate with other applications from phantomjs you can use a web server or make a POST request: https://github.com/ariya/phantomjs/blob/master/examples/post.js
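
    Since the question asks for the result as a Java String, another option is to launch PhantomJS from Java and capture what it prints. The following is only a minimal sketch: it assumes phantomjs is on the PATH and that a script (here hypothetically named render.js) takes the URL as its first argument and writes page.content to stdout. The class and method names are illustrative, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class RenderedHtmlFetcher {

    /** Runs an external command and returns everything it wrote to stdout as a String. */
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send stderr to the parent's stderr so it does not mix into the captured HTML
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
        return output;
    }

    /** Builds the PhantomJS command line; the script path is an assumption (see above). */
    static List<String> phantomCommand(String scriptPath, String url) {
        return List.of("phantomjs", "--web-security=no", scriptPath, url);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: RenderedHtmlFetcher <url>");
            return;
        }
        // "render.js" is a hypothetical script that prints page.content
        String html = runAndCapture(phantomCommand("render.js", args[0]));
        System.out.println(html.length() + " characters of rendered HTML");
    }
}
```

    Reading stdout to the end before calling waitFor() avoids the child process blocking on a full pipe. For pages that never finish loading you would probably also want a timeout, which this sketch leaves out.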

  • 2020-12-03 04:39

    A simple way to solve this: you can use HtmlUnit, a Java API. I think it can help you access the content generated by the executed JavaScript as if it were plain HTML. The example below prints the page's visible text.

    WebClient webClient = new WebClient();
    HtmlPage myPage = (HtmlPage) webClient.getPage(new URL("YourURL"));
    System.out.println(myPage.getVisibleText());
    