Using Phantom.js evaluate, how can I get the HTML of the page?

Submitted by 两盒软妹~` on 2020-01-21 09:00:52

Question


page.evaluate(function() { return document; }, function(result) {
    console.log(result);
    next();
});

result is actually a huge object. I don't know its properties and attributes. I just want the HTML of the page as you would see it in the Chrome inspector.

From the look of the object, it seems that the HTML includes CSS and JavaScript, which is weird. The user should not see the CSS and JavaScript, because they are not the web page's HTML. Those are external files. I only want the HTML that the user would see.


Answer 1:


document is an HTMLDocument object, not a string. To get the entire DOM as a string, you could use document.documentElement.outerHTML.

From outside evaluate, you can use page.content. It is a string.
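As a rough sketch of both options, assuming a plain PhantomJS script (not the Node bridge with the callback-style evaluate used in the question): the helper name getPageHtml and the URL are placeholders, not from the original post. PhantomJS serializes whatever evaluate returns, so the function returns a string instead of the document object.

```javascript
// Runs inside the page context. PhantomJS serializes the return value,
// so return a string rather than the `document` object itself.
// `getPageHtml` is a hypothetical helper name chosen for this sketch.
function getPageHtml() {
  return document.documentElement.outerHTML;
}

// The guard only lets this file load outside PhantomJS without errors.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  // Placeholder URL, not from the original question.
  page.open('http://example.com/', function (status) {
    if (status === 'success') {
      // Option 1: serialize the DOM inside the page and return the string.
      console.log(page.evaluate(getPageHtml));
      // Option 2: read the markup from outside evaluate via the content property.
      console.log(page.content);
    }
    phantom.exit();
  });
}
```

Saved as a script and run with the phantomjs binary, this should print the page markup twice, once per option.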

I don't know what you mean by "HTML includes CSS and JavaScript" or "the web page's HTML". Are you referring to the difference between the page source and the DOM as modified by scripting? Both the above give you the current DOM, not the original page source.



Source: https://stackoverflow.com/questions/16706777/using-phantom-js-evaluate-how-can-i-get-the-html-of-the-page
