Save HTML output of a page after execution of the page's JavaScript

There is a site I am trying to scrape that first loads HTML/JS, modifies the form input fields using JS, and then POSTs. How can I get the final HTML output of the POSTed

7 Answers

刺人心

2020-11-29 21:29

One approach that comes to mind, besides using a headless browser, is to simulate the AJAX calls and assemble the post-processed page request by request. This is often tricky, however, and should be a last resort unless you really like digging through JavaScript code.
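A minimal sketch of that request-by-request approach, written for Node.js rather than a browser. The field names, the token transformation in applyPageScript, and the postForm helper are all hypothetical stand-ins; the real values have to be read out of the site's own JavaScript.

```javascript
// Stand-in for whatever the site's JavaScript does to the inputs before submitting.
// The "token" rule below is an assumption for illustration only.
function applyPageScript(fields) {
  var out = {};
  Object.keys(fields).forEach(function (key) {
    out[key] = fields[key];
  });
  out.token = out.user + "-" + out.ts;
  return out;
}

// Encode the fields as application/x-www-form-urlencoded text.
function buildFormBody(fields) {
  return new URLSearchParams(fields).toString();
}

// POST the modified fields and resolve with the response HTML.
// Relies on the global fetch available in Node 18 and later.
function postForm(url, fields) {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildFormBody(applyPageScript(fields))
  }).then(function (res) {
    return res.text();
  });
}
```

The HTML returned by postForm could then be written to disk with Node's fs.writeFileSync. Cookies and hidden fields set by earlier requests would usually have to be carried along as well, which is part of what makes this approach tricky.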

南旧

2020-11-29 21:30
When I copied your code directly and changed the URL to www.google.com, it worked fine, with two files saved:
- 1.html
- export.png
Bear in mind that the files are written to the directory you run the script from, not the directory where your .js file is located.
暗喜

2020-11-29 21:34
The output code you have is correct, but there is a synchronization issue: the output lines are executed before the page has finished loading. You can hook into the onLoadFinished callback to find out when that happens. See the full code below.
```
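var page = require('webpage').create();
var fs = require('fs');

// Runs once the page has finished loading; status is 'success' or 'fail'.
page.onLoadFinished = function(status) {
  console.log("page load finished: " + status);
  page.render('export.png');              // screenshot of the rendered page
  fs.write('1.html', page.content, 'w');  // HTML after the page's JavaScript has run
  phantom.exit();
};

page.open("http://www.google.com", function() {
  page.evaluate(function() {
    // code placed here runs inside the page context
  });
});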
```
Testing against a site like Google can be deceiving: it loads so quickly that an inline screen grab like yours often works anyway. Timing is tricky in PhantomJS; I sometimes test with setTimeout to see whether timing is the issue.
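Instead of a fixed setTimeout, one option is to poll until a condition holds. The helper below is a sketch along those lines, not part of the PhantomJS API; the name waitFor, its parameters, and the #result selector in the usage comment are assumptions made for illustration.

```javascript
// Poll `check` every `intervalMs` milliseconds until it returns a truthy value
// or `timeoutMs` has elapsed, then call `done(ok)` exactly once.
function waitFor(check, done, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    var ok = Boolean(check());
    if (ok || Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      done(ok);
    }
  }, intervalMs || 100);
}

// Possible use inside a PhantomJS script (hypothetical selector):
// waitFor(function () {
//   return page.evaluate(function () {
//     return !!document.querySelector('#result');
//   });
// }, function (ok) {
//   fs.write('1.html', page.content, 'w');
//   phantom.exit(ok ? 0 : 1);
// }, 5000, 100);
```

Polling on a concrete condition tends to be less fragile than guessing a delay, since the output is written as soon as the page is ready and the timeout only acts as a safety net.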

暖寄归人

2020-11-29 21:36

I'm using CasperJS to run tests with PhantomJS. I added this code to my tearDown function:

```
var require = patchRequire(require);
var fs = require('fs');

casper.test.begin("My Test", {
    tearDown: function(){
        casper.capture("export.png");
        fs.write("1.html", casper.getHTML(undefined, true), 'w');
    },
    test: function(test){
        // test code

        casper.run(function(){
            test.done();
        });
    }
});
```

See docs for capture and getHTML.


甜味超标

2020-11-29 21:38

This can be done with some PHP and JavaScript: serialize the document in the browser with var generatedSource = new XMLSerializer().serializeToString(document); and save the result on the PHP side with fopen() and fwrite().

梦如初夏

2020-11-29 21:40

I tried several approaches to a similar task, and I got the best results with Selenium.

Before that I tried PhantomJS and Cheerio. PhantomJS crashed too often while executing JS on the page.

