Taking reliable screenshots of websites? PhantomJS and CasperJS both return empty screenshots on some websites

一向 2020-12-23 17:26

Open a web page and take a screenshot.

Using ONLY phantomjs: (this is a simple script, in fact it is the example script used in their docs. http://phantomjs.org/scree

1 Answer
  • 2020-12-23 18:19

    After experimenting for some time, I was able to narrow down the problem. Apparently PhantomJS defaults to SSLv3, which causes GitHub to refuse the connection with a failed SSL handshake.

    phantomjs --debug=true github.js
    

    This shows the following output:

    . . .
    2014-10-22T19:48:31 [DEBUG] WebPage - updateLoadingProgress: 10 
    2014-10-22T19:48:32 [DEBUG] Network - Resource request error: 6 ( "SSL handshake failed" ) URL: "https://github.com/" 
    2014-10-22T19:48:32 [DEBUG] WebPage - updateLoadingProgress: 100 
    

    From this we can conclude that no screenshot was taken because GitHub was refusing the connection, which makes perfect sense. So let's set the SSL protocol with --ssl-protocol=any and also ignore SSL errors with --ignore-ssl-errors=true:

    phantomjs --ignore-ssl-errors=true --ssl-protocol=any --debug=true github.js
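
    Failed requests can also be reported from inside the script, which is easier to scan than the full --debug output. The following is only a sketch, assuming PhantomJS 1.9 or later (where page.onResourceError is available); formatResourceError is a hypothetical helper name and not part of PhantomJS. The PhantomJS-specific part is guarded so that the helper can be tried on its own.

    ```javascript
    // Sketch: log failed requests from inside the script.
    // formatResourceError is a hypothetical helper, not a PhantomJS API.
    function formatResourceError(err) {
        return 'Resource error ' + err.errorCode +
            ' (' + err.errorString + ') URL: ' + err.url;
    }

    // Only wire up a page when actually running under PhantomJS.
    if (typeof phantom !== 'undefined') {
        var page = require('webpage').create();

        // Called for every resource that could not be loaded,
        // e.g. when the SSL handshake fails.
        page.onResourceError = function (resourceError) {
            console.log(formatResourceError(resourceError));
        };

        page.open('https://github.com/', function (status) {
            console.log('Load status: ' + status);
            if (status === 'success') {
                page.render('github.png');
            }
            phantom.exit();
        });
    }
    ```

    Run without the SSL flags, this should print one line per refused request; with the flags above, no such lines would be expected for the main page.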
    

    Great success! A screenshot is now rendered and saved properly, but the debug output shows a TypeError:

    TypeError: 'undefined' is not a function (evaluating 'Array.prototype.forEach.call.bind(Array.prototype.forEach)')
    
      https://assets-cdn.github.com/assets/frameworks-dabc650f8a51dffd1d4376a3522cbda5536e4807e01d2a86ff7e60d8d6ee3029.js:29
      https://assets-cdn.github.com/assets/frameworks-dabc650f8a51dffd1d4376a3522cbda5536e4807e01d2a86ff7e60d8d6ee3029.js:29
    2014-10-22T19:52:32 [DEBUG] WebPage - updateLoadingProgress: 72 
    2014-10-22T19:52:32 [DEBUG] WebPage - updateLoadingProgress: 88 
    ReferenceError: Can't find variable: $
    
      https://assets-cdn.github.com/assets/github-fa2f009761e3bc4750ed00845b9717b09646361cbbc3fa473ad64de9ca6ccf5b.js:1
      https://assets-cdn.github.com/assets/github-fa2f009761e3bc4750ed00845b9717b09646361cbbc3fa473ad64de9ca6ccf5b.js:1
    

    I checked the GitHub homepage manually to see whether the TypeError occurs there, and it does NOT.

    My next guess is that the assets aren't loading quickly enough. PhantomJS is faster than a speeding bullet!

    So let's try to slow it down artificially and see if we can get rid of that TypeError:

    var page = require('webpage').create();
    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36';
    page.open('http://github.com', function (status) {
        // Wait 3 seconds after the page has loaded before rendering.
        window.setTimeout(function () {
            page.render('github.png');
            phantom.exit();
        }, 3000);
    });
    

    That didn't work: the TypeError is still there. On closer inspection of the image, it is clear that some elements are missing, mainly some icons and the logo.
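
    A fixed delay is only a guess at when loading has finished. One way to test the slow-asset theory more directly is to count in-flight requests and render once the page has been quiet for a short period. This is a sketch under assumptions: createPendingTracker, quietMs and maxWaitMs are hypothetical names and values chosen for illustration, and the PhantomJS wiring is guarded so the tracker can be tried on its own.

    ```javascript
    // Sketch: track requests that have started but not yet finished.
    // createPendingTracker is a hypothetical helper, not a PhantomJS API.
    function createPendingTracker() {
        var pending = {};
        return {
            start: function (id) { pending[id] = true; },
            finish: function (id) { delete pending[id]; },
            count: function () {
                var n = 0, id;
                for (id in pending) {
                    if (pending.hasOwnProperty(id)) { n += 1; }
                }
                return n;
            }
        };
    }

    if (typeof phantom !== 'undefined') {
        var page = require('webpage').create();
        var tracker = createPendingTracker();
        var quietMs = 500;      // assumed quiet period before rendering
        var maxWaitMs = 10000;  // assumed upper bound on total waiting

        page.onResourceRequested = function (requestData) {
            tracker.start(requestData.id);
        };
        page.onResourceReceived = function (response) {
            // Each response is reported in a 'start' and an 'end' stage.
            if (response.stage === 'end') { tracker.finish(response.id); }
        };
        page.onResourceError = function (resourceError) {
            tracker.finish(resourceError.id);
        };

        page.open('https://github.com/', function (status) {
            var startedAt = Date.now();
            var idleSince = startedAt;
            var timer = setInterval(function () {
                var now = Date.now();
                if (tracker.count() > 0) { idleSince = now; }
                if (now - idleSince >= quietMs || now - startedAt >= maxWaitMs) {
                    clearInterval(timer);
                    page.render('github.png');
                    phantom.exit();
                }
            }, 100);
        });
    }
    ```

    If the icons are still missing once the network has gone quiet, slow loading is unlikely to be the cause, which points back at the script error.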

    Success? Partially, because we are now at least getting a screenshot where earlier we weren't getting anything.

    Job done? Not exactly. We still need to determine what is causing that TypeError, because it is preventing some assets from loading and distorting the image.

    Additional

    I attempted to reproduce this with CasperJS. Its --debug output is very ugly and hard to follow compared to PhantomJS:

    casper.start();
    casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X)');
    casper.thenOpen('https://www.github.com/', function() {
        this.captureSelector('github.png', 'body');
    });
    
    casper.run();
    

    Console:

    casperjs test --ssl-protocol=any --debug=true github.js
    

    Furthermore, the image is missing the same icons and is also visually distorted. Since CasperJS relies on PhantomJS, I do not see the value in using it for this specific task.

    If you would like to add to my answer, please share your findings. I am very interested in a flawless PhantomJS solution.

    Update #1 : Removing the TypeError

    @ArtjomB points out that PhantomJS does not support Function.prototype.bind in its current version as of this update (1.9.7). He explains this in his answer (ArtjomB: PhantomJs Bind Issue Answer):

    The TypeError: 'undefined' is not a function refers to bind, because PhantomJS 1.x doesn't support it. PhantomJS 1.x uses an old fork of QtWebkit which is comparable to Chrome 13 or Safari 5. The newer PhantomJS 2 will use a newer engine which will support bind. For now you need to add a shim inside of the page.onInitialized event handler:

    OK, great. The following code takes care of the TypeError from above (but the result is still not fully functional; see below for details):

    var page = require('webpage').create();
    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36';

    // Register the handler before opening the page so the shim is
    // installed before any of the page's own scripts run.
    page.onInitialized = function () {
        page.evaluate(function () {
            var isFunction = function (o) {
                return typeof o == 'function';
            };

            var bind,
                slice = [].slice,
                proto = Function.prototype,
                featureMap;

            featureMap = {
                'function-bind': 'bind'
            };

            function has(feature) {
                var prop = featureMap[feature];
                return isFunction(proto[prop]);
            }

            // check for missing features
            if (!has('function-bind')) {
                // adapted from Mozilla Developer Network example at
                // https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Function/bind
                bind = function bind(obj) {
                    var args = slice.call(arguments, 1),
                        self = this,
                        nop = function () {
                        },
                        bound = function () {
                            return self.apply(this instanceof nop ? this : (obj || {}), args.concat(slice.call(arguments)));
                        };
                    nop.prototype = this.prototype || {}; // Firefox cries sometimes if prototype is undefined
                    bound.prototype = new nop();
                    return bound;
                };
                proto.bind = bind;
            }
        });
    };

    page.open('http://github.com', function (status) {
        // Give the assets 5 seconds to load before rendering.
        window.setTimeout(function () {
            page.render('github.png');
            phantom.exit();
        }, 5000);
    });
    

    The above code gets us the same screenshot as before, and the debug output no longer shows a TypeError, so on the surface everything appears to work. Progress has been made.
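
    To see why a bind fallback removes the TypeError, the failing expression can be exercised on its own. The expression from the stack trace binds Function.prototype.call to Array.prototype.forEach, producing a helper that iterates over array-like objects; without bind, the property lookup yields undefined and calling it throws. The sketch below uses simpleBind, a hypothetical and deliberately reduced stand-in for the shim above (it does not handle use with new), to build the same helper.

    ```javascript
    // Sketch: a reduced bind fallback (hypothetical name, no `new` support).
    function simpleBind(fn, thisArg) {
        var preset = Array.prototype.slice.call(arguments, 2);
        return function () {
            var rest = Array.prototype.slice.call(arguments);
            return fn.apply(thisArg, preset.concat(rest));
        };
    }

    // Same effect as Array.prototype.forEach.call.bind(Array.prototype.forEach):
    // `call` is fixed so that its `this` is always forEach, which gives
    // forEachOf(arrayLike, callback) === forEach.call(arrayLike, callback).
    var forEachOf = simpleBind(Function.prototype.call, Array.prototype.forEach);

    var seen = [];
    forEachOf({ 0: 'logo', 1: 'icon', length: 2 }, function (item) {
        seen.push(item);
    });
    console.log(seen.join(',')); // prints "logo,icon"
    ```

    The full shim does the same job by installing the fallback on Function.prototype, so the page's own call to .bind(...) finds a function instead of undefined.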

    Unfortunately, the image icons (logo, etc.) are still not loading correctly. We see some sort of 3W icon instead, and I am not sure where that comes from.

    Thanks for the help @ArtjomB
