Downloading a file that comes as an attachment in a POST request response in PhantomJS

Anonymous (unverified), submitted on 2019-12-03 09:14:57

Question:

I want to download a CSV file that is generated on a button click through a POST request. I searched the CasperJS and PhantomJS forums as best I could and came back empty-handed. In a normal browser like Firefox, a download dialog appears after the POST request. How can I handle this case in PhantomJS? The response headers are:

    HTTP/1.1 200 OK
    Cache-Control: private
    Content-Type: text/html; charset=utf-8
    Content-Encoding: gzip
    Vary: Accept-Encoding
    Server: Microsoft-IIS/7.5
    Content-disposition: attachment;filename=ExportData.csv
    X-AspNet-Version: 2.0.50727
    X-Powered-By: ASP.NET
    Date: Fri, 19 Apr 2013 23:26:40 GMT
    Content-Length: 65183

Answer 1:

I've found a way to do this using CasperJS (it should work with PhantomJS alone if you implement the download function using XMLHttpRequest, but I have not tried that).

I'll leave you the working example, which tries to download the most recent PDF from this page. When you click the download link, some JavaScript code is triggered that generates some hidden input fields, which are then POSTed.

What we do is replace the form's onsubmit function so that it cancels the submission, and get the form destination (action) and all its fields. We use this information later to do the actual download.

    var casper=require('casper').create();
    casper.start("https://sede.gobcan.es/tributos/jsf/publico/notificaciones/comparecencia/ultimosanuncios.jsp", function() {

        var theFormRequest = this.page.evaluate(function() {
            var request = {};

            var formDom = document.forms["resultadoUltimasNotif"];
            formDom.onsubmit = function() {
                //iterate the form fields
                var data = {};
                for(var i = 0; i
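
The listing above is cut off in the source at the loop over the form fields. As a rough illustration of the step the answer describes (gathering the form's action and all of its fields), here is a minimal sketch. The helper names `collectFormFields` and `describeFormRequest` are hypothetical and not part of the original answer, and the sketch assumes each field name appears only once and that submit buttons do not need to be sent.

```javascript
// Hypothetical helper: collect the submittable fields of a form-like object
// (anything with an indexable `elements` list) into a name -> value map.
function collectFormFields(form) {
    var data = {};
    for (var i = 0; i < form.elements.length; i++) {
        var field = form.elements[i];
        // Fields without a name, and disabled fields, are never submitted.
        if (!field.name || field.disabled) {
            continue;
        }
        // Unchecked checkboxes and radio buttons are not submitted either.
        if ((field.type === 'checkbox' || field.type === 'radio') && !field.checked) {
            continue;
        }
        // Assumption: names are unique, so a later field overwrites an earlier one.
        data[field.name] = field.value;
    }
    return data;
}

// Hypothetical helper: describe the request the form would have made.
function describeFormRequest(form) {
    return {
        action: form.action,
        method: (form.method || 'GET').toUpperCase(),
        data: collectFormFields(form)
    };
}
```

Inside a CasperJS script these helpers would have to be defined within the function passed to `evaluate`, since that function runs in the page context. The returned object could then be handed to `casper.download(request.action, 'file.pdf', request.method, request.data)`, where `'file.pdf'` is a placeholder target name.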

Note: you have to run it with --ignore-ssl-errors, as the CA they use isn't in your browser default CA list.

casperjs --ignore-ssl-errors=true downloadscript.js 


Answer 2:

You can listen to the page.resource.received event and download() the file when received:

    casper.on('page.resource.received', function(resource) {
        if (resource.stage !== "end") {
            return;
        }
        if (resource.url.indexOf('ExportData.csv') > -1) {
            this.download(resource.url, 'ExportData.csv');
        }
    });
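
In the question, the file name appears only in the Content-disposition response header, so a check on the URL may never match. A possible variation is to test the headers instead. The sketch below assumes the received resource exposes `headers` as an array of `{name, value}` objects, which is how PhantomJS reports them; `headerValue` and `isAttachmentResponse` are hypothetical helper names.

```javascript
// Hypothetical helper: return the value of a response header, matching the
// name case-insensitively, or null when the header is absent.
function headerValue(headers, name) {
    var list = headers || [];
    var wanted = name.toLowerCase();
    for (var i = 0; i < list.length; i++) {
        if (String(list[i].name).toLowerCase() === wanted) {
            return list[i].value;
        }
    }
    return null;
}

// Hypothetical predicate: true once a response has fully arrived ("end"
// stage) and its Content-Disposition header marks it as an attachment.
function isAttachmentResponse(resource) {
    if (resource.stage !== 'end') {
        return false;
    }
    var disposition = headerValue(resource.headers, 'Content-Disposition');
    return disposition !== null && /^\s*attachment\b/i.test(disposition);
}
```

Inside the listener, `isAttachmentResponse(resource)` would take the place of the `indexOf` test on the URL.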


Answer 3:

@julianjm's approach is almost the solution, but in my case I did not have the correct form name to replace the form submission.

So I found another solution using a PhantomJS beta:

There is a beta version of PhantomJS 2.0 that includes an event handler that solves this issue.

It is still a beta version, so debugging is not available.

So I developed the clicks and the page handling on the release version and then switched the PhantomJS version to make the download work.

    casper.start('http://www.website.com.br/', function() {
        this.page.onFileDownload = function(status) {
            console.log('onFileDownload(' + status + ')');
            // The system will detect the download, but you will have to name the file yourself!
            return "ContactList_08-25-14.csv";
        };
    });

    casper.then(function() {
        // Do your stuff here to click on the download link.
    });

    casper.run();

Download: Phantom 2.0 BETA

Download the exe, rename the release version of phantom.exe to phantom.bkp.exe and put this 2.0 version in its place. Then, in CasperJS, you will need to add some lines at the beginning of casperjs/bin/bootstrap.js, right after the license header:

     * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     * DEALINGS IN THE SOFTWARE.
     *
     */
    var system = require('system');
    var argsdeprecated = system.args;
    argsdeprecated.shift();
    phantom.args = argsdeprecated;

Also comment out the version check (same file):

    (function(version) {
        // required version check
        /*  if (version.major !== 1) {
            return __die('CasperJS needs PhantomJS v1.x');
        } if (version.minor

Remember, this is a tweak!

These lines in bootstrap.js will cause problems if you want to run the release version of PhantomJS or SlimerJS.

So develop on the release version, then tweak to this version to be able to download. If you need to debug, you will have to remove those lines from bootstrap.js.



Answer 4:

I have to deal with a site written with some kind of ASP.NET framework which sends a remarkable amount of POST data with each request (some 100 KB of data, of which about 95 never seem to change between requests; apparently view-state related).

However, no method I could find worked for me. I've looked into intercepting XHR, I've even found someone who is tackling the very same framework (at least judging from the selectors) but with a simpler case, inspired by this very question. I found out that back in the day this couldn't be done with PhantomJS.

My problem is that a click on a button starts a chain of AJAX requests culminating in the submission of this enormous POST form, to which the server finally replies with a "Content-Disposition: attachment" header.

In the end, I found this approach which works for me, even if it is network-inefficient:

    // ...setting up everything, until I just need to click on a button...

    phantomData    = null;
    phantomRequest = null;

    // Here, I just recognize the form being submitted and copy it.

    casper.on('resource.requested', function(requestData, request) {
        for (var h in requestData.headers) {
            if (requestData.headers[h].name === 'Content-Type') {
                if (requestData.headers[h].value === 'application/x-www-form-urlencoded') {
                    phantomData         = requestData;
                    phantomRequest      = request;
                }
            }
        }
    });

    // Here, I recognize when the request has FAILED because PhantomJS does
    // not support straight downloading.

    casper.on('resource.received', function(resource) {
        for (var h in resource.headers) {
            if (resource.headers[h].name === 'content-disposition') {
                if (resource.stage === 'end') {
                    if (phantomData) {
                        // to do: get name from resource.headers[h].value
                        casper.download(
                            resource.url,
                            "output.pdf",
                            phantomData.method,
                            phantomData.postData
                        );
                    } else {
                        // Something went wrong.
                    }
                    // Possibly, remove listeners?
                }
            }
        }
    });

    // Now, click on the button and initiate the dance.
    casper.click(pdfLinkSelector);

The download works flawlessly, even if I can see that the file gets requested (and sent) twice.

    [debug] [phantom] Navigation requested: url=https://somesite/SomePage.aspx, type=FormSubmitted, willNavigate=true, isMainFrame=true
    [debug] [application] GOT FORM, REQUEST DATA SAVED
    [warning] [phantom] Loading resource failed with status=fail (HTTP 200): https://somesite/SomePage.aspx
    [debug] [application] END STAGE REACHED, PHANTOMDATA PRESENT
    [debug] [application] ATTEMPTING CASPERJS.DOWNLOAD
    [debug] [remote] sendAJAX(): Using HTTP method: 'POST'
    [debug] [phantom] Downloaded and saved resource in output.pdf
    [debug] [application] TERMINATING SUCCESSFULLY
    [debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
    [debug] [phantom] url changed to "about:blank"
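
The script leaves a "to do" for taking the file name from the Content-Disposition header value. One way this might be done is sketched below. The function name `filenameFromContentDisposition` is hypothetical, and the sketch only handles the plain `filename=` parameter (quoted or unquoted), not the encoded `filename*=` form.

```javascript
// Hypothetical helper: extract the file name from a Content-Disposition
// header value such as 'attachment;filename=ExportData.csv'.
// Returns null when no usable filename parameter is present.
function filenameFromContentDisposition(value) {
    // Match a `filename=` parameter at the start or after a semicolon,
    // capturing either a quoted string or everything up to the next semicolon.
    var match = /(?:^|;)\s*filename\s*=\s*(?:"([^"]*)"|([^;]*))/i.exec(value || '');
    if (!match) {
        return null;
    }
    var raw = match[1] !== undefined ? match[1] : match[2];
    // Trim surrounding whitespace (String.prototype.trim is avoided to
    // stay friendly to older JavaScript engines).
    var name = raw.replace(/^\s+|\s+$/g, '');
    return name === '' ? null : name;
}
```

In the listener, the result could replace the hard-coded "output.pdf", falling back to that default when the function returns null.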

(Next, I'll probably modify the script to try invoking request.abort() from inside the resource.requested listener, set a semaphore and invoke the downloader again. I won't be able to get the attachment filename, but that matters little to me.)


