How to save a .pdf from a browser?

二次信任 提交于 2019-12-01 11:55:48

From the errorneous response which appears to be just a HTML error page:

alert('Session timed out. Please login again.\n');

It thus appears that downloading the PDF file is required to take place in a valid HTTP session. The HTTP session is backed by a cookie. The HTTP session in turn contains in the server side usually information about the currenty active and/or logged-in user.

The Selenium web driver manages cookies all by itself fully transparently. You can obtain them programmatically as follows:

Set<Cookie> cookies = driver.manage().getCookies();

When manually fiddling with java.net.URL outside control of Selenium, you should be making sure yourself that the URL connection is using the same cookies (and thus also maintaining the same HTTP session). You can set cookies on the URL connection as follows:

URLConnection connection = new URL(driver.getCurrentUrl()).openConnection();

for (Cookie cookie : driver.manage().getCookies()) {
    String cookieHeader = cookie.getName() + "=" + cookie.getValue();
    connection.addRequestProperty("Cookie", cookieHeader);
}

InputStream input = connection.getInputStream(); // Write this to file.
ddavison

A PDF is considered a Binary File and it gets corrupted because the way that copyUrlToFile() works. By the way, this looks like a duplicate of JAVA - Download Binary File (e.g. PDF) file from Webserver

Try this custom binary download method out -

public void downloadBinaryFile(String path) {
    URL u = new URL(path);
    URLConnection uc = u.openConnection();
    String contentType = uc.getContentType();
    int contentLength = uc.getContentLength();
    if (contentType.startsWith("text/") || contentLength == -1) {
      throw new IOException("This is not a binary file.");
    }
    InputStream raw = uc.getInputStream();
    InputStream in = new BufferedInputStream(raw);
    byte[] data = new byte[contentLength];
    int bytesRead = 0;
    int offset = 0;
    while (offset < contentLength) {
      bytesRead = in.read(data, offset, data.length - offset);
      if (bytesRead == -1)
        break;
      offset += bytesRead;
    }
    in.close();

    if (offset != contentLength) {
      throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes");
    }

    String filename = u.getFile().substring(filename.lastIndexOf('/') + 1);
    FileOutputStream out = new FileOutputStream(filename);
    out.write(data);
    out.flush();
    out.close();
}

EDIT: It actually sounds as if you are not on the page that you think you are.. instead of doing driver.getCurrentUrl()

Have your script take the Url from the link to the PDF. Assuming there is a link like <a href='http://mysite.com/my.pdf' /> Instead of clicking it, then getting the url, just take the href from that link, and download it.

String pdfPath = driver.findElement(By.id("someId")).getAttribute("href");
downloadBinaryFile(pdfPath);
sbridges

The server may be compressing the pdf. You can use this code, stolen from this answer to detect and decompress the response from the server,

InputStream is = driver.getCurrentUrl().openStream();
try {
   InputStream decoded = decompressStream(is);
   FileOutputStream output = new FileOutputStream(
       new File("C:\\Users\\myDocs\\myfolder\\myFile.pdf"));
   try {
       IOUtils.copy(decoded, output);
   }
   finally {
       output.close();
   }
} finally {
   is.close();
}

public static InputStream decompressStream(InputStream input) {
     PushBackInputStream pb = new PushBackInputStream( input, 2 ); //we need a pushbackstream to look ahead
     byte [] signature = new byte[2];
     pb.read( signature ); //read the signature
     pb.unread( signature ); //push back the signature to the stream
     if( signature[ 0 ] == (byte) 0x1f && signature[ 1 ] == (byte) 0x8b ) //check if matches standard gzip maguc number
       return new GZIPInputStream( pb );
     else 
       return pb;
}
Petr Janeček

When I try to save the file manually using CTRL+Shift+S , the file gets saved OK.

While I advocate using Java to download the file, there is a not-so-recommended workaround that presses Ctrl+Shift+S programatically: The Robot class.

It sucks to use a workaround, but it works reliably as far as I can tell in the browsers and OSes I tried. This code should not get into any serious application. But it's OK for tests if you won't be able to solve your issue the right way.

Robot robot = new Robot();

Press Ctrl+Shift+S

robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_SHIFT);
robot.keyPress(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_SHIFT);
robot.keyRelease(KeyEvent.VK_CONTROL);

In browsers and OSes I know, you should be in the Save file dialogue in the File name input. You can type in your absolute path:

robot.keyPress(KeyEvent.VK_C);        // C
robot.keyRelease(KeyEvent.VK_C);
robot.keyPress(KeyEvent.VK_COLON);    // : (colon)
robot.keyRelease(KeyEvent.VK_COLON);
robot.keyPress(KeyEvent.VK_SLASH);    // / (slash)
robot.keyRelease(KeyEvent.VK_SLASH);
// etc. for the whole file path

robot.keyPress(KeyEvent.VK_ENTER);    // confirm by pressing Enter in the end
robot.keyRelease(KeyEvent.VK_ENTER);

To get the keycodes, you can use KeyEvent#getExtendedKeyCodeForChar() (Java 7+ only), or How can I make Robot type a `:`? and Convert String to KeyEvents.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!