AWS S3 Java SDK: downloaded PDF is corrupted

I am downloading files from AWS S3 using the getObject API. Simple text files work fine, but PDF downloads end up corrupted. I read from the object's input stream and write the contents to a file with FileOutputStream, and the saved PDF cannot be opened.

I am not sure which Java API is the right one for this, or how large the byte array that receives the bytes read should be.

I am also curious whether using the SDK directly makes sense, or whether there are open-source wrapper APIs in Java that I could leverage instead.

FileOutputStream fout = new FileOutputStream(new File(destFileName));

byte[] b = new byte[8192];
int bytesRead;
while (true) {
    bytesRead = input.read(b);
    System.out.println("bytesRead = " + bytesRead);
    if (bytesRead == -1)
        break;
    fout.write(b);
}
fout.flush();
fout.close();

I'm willing to bet the problem is that you write the entire buffer to the FileOutputStream on every iteration. read(b) returns the number of bytes it actually placed in the buffer, and that can be fewer than b.length. This always happens on the final read, and it can happen on earlier reads from a network stream too. Whenever the buffer is not completely overwritten, you end up writing bytes left over from a previous read into the file. You need to change the code to write only the number of bytes actually read from the input stream, rather than the entire buffer.

Instead of

fout.write(b);

Try

fout.write(b, 0, bytesRead);

This way, if the last read returns only 100 bytes, you write just the first 100 bytes of the buffer and skip the remaining 8092 bytes, which are stale data from the previous read that has already been written to the file.
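To see the difference in isolation, here is a minimal sketch that runs both loops against an in-memory stream instead of S3. The class and method names (StreamCopyDemo, copyWholeBuffer, copyBytesRead) are made up for illustration and are not part of the AWS SDK; the 20,000-byte payload simply stands in for the PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class StreamCopyDemo {

    // Mirrors the loop from the question: writes the whole buffer each time,
    // including stale bytes when read() fills only part of it.
    static long copyWholeBuffer(InputStream in, OutputStream out) throws IOException {
        byte[] b = new byte[8192];
        long written = 0;
        while (in.read(b) != -1) {
            out.write(b);
            written += b.length;
        }
        out.flush();
        return written;
    }

    // Corrected loop: writes only the bytes that read() reported.
    static long copyBytesRead(InputStream in, OutputStream out) throws IOException {
        byte[] b = new byte[8192];
        long written = 0;
        int bytesRead;
        while ((bytesRead = in.read(b)) != -1) {
            out.write(b, 0, bytesRead);
            written += bytesRead;
        }
        out.flush();
        return written;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[20_000]; // stand-in for the PDF bytes
        Arrays.fill(source, (byte) 7);

        ByteArrayOutputStream broken = new ByteArrayOutputStream();
        ByteArrayOutputStream fixed = new ByteArrayOutputStream();
        copyWholeBuffer(new ByteArrayInputStream(source), broken);
        copyBytesRead(new ByteArrayInputStream(source), fixed);

        System.out.println("source size       = " + source.length);
        System.out.println("whole-buffer copy = " + broken.size());
        System.out.println("bytesRead copy    = " + fixed.size());
        System.out.println("fixed == source   = " + Arrays.equals(source, fixed.toByteArray()));
    }
}
```

With an 8192-byte buffer the whole-buffer loop always writes a multiple of 8192 bytes, so the output is longer than the input whenever the input size is not an exact multiple. The buffer size itself is not critical for correctness; 8192 is a reasonable choice. If you would rather not write the loop by hand, the standard library's Files.copy(InputStream, Path) (Java 7+) or InputStream.transferTo(OutputStream) (Java 9+) do the same job, and wrapping the streams in try-with-resources ensures they are closed even when an exception is thrown.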

Source: https://stackoverflow.com/questions/5616014/aws-s3-java-sdk-download-pdf-getting-corrupted

Tags

java

pdf

file-io

amazon-s3

fileoutputstream