Download a Large Number of Files Using the Java SDK for Amazon S3 Bucket

Question

I have a large number of files that need to be downloaded from an S3 bucket. My problem is similar to this article except I am trying to run it in Java.

public static void main(String[] args) {
    AWSCredentials myCredentials = new BasicAWSCredentials("key", "secret");
    TransferManager tx = new TransferManager(myCredentials);
    File file = <thefile>;  // placeholder: local destination directory
    try {
        // A null key prefix asks for every object in the bucket.
        MultipleFileDownload myDownload = tx.downloadDirectory("<bucket>", null, file);
        System.out.println("Transfer: " + myDownload.getDescription());
        System.out.println("  - State: " + myDownload.getState());
        System.out.println("  - Progress: " + myDownload.getProgress().getBytesTransfered());

        // Poll until the transfer finishes, printing progress twice a second.
        while (!myDownload.isDone()) {
            System.out.println("Transfer: " + myDownload.getDescription());
            System.out.println("  - State: " + myDownload.getState());
            System.out.println("  - Progress: " + myDownload.getProgress().getBytesTransfered());
            try {
                // Do work while we wait for our download to complete...
                Thread.sleep(500);
            } catch (InterruptedException ex) {
                ex.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

This was adapted from the TransferManager class example for multiple uploads. There are well over 100,000 objects in this bucket. Any help would be great.

Answer 1:

Please use the list() method to get a list of your files, then use the get() method to get each file.

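import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

class S3 extends AmazonS3Client {

    final String bucket;

    S3(String u, String p, String bucket) {
        super(new BasicAWSCredentials(u, p));
        this.bucket = bucket;
    }

    // Minimal logging helper so the class compiles on its own.
    static void log(String message) {
        System.err.println(message);
    }

    // Returns the content of object k as a String (platform default charset),
    // or null on failure. Suitable for text objects only.
    String get(String k) {
        // try-with-resources closes the stream and releases the HTTP connection.
        try (BufferedInputStream i = new BufferedInputStream(getObject(bucket, k).getObjectContent())) {
            final ByteArrayOutputStream s = new ByteArrayOutputStream();
            final byte[] b = new byte[1024];
            for (int n = i.read(b); n != -1; n = i.read(b)) {
                s.write(b, 0, n);
            }
            // Decode once at the end so multi-byte characters are never split.
            return s.toString();
        } catch (Exception e) {
            log("Cannot get " + bucket + "/" + k + " from S3 because " + e);
        }
        return null;
    }

    // Returns every key under prefix d. A single listing returns at most
    // 1,000 keys, so keep fetching batches while the listing is truncated.
    String[] list(String d) {
        try {
            final List<String> keys = new ArrayList<String>();
            ObjectListing l = listObjects(bucket, d);
            while (true) {
                for (S3ObjectSummary k : l.getObjectSummaries()) {
                    keys.add(k.getKey());
                }
                if (!l.isTruncated()) {
                    break;
                }
                l = listNextBatchOfObjects(l);
            }
            return keys.toArray(new String[0]);
        } catch (Exception e) {
            log("Cannot list " + bucket + "/" + d + " on S3 because " + e);
        }
        return new String[]{};
    }
}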
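
The get() method in this answer returns each object as a String, which suits text but would corrupt binary files. Since the question is about downloading files, a minimal sketch of writing a stream straight to disk may help. It uses only the Java standard library; the class name StreamToFile and the method save are hypothetical, and the InputStream is assumed to come from something like S3Object.getObjectContent().

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the AWS SDK.
class StreamToFile {

    // Copies every byte from in to target without decoding it as text,
    // creating missing parent directories first. Returns the byte count.
    static long save(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // try-with-resources closes the source stream when the copy ends.
        try (InputStream src = in) {
            return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Under those assumptions, a caller could pass the object's content stream and a path derived from the key, for example save(stream, Paths.get("downloads", key)).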

Answer 2:

TransferManager internally uses a CountDownLatch, which makes me believe it downloads concurrently (which seems like the right way to do it). Doesn't it make more sense to use it than to get one file after another sequentially?
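
To illustrate the pattern this answer refers to, here is a rough sketch of latch-based concurrent fetching using only java.util.concurrent. It is not TransferManager's actual implementation. The class ConcurrentFetch, the method fetchAll, and the fetcher parameter are all hypothetical; fetcher stands in for a real per-key download call such as the get() method from Answer 1.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of the AWS SDK.
class ConcurrentFetch {

    // Runs fetcher once per key on a fixed-size thread pool and blocks
    // until every task has finished. Keys whose fetch returns null or
    // throws are simply absent from the result map.
    static Map<String, String> fetchAll(List<String> keys,
                                        Function<String, String> fetcher,
                                        int threads) throws InterruptedException {
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(keys.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String key : keys) {
                pool.submit(() -> {
                    try {
                        String body = fetcher.apply(key);
                        if (body != null) {   // ConcurrentHashMap rejects null values
                            results.put(key, body);
                        }
                    } finally {
                        done.countDown();     // count down even if the fetch failed
                    }
                });
            }
            done.await();                     // wait for all tasks to complete
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

With many small objects, the time per file is usually dominated by request latency, so several requests in flight at once tend to finish sooner than a sequential loop. The thread count is a tuning choice and is left as a parameter here.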

Source: https://stackoverflow.com/questions/14539475/download-a-large-number-of-files-using-the-java-sdk-for-amazon-s3-bucket

Tags

java

amazon-web-services

amazon-s3

download

amazon