Upload ZipOutputStream to S3 without saving zip file (large) temporary to disk using AWS S3 Java

Submitted by 给你一囗甜甜゛ on 2019-12-04 20:09:37

Question


I need to download photos (not all in the same directory) from S3, ZIP them, and upload the archive back to S3 using the AWS S3 Java SDK. The ZIP file can reach several GB. I am currently running on AWS Lambda, which limits temporary storage to 500 MB, so I don't want to save the ZIP file to disk. Instead, I want to stream the ZIP file (created on the fly from the photos downloaded from S3) directly to S3. I need to do this with the AWS S3 Java SDK.


Answer 1:


The basic idea is to use streaming operations. This way you don't wait until the whole ZIP has been written to a filesystem; the upload starts as soon as the ZIP algorithm produces any data. Some data will still be buffered in memory, but there is no need to wait for the whole ZIP to be generated on disk. We'll also use stream composition and PipedInputStream / PipedOutputStream in two threads: one reads the objects and writes them into the ZIP stream, and the other uploads the zipped data.
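
To see the pipe mechanics in isolation, here is a minimal sketch with no AWS calls at all: a producer thread writes ZIP data into a PipedOutputStream while the calling thread drains the connected PipedInputStream. The class name PipedZipDemo, its method names, and the in-memory map of "files" are made up for illustration; in the real code the consumer side would be the S3 upload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class PipedZipDemo {

    // Zips the given name -> content pairs in a producer thread and collects
    // the resulting bytes on the calling thread, the way an uploader would.
    static byte[] zipThroughPipe(Map<String, byte[]> files) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            // 64 KiB pipe buffer: the producer blocks whenever the consumer falls behind.
            PipedInputStream in = new PipedInputStream(out, 64 * 1024);

            Thread producer = new Thread(() -> {
                // Closing the ZipOutputStream also closes the pipe, which is
                // what signals end of stream to the reading side.
                try (ZipOutputStream zip = new ZipOutputStream(out)) {
                    for (Map.Entry<String, byte[]> file : files.entrySet()) {
                        zip.putNextEntry(new ZipEntry(file.getKey()));
                        zip.write(file.getValue());
                        zip.closeEntry();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            // Consumer side: read until the producer closes its end of the pipe.
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                collected.write(buffer, 0, read);
            }
            in.close();
            producer.join();
            return collected.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while zipping", e);
        }
    }

    // Helper for checking the result: lists the entry names of a ZIP archive.
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

The producer and the consumer must run in different threads: the pipe buffer is small, so a single thread that writes first and reads later would block as soon as the buffer fills up.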

Here is a version for aws-java-sdk:

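import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.iterable.S3Objects;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectSummary;
// Assumption: IOUtils is the SDK's own helper; the original snippet does not show its import.
import com.amazonaws.util.IOUtils;

import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.UUID;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final AmazonS3 client = AmazonS3ClientBuilder.defaultClient();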

final PipedOutputStream pipedOutputStream = new PipedOutputStream();
final PipedInputStream pipedInputStream = new PipedInputStream(pipedOutputStream);

final Thread s3In = new Thread(() -> {
    try (final ZipOutputStream zipOutputStream = new ZipOutputStream(pipedOutputStream)) {
        S3Objects
                // Just a convenient way to list all the objects. Replace with your own logic.
                .inBucket(client, "bucket")
                .forEach((S3ObjectSummary objectSummary) -> {
                    try {
                        if (objectSummary.getKey().endsWith(".png")) {
                            System.out.println("Processing " + objectSummary.getKey());

                            final ZipEntry entry = new ZipEntry(
                                    // Placeholder name; real code should derive it from objectSummary.getKey()
                                    UUID.randomUUID().toString() + ".png"
                            );

                            zipOutputStream.putNextEntry(entry);

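                            // Close the S3Object when done so its HTTP connection is released
                            try (final S3Object s3Object = client.getObject(
                                    objectSummary.getBucketName(),
                                    objectSummary.getKey()
                            )) {
                                IOUtils.copy(s3Object.getObjectContent(), zipOutputStream);
                            }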

                            zipOutputStream.closeEntry();
                        }
                    } catch (final Exception all) {
                        all.printStackTrace();
                    }
                });
    } catch (final Exception all) {
        all.printStackTrace();
    }
});
final Thread s3Out = new Thread(() -> {
    try {
        client.putObject(
                "another-bucket",
                "previews.zip",
                pipedInputStream,
                new ObjectMetadata()
        );

        pipedInputStream.close();
    } catch (final Exception all) {
        all.printStackTrace();
    }
});

s3In.start();
s3Out.start();

s3In.join();
s3Out.join();

However, note that it will print a warning:

WARNING: No content length specified for stream data.  Stream contents will be buffered in memory and could result in out of memory errors.

That's because S3 needs to know the size of the data before the upload starts, and the size of the resulting ZIP cannot be known in advance. You can try multipart uploads instead, but the code will be trickier. The idea stays the same: one thread reads the data and writes it into the ZIP stream, and the other thread reads the zipped data and uploads it as parts. Once all the parts are uploaded, the multipart upload must be completed.
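
One way to picture the multipart variant: the uploading thread reads the pipe in fixed-size chunks and hands each chunk over as one part, so only one chunk is held in memory at a time. The sketch below shows just that chunking loop, using only the JDK. PartSink and MultipartChunker are hypothetical names, not SDK types; with aws-java-sdk 1.x the sink would presumably wrap initiateMultipartUpload, uploadPart, and completeMultipartUpload. Keep in mind that S3 requires every part except the last to be at least 5 MB and allows at most 10,000 parts per upload.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical callback: a real implementation would upload the bytes as one
// multipart part and remember the returned ETag for the final "complete" call.
interface PartSink {
    void uploadPart(int partNumber, byte[] data);
}

final class MultipartChunker {

    // Reads the stream to the end and emits parts of exactly partSize bytes,
    // except the last one, which may be shorter. Returns the number of parts.
    static int split(InputStream in, int partSize, PartSink sink) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        try {
            byte[] buffer = new byte[partSize];
            int partNumber = 0;
            while (true) {
                int filled = 0;
                // A single read() may return fewer bytes than requested (pipes
                // often do), so keep reading until the buffer is full or EOF.
                while (filled < partSize) {
                    int read = in.read(buffer, filled, partSize - filled);
                    if (read == -1) {
                        break;
                    }
                    filled += read;
                }
                if (filled == 0) {
                    break; // nothing left to send
                }
                partNumber++; // S3 part numbers start at 1
                sink.uploadPart(partNumber, Arrays.copyOf(buffer, filled));
                if (filled < partSize) {
                    break; // a short part means the stream has ended
                }
            }
            return partNumber;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the two-thread setup above, the stream passed to split would be the pipedInputStream, and partSize would be 5 MB or more. Memory use is then bounded by the part size rather than by the size of the whole ZIP.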

Here is an example for aws-java-sdk-2.x:

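import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.core.sync.ResponseTransformer;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;
// Assumption: IOUtils is Apache Commons IO; the original snippet does not show its import.
import org.apache.commons.io.IOUtils;

import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.UUID;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final S3Client client = S3Client.create();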

final PipedOutputStream pipedOutputStream = new PipedOutputStream();
final PipedInputStream pipedInputStream = new PipedInputStream(pipedOutputStream);

final Thread s3In = new Thread(() -> {
    try (final ZipOutputStream zipOutputStream = new ZipOutputStream(pipedOutputStream)) {
        client.listObjectsV2Paginator(
                ListObjectsV2Request
                        .builder()
                        .bucket("bucket")
                        .build()
        )
                .contents()
                .forEach((S3Object object) -> {
                    try {
                        if (object.key().endsWith(".png")) {
                            System.out.println("Processing " + object.key());

                            final ZipEntry entry = new ZipEntry(
                                    // Placeholder name; real code should derive it from object.key()
                                    UUID.randomUUID().toString() + ".png"
                            );

                            zipOutputStream.putNextEntry(entry);

                            client.getObject(
                                    GetObjectRequest
                                            .builder()
                                            .bucket("bucket")
                                            .key(object.key())
                                            .build(),
                                    ResponseTransformer.toOutputStream(zipOutputStream)
                            );

                            zipOutputStream.closeEntry();
                        }
                    } catch (final Exception all) {
                        all.printStackTrace();
                    }
                });
    } catch (final Exception all) {
        all.printStackTrace();
    }
});
final Thread s3Out = new Thread(() -> {
    try {
        client.putObject(
                PutObjectRequest
                        .builder()
                        .bucket("another-bucket")
                        .key("previews.zip")
                        .build(),
                RequestBody.fromBytes(
                        IOUtils.toByteArray(pipedInputStream)
                )
        );
    } catch (final Exception all) {
        all.printStackTrace();
    }
});

s3In.start();
s3Out.start();

s3In.join();
s3Out.join();

It suffers from the same problem: the whole ZIP is buffered in memory before the upload starts.

If you're interested, I've prepared a demo project, so you can play with the code.



Source: https://stackoverflow.com/questions/55204181/upload-zipoutputstream-to-s3-without-saving-zip-file-large-temporary-to-disk-u
