Tried this:
import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer
path = "/temp/"
fileName = "bigFile.gz"  # this happens to be a 5.9 Gig file
Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:
import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')
You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8MB).
You seem to have been confused by the fact that the end result in S3 wasn't visibly made up of multiple parts:
Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.
... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.
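If you want to convince yourself that a multipart upload really happened, one commonly used (though unofficial) hint is the object's ETag: for multipart uploads S3 computes it from the part hashes and appends a dash plus the part count. A small sketch, reusing the bucket/key names from the example above:
import boto3
# Heuristic only: a multipart-uploaded object's ETag looks like
# "<md5-of-part-md5s>-<number of parts>", so it contains a dash.
s3 = boto3.client('s3')
head = s3.head_object(Bucket='some_bucket', Key='some_key')
etag = head['ETag'].strip('"')
if '-' in etag:
    print("Uploaded in {} parts".format(etag.split('-')[-1]))
else:
    print("Uploaded as a single part")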
Why not use just the copy option in boto3?
s3.copy(
    CopySource={'Bucket': sourceBucket, 'Key': sourceKey},
    Bucket=targetBucket,
    Key=targetKey,
    ExtraArgs={'ACL': 'bucket-owner-full-control'})
Details on how to initialise the s3 object, and further options for the call, are available in the boto3 docs.
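For completeness, a minimal end-to-end sketch (the bucket and key names here are just placeholders):
import boto3
# Placeholder bucket/key names; like upload_file, the managed s3.copy
# call handles multipart copies of large objects automatically.
s3 = boto3.client('s3')
s3.copy(
    CopySource={'Bucket': 'source-bucket', 'Key': 'bigFile.gz'},
    Bucket='target-bucket',
    Key='bigFile.gz',
    ExtraArgs={'ACL': 'bucket-owner-full-control'})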
In your code snippet, part should clearly be part1 in the dictionary. Typically, you would have several parts (otherwise, why use multipart upload at all?), and the 'Parts' list would contain an element for each part.
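As a rough sketch of what the several-parts case would look like (the part size, bucket and key are illustrative; every part except the last must be at least 5 MB):
import boto3

s3 = boto3.client('s3')
bucket, key = 'bucket', 'key'
part_size = 100 * 1024 * 1024  # 100 MB per part (illustrative)

# Start the multipart upload, send each chunk as a part, then complete it
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open('/temp/bigFile.gz', 'rb') as data:
    part_number = 1
    while True:
        chunk = data.read(part_size)
        if not chunk:
            break
        part = s3.upload_part(Bucket=bucket, Key=key,
                              PartNumber=part_number,
                              UploadId=mpu['UploadId'],
                              Body=chunk)
        parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
        part_number += 1
s3.complete_multipart_upload(Bucket=bucket, Key=key,
                             UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})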
You may also be interested in the new pythonic interface for dealing with S3: http://s3fs.readthedocs.org/en/latest/
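A minimal s3fs sketch, assuming s3fs is installed and your AWS credentials are already configured:
import s3fs
# put() copies a local file to S3 and takes care of large files for you
fs = s3fs.S3FileSystem()
fs.put('/temp/bigFile.gz', 'some_bucket/bigFile.gz')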
Change part to part1 so the variable names match:
import boto3
bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'
s3 = boto3.client('s3')
# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path + fileName, 'rb') as data:
    part1 = s3.upload_part(Bucket=bucket,
                           Key=key,
                           PartNumber=1,
                           UploadId=mpu['UploadId'],
                           Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part1['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket,
                             Key=key,
                             UploadId=mpu['UploadId'],
                             MultipartUpload=part_info)
I would advise you to use boto3.s3.transfer for this purpose. Here is an example:
import boto3
from boto3.s3.transfer import S3Transfer, TransferConfig

def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")
    try:
        print("Uploading file: {}".format(filename))
        tc = TransferConfig()
        t = S3Transfer(client=s3_client, config=tc)
        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")
    except Exception as e:
        print("Error uploading: {}".format(e))
As described in official boto3 documentation:
The AWS SDK for Python automatically manages retries and multipart and non-multipart transfers.
The management operations are performed by using reasonable default settings that are well-suited for most scenarios.
So all you need to do is set the desired multipart threshold value, which indicates the minimum file size for which a multipart upload will be handled automatically by the Python SDK:
import boto3
from boto3.s3.transfer import TransferConfig
# Set the desired multipart threshold value (5GB)
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)
# Perform the transfer
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)
Moreover, you can also use the multithreading mechanism for multipart transfers by setting max_concurrency:
# To consume less downstream bandwidth, decrease the maximum concurrency
config = TransferConfig(max_concurrency=5)
# Download an S3 object
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
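Since the question is about uploading, the same setting applies to uploads too; a short sketch using the same placeholder names:
# More threads upload more parts in parallel
config = TransferConfig(max_concurrency=10)
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)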
And finally, in case you want to perform a multipart transfer in a single thread, just set use_threads=False:
# Disable thread use/transfer concurrency
config = TransferConfig(use_threads=False)
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)