How to save S3 object to a file using boto3


I'm trying to do a "hello world" with the new boto3 client for AWS.

The use case I have is fairly simple: get an object from S3 and save it to a file.

In boto

7 Answers
  • 2020-11-28 02:50

    When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:

    import os

    import boto3


    def s3_download(source, destination,
                    exists_strategy='raise',
                    profile_name=None):
        """
        Copy a file from an S3 source to a local destination.
    
        Parameters
        ----------
        source : str
            Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
        destination : str
        exists_strategy : {'raise', 'replace', 'abort'}
            What is done when the destination already exists?
        profile_name : str, optional
            AWS profile
    
        Raises
        ------
        botocore.exceptions.NoCredentialsError
            Botocore is not able to find your credentials. Either specify
            profile_name or add the environment variables AWS_ACCESS_KEY_ID,
            AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
            See https://boto3.readthedocs.io/en/latest/guide/configuration.html
        """
        exists_strategies = ['raise', 'replace', 'abort']
        if exists_strategy not in exists_strategies:
            raise ValueError('exists_strategy \'{}\' is not in {}'
                             .format(exists_strategy, exists_strategies))
        session = boto3.Session(profile_name=profile_name)
        s3 = session.resource('s3')
        bucket_name, key = _s3_path_split(source)
        if os.path.isfile(destination):
            # Compare strings with ==; `is` tests identity and is unreliable here.
            if exists_strategy == 'raise':
                raise RuntimeError('File \'{}\' already exists.'
                                   .format(destination))
            elif exists_strategy == 'abort':
                return
        s3.Bucket(bucket_name).download_file(key, destination)
    
    from collections import namedtuple
    
    S3Path = namedtuple("S3Path", ["bucket_name", "key"])
    
    
    def _s3_path_split(s3_path):
        """
        Split an S3 path into bucket and key.
    
        Parameters
        ----------
        s3_path : str
    
        Returns
        -------
        splitted : (str, str)
            (bucket, key)
    
        Examples
        --------
        >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
        S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
        """
        if not s3_path.startswith("s3://"):
            raise ValueError(
                "s3_path is expected to start with 's3://', " "but was {}"
                .format(s3_path)
            )
        bucket_key = s3_path[len("s3://"):]
        bucket_name, key = bucket_key.split("/", 1)
        return S3Path(bucket_name, key)
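
One edge case in the splitting helper above: a path such as 's3://my-bucket' with no key fails at tuple unpacking with a fairly opaque error. A minimal sketch of a stricter variant follows. The name split_s3_uri is hypothetical and not part of mpu or boto3, and it assumes the same 's3://bucket/key' layout used above.

```python
from typing import NamedTuple


class S3Path(NamedTuple):
    bucket_name: str
    key: str


def split_s3_uri(s3_uri: str) -> S3Path:
    """Split 's3://bucket/key' into (bucket_name, key). Hypothetical helper."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with 's3://', got {s3_uri!r}")
    # partition never raises: a missing "/" simply yields an empty key.
    bucket_name, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket_name or not key:
        raise ValueError(f"expected 's3://<bucket>/<key>', got {s3_uri!r}")
    return S3Path(bucket_name, key)
```

With this variant, both a missing bucket and a missing key raise a ValueError that names the offending input.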
    
  • 2020-11-28 02:52

    Note: I'm assuming you have configured authentication separately. The code below downloads a single object from an S3 bucket.

    import boto3
    
    # Create an S3 resource (the higher-level interface, not the low-level client)
    s3 = boto3.resource('s3')
    
    # Download the object to a local file
    s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
    
  • 2020-11-28 02:56

    There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:

    import boto3

    s3_client = boto3.client('s3')

    # Create a small local file to upload (note the 'w' mode)
    with open('hello.txt', 'w') as f:
        f.write('Hello, world!')

    # Upload the file to S3
    s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

    # Download the file from S3
    s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
    with open('hello2.txt') as f:
        print(f.read())
    

    These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.

    Note that s3_client.download_file won't create a directory. It can be created as pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
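
The directory note above can be folded into a small wrapper. This is only a sketch: download_with_parents is a hypothetical name, and s3_client is assumed to be a boto3 S3 client created elsewhere (for example with boto3.client('s3')).

```python
from pathlib import Path


def download_with_parents(s3_client, bucket, key, destination):
    """Create the destination's parent directories, then download the object.

    `s3_client` is assumed to be a boto3 S3 client; the function name is
    a hypothetical one chosen for this sketch.
    """
    destination = Path(destination)
    # download_file does not create missing directories, so do it up front.
    destination.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(bucket, key, str(destination))
    return destination
```

A call might look like download_with_parents(s3_client, 'MyBucket', 'hello-remote.txt', 'out/nested/hello2.txt'), using the bucket and key names from the example above.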

  • 2020-11-28 03:06
    # Preface: the file is JSON with contents: {"name": "Android", "status": "ERROR"}
    
    import io
    import json
    
    import boto3
    
    s3 = boto3.resource('s3')
    
    obj = s3.Object('my-bucket', 'key-to-file.json')
    data = io.BytesIO()
    obj.download_fileobj(data)
    
    # data now holds the object's bytes; decode and parse them into a dict:
    new_dict = json.loads(data.getvalue().decode("utf-8"))
    
    print(new_dict['status'])
    # Should print "ERROR"
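
The in-memory approach in this answer can be wrapped in a reusable helper. This is a minimal sketch: load_json_object is a hypothetical name, and s3_object is assumed to be a boto3 S3 Object resource (anything exposing download_fileobj would work).

```python
import io
import json


def load_json_object(s3_object, encoding="utf-8"):
    """Download an S3 object into memory and parse it as JSON.

    `s3_object` is assumed to behave like boto3's s3.Object(...);
    the helper name is hypothetical.
    """
    buffer = io.BytesIO()
    # download_fileobj writes the object's bytes into a binary file-like object.
    s3_object.download_fileobj(buffer)
    return json.loads(buffer.getvalue().decode(encoding))
```

Since nothing is written to disk, this suits small objects; for large files, downloading to a path is probably the safer choice.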
    
  • 2020-11-28 03:07

    If you wish to download a specific version of a file, you can use get_object and pass the VersionId.

    import boto3
    
    bucket = 'bucketName'
    prefix = 'path/to/file/'
    filename = 'fileName.ext'
    
    s3c = boto3.client('s3')
    s3r = boto3.resource('s3')
    
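    if __name__ == '__main__':
        key = prefix + filename
        for version in s3r.Bucket(bucket).object_versions.filter(Prefix=key):
            # A prefix filter can also match longer keys; keep exact matches only.
            if version.object_key != key:
                continue
            # The version ID is available on the resource itself, without an extra GET.
            version_id = version.id
            obj = s3c.get_object(
                Bucket=bucket,
                Key=key,
                VersionId=version_id,
            )
            # Stream the body to disk in chunks, one local file per version.
            with open(f"{filename}.{version_id}", 'wb') as f:
                for chunk in obj['Body'].iter_chunks(chunk_size=4096):
                    f.write(chunk)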
    

    Ref: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html
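
As an alternative to streaming get_object by hand, the managed download_file call also appears to accept a VersionId through its ExtraArgs parameter. The sketch below assumes that behaviour and assumes s3_client is a boto3 S3 client; download_version is a hypothetical name.

```python
def download_version(s3_client, bucket, key, version_id, destination):
    """Download one specific version of an object to a local file.

    `s3_client` is assumed to be a boto3 S3 client; the helper name is
    hypothetical.
    """
    s3_client.download_file(
        bucket,
        key,
        destination,
        # VersionId is passed through to the underlying GetObject request.
        ExtraArgs={"VersionId": version_id},
    )
    return destination
```

This keeps the retry and multipart handling of the managed transfer, at the cost of not exposing the response metadata that get_object returns.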

  • 2020-11-28 03:09

    boto3 now has a nicer interface than the client:

    import boto3

    resource = boto3.resource('s3')
    my_bucket = resource.Bucket('MyBucket')
    # key and local_filename are placeholders for the object key and the local path
    my_bucket.download_file(key, local_filename)
    

    This by itself isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure). However, resources are generally more ergonomic (for example, the S3 bucket and object resources are nicer than the client methods), so this lets you stay at the resource layer without having to drop down to the client.

    Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.
