I am working on a process to dump files from a Redshift database, and would prefer not to have to download the files locally to process the data. I saw that
If you have an S3 bucket named 'mybucket' which contains the key 'beer', here is how to download and fetch the value without storing it in a local file:
import boto3

s3 = boto3.resource('s3')
# get() returns a dict whose 'Body' is a stream; read() returns the object's bytes
print(s3.Object('mybucket', 'beer').get()['Body'].read())
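Calling read() pulls the whole object into memory at once. For large dump files it may be preferable to consume the body line by line. The sketch below assumes the files are pipe-delimited, unquoted text (the default output of Redshift UNLOAD); the helper name iter_rows is hypothetical, and io.BytesIO stands in for the S3 body so the example runs without AWS credentials.

```python
import codecs
import csv
import io


def iter_rows(body, delimiter="|", encoding="utf-8"):
    """Yield parsed rows from a binary, file-like object one line at a time.

    `body` can be the stream returned by s3.Object(bucket, key).get()['Body'],
    or any object with a read() method.
    """
    # Decode bytes to text lazily, chunk by chunk, instead of reading everything
    text_stream = codecs.getreader(encoding)(body)
    # QUOTE_NONE assumes unquoted fields; remove it if your files use quoted CSV
    yield from csv.reader(text_stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)


if __name__ == "__main__":
    # Hypothetical stand-in for an S3 object body
    fake_body = io.BytesIO(b"1|alice|10.5\n2|bob|7.25\n")
    for row in iter_rows(fake_body):
        print(row)
```

With a real object, the call would look like iter_rows(s3.Object('mybucket', 'beer').get()['Body']).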
This may or may not be relevant to what you want to do, but in my situation one thing that worked well was using the tempfile module:
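import tempfile
import boto3
import PyPDF2

bucket_name = 'my_bucket'
key_name = 'my_file.pdf'  # hypothetical: the key of the PDF object to fetch
s3 = boto3.resource('s3')

# The temporary file is deleted automatically when the with-block exits.
# Note: reopening temp.name while it is still open works on Unix but not on Windows.
with tempfile.NamedTemporaryFile() as temp:
    s3.Bucket(bucket_name).download_file(key_name, temp.name)
    with open(temp.name, 'rb') as pdfFileObj:
        # PdfReader replaces PdfFileReader, which no longer works in PyPDF2 3.x
        pdfReader = PyPDF2.PdfReader(pdfFileObj)
        # [... do what you will with your file ...]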
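If the library you pass the data to accepts a file-like object rather than a path (PyPDF2's PdfReader does), an in-memory io.BytesIO buffer can replace the temporary file entirely. This is a sketch assuming a boto3 S3 client, whose download_fileobj(Bucket, Key, Fileobj) writes an object's bytes into a writable binary stream; fetch_to_memory and FakeS3Client are hypothetical names, and the fake client only exists so the example runs offline.

```python
import io


def fetch_to_memory(s3_client, bucket, key):
    """Download an S3 object into an in-memory buffer instead of a temp file."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(bucket, key, buffer)
    buffer.seek(0)  # rewind so readers start at the first byte
    return buffer


class FakeS3Client:
    """Hypothetical stand-in for boto3.client('s3') so the sketch runs offline."""

    def __init__(self, objects):
        self.objects = objects  # maps (bucket, key) -> bytes

    def download_fileobj(self, bucket, key, fileobj):
        fileobj.write(self.objects[(bucket, key)])


if __name__ == "__main__":
    client = FakeS3Client({("my_bucket", "report.pdf"): b"%PDF-1.4 ..."})
    pdf_stream = fetch_to_memory(client, "my_bucket", "report.pdf")
    print(pdf_stream.read(8))
    # Against the real services this would be something like:
    #   pdf_stream = fetch_to_memory(boto3.client('s3'), bucket_name, key_name)
    #   pdfReader = PyPDF2.PdfReader(pdf_stream)
```

The trade-off is memory: the whole object lives in RAM, so the tempfile approach is still the safer choice for very large files or for tools that insist on a real path.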