I am working on a process to dump files from a Redshift
database, and would prefer not to have to locally download the files to process the data. I saw that
This may or may not be relevant to what you want to do, but for my situation one thing that worked well was using tempfile:
import tempfile
import boto3
import PyPDF2
bucket_name = 'my_bucket'
s3 = boto3.resource('s3')
temp = tempfile.NamedTemporaryFile()
s3.Bucket(bucket_name).download_file(key_name, temp.name)
pdfFileObj = open(temp.name,'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
[... do what you will with your file ...]
temp.close()