Accessing stream in job.get_output('body')

Question

Sample code

import boto3

glacier = boto3.resource('glacier')
job = glacier.Job(accountID, vaultlist[0], id=joblist[0])

r = job.get_output()
print(r['body'])

That print only yields botocore.response.StreamingBody at 0xsnip

r['body'] should be the inventory in CSV format, but I can't figure out how to get to it. I spent a bit of time trying to use io to read in the stream, and either that is not the right way or I did it wrong. Can you point me in the right direction?

Thanks!

Answer 1:

Here's a solution that worked for me to save a Glacier archive that showed up as a StreamingBody to a file. In this particular case it was an mp3 file.

import boto3

glacier = boto3.resource('glacier')
job = glacier.Job(accountID, vaultName, jobID)

r = job.get_output()

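# read() returns the whole body as bytes; the with block closes the file
# (the original called f1.close without parentheses, which never closed it).
with open('my file', 'wb') as f1:
    f1.write(r['body'].read())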
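
For large archives, reading the whole body into memory at once may not be desirable. Here is a minimal sketch of a chunked copy, assuming only that the body is a file-like object with a read(size) method, which StreamingBody provides. The helper name save_body and the chunk size are made up for illustration.

```python
import shutil


def save_body(body, path, chunk_size=1024 * 1024):
    """Copy a file-like response body to `path` in chunks.

    `body` only needs a read(size) method, so this works with a
    StreamingBody as well as any ordinary binary file object.
    Returns the number of bytes written.
    """
    with open(path, "wb") as out:
        # copyfileobj calls body.read(chunk_size) until it returns b"".
        shutil.copyfileobj(body, out, chunk_size)
        return out.tell()
```

With the job object from above this would be called as save_body(job.get_output()['body'], 'my file'). For a text inventory, r['body'].read().decode('utf-8') gives a str instead of bytes.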

Answer 2:

OK, I couldn't get the other way to work at all, mostly due to my own lack of skills, I'm sure. But I was able to use an HTTP GET to download the inventory into a file. This is how I did that. You will see lists throughout because I had two vaults with one job each. You could modify this and loop in other ways, or just use [0] for both lists if you have one vault and one job. The important part is the request-signing sample from Amazon EC2 that I modified to retrieve the inventory from a completed Glacier job.

I know my code is not very well written, but it worked for my one-shot need. Hope this is helpful to others.

import requests, sys, os, hashlib, hmac, json
from datetime import datetime

# ************* REQUEST VALUES *************
method = 'GET'
service = 'glacier'
region = '<YOUR_REGION>'
host = 'glacier.' + region + '.amazonaws.com'
endpoint = 'https://glacier.' + region + '.amazonaws.com'
request_parameters = ''
accountid = '<YOUR_ACCOUNT_ID>'
vaultlist = ["VAULT_ONE", "VAULT_TWO"]
joblist = ['JOB_ID_ONE',
           'JOB_ID_TWO']
rangelist = ['JOB_SIZE_ONE',
             'JOB_SIZE_TWO',]
url0 = "/" + accountid + "/vaults/" + vaultlist[0] + "/jobs/" + joblist[0] + "/output"
url1 = "/" + accountid + "/vaults/" + vaultlist[1] + "/jobs/" + joblist[1] + "/output"
filename =['archive0.json', 'archive1.json'] #filenames
# Key derivation functions. See:
# http://docs.aws.amazon.com/general/latest/gr/signature-v4-examples.html#signature-v4-examples-python
def sign(key, msg):
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

def getSignatureKey(key, dateStamp, regionName, serviceName):
    kDate = sign(('AWS4' + key).encode('utf-8'), dateStamp)
    kRegion = sign(kDate, regionName)
    kService = sign(kRegion, serviceName)
    kSigning = sign(kService, 'aws4_request')
    return kSigning

# Read AWS access key from env. variables or configuration file. Best practice is NOT
# to embed credentials in code.
access_key = os.environ.get('AWS_ACCESS_KEY')
secret_key = os.environ.get('AWS_SECRET_KEY')
if access_key is None or secret_key is None:
    print('No access key is available via your environment variables.')
    sys.exit()

# Create a date for headers and the credential string
t = datetime.utcnow()
amzdate = t.strftime('%Y%m%dT%H%M%SZ')
datestamp = t.strftime('%Y%m%d') # Date w/o time, used in credential scope

# ************* TASK 1: CREATE A CANONICAL REQUEST *************
# http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html

# Step 1 is to define the verb (GET, POST, etc.)--already done.

# Step 2: Create canonical URI--the part of the URI from domain to query
# string (use '/' if no path)
canonical_uri = url1

# Step 3: Create the canonical query string. In this example (a GET request),
# request parameters are in the query string. Query string values must
# be URL-encoded (space=%20). The parameters must be sorted by name.
# For this example, the query string is pre-formatted in the request_parameters variable.
canonical_querystring = request_parameters

# Step 4: Create the canonical headers and signed headers. Header names
# and value must be trimmed and lowercase, and sorted in ASCII order.
# Note that there is a trailing \n.
canonical_headers = 'host:' + host + '\n' + 'x-amz-date:' + amzdate + '\n'

# Step 5: Create the list of signed headers. This lists the headers
# in the canonical_headers list, delimited with ";" and in alpha order.
# Note: The request can include any headers; canonical_headers and
# signed_headers lists those that you want to be included in the
# hash of the request. "Host" and "x-amz-date" are always required.
signed_headers = 'host;x-amz-date'

# Step 6: Create payload hash (hash of the request body content). For GET
# requests, the payload is an empty string ("").
payload_hash = hashlib.sha256("".encode()).hexdigest()

# Step 7: Combine elements to create the canonical request
canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + '\n' + canonical_headers +\
                    '\n' + signed_headers + '\n' + payload_hash

# ************* TASK 2: CREATE THE STRING TO SIGN*************
# Match the algorithm to the hashing algorithm you use, either SHA-1 or
# SHA-256 (recommended)
algorithm = 'AWS4-HMAC-SHA256'
credential_scope = datestamp + '/' + region + '/' + service + '/' + 'aws4_request'
string_to_sign = algorithm + '\n' +  amzdate + '\n' +  credential_scope + '\n' + \
                 hashlib.sha256(canonical_request.encode()).hexdigest()


# ************* TASK 3: CALCULATE THE SIGNATURE *************
# Create the signing key using the function defined above.
signing_key = getSignatureKey(secret_key, datestamp, region, service)

# Sign the string_to_sign using the signing_key
signature = hmac.new(signing_key, string_to_sign.encode('utf-8'), hashlib.sha256).hexdigest()


# ************* TASK 4: ADD SIGNING INFORMATION TO THE REQUEST *************
# The signing information can be either in a query string value or in
# a header named Authorization. This code shows how to use a header.
# Create authorization header and add to request headers
authorization_header = algorithm + ' ' + 'Credential=' + access_key + '/' + credential_scope + ', ' +\
                       'SignedHeaders=' + signed_headers + ', ' + 'Signature=' + signature

# The request can include any headers, but MUST include "host", "x-amz-date",
# and (for this scenario) "Authorization". "host" and "x-amz-date" must
# be included in the canonical_headers and signed_headers, as noted
# earlier. Order here is not significant.
# Python note: The 'host' header is added automatically by the Python 'requests' library.
# headers = {'x-amz-date':amzdate, 'Authorization':authorization_header}


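# Note: the Glacier API documents Range values in the form 'bytes=0-1048575'.
# The values below are kept as in the original; adjust them if the range is rejected.
headers0 = {'x-amz-date': amzdate,
            'Authorization': authorization_header,
            'x-amz-glacier-version': '2012-06-01',
            'Range': '0 - ' + rangelist[0],
            }
headers1 = {'x-amz-date': amzdate,
            'Authorization': authorization_header,
            'x-amz-glacier-version': '2012-06-01',
            'Range': rangelist[1],
            }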
headers = headers1

# ************* SEND THE REQUEST *************
request_url = endpoint + url1
print(url0)
print('\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++')
print('Request URL: ' + request_url + '\n')
print('Headers: ' + json.dumps(headers))
print('Auth : ' + authorization_header + '\n' )
r = requests.get(request_url, headers=headers, stream = True)

print('\nRESPONSE++++++++++++++++++++++++++++++++++++')
print('Response code: %d\n' % r.status_code)
# print(r.text) #This is in the original Sample and useful for debugging. But not if your inventory is large.


# *********** Write it to file ***********
with open(filename[1], mode='w') as f:
    f.write(r.text)
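
Once the inventory is saved, it can be inspected with the standard json module. This is a rough sketch, assuming the job returned the JSON inventory format with a top-level "ArchiveList" whose entries carry a "Size" field. The function name summarize_inventory is hypothetical, and a CSV inventory would need the csv module instead.

```python
import json


def summarize_inventory(path):
    """Return (archive_count, total_bytes) for a saved Glacier inventory.

    Assumes the JSON inventory layout: a top-level "ArchiveList" of
    objects, each with a numeric "Size". Missing keys are treated as
    an empty list or a size of 0 rather than raising.
    """
    with open(path, encoding="utf-8") as f:
        inventory = json.load(f)
    archives = inventory.get("ArchiveList", [])
    total_bytes = sum(archive.get("Size", 0) for archive in archives)
    return len(archives), total_bytes
```

For the script above, summarize_inventory(filename[1]) would report how many archives the second vault holds and their combined size.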

Source: https://stackoverflow.com/questions/32795232/accessing-stream-in-job-get-outputbody

Tags

python

amazon-glacier

boto3