Cannot read urllib error message once it is read()

问题

My problem is with error handling of the python urllib error object. I am unable to read the error message while still keeping it intact in the error object, for it to be consumed later.

response = urllib.request.urlopen(request) # request that will raise an error
response.read()
response.read() # is empty now
# Also tried seek(0), that does not work either.

So this how I intend to use it, but when the Exception bubbles up, the.read() second time is empty.

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    self.log.exception(err.read())
    raise err

I tried making a deepcopy of the err object,

import copy
try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    err_obj_copy = copy.deepcopy(err)
    self.log.exception(
        "Method:{}\n"
        "URL:{}\n"
        "Data:{}\n"
        "Details:{}\n"
        "Headers:{}".format(method, url, data, err_obj_copy.read(), headers))
    raise err

but copy is unable to make a deepcopy and throws an error - TypeError: __init__() missing 5 required positional arguments: 'url', 'code', 'msg', 'hdrs', and 'fp'.

How do I read the error message, while still keeping it intact in the object?

I do know how to do it using requests, but I am stuck with legacy code and need to make it work with urllib

回答1:

This is what I did. Worked for me.

When reading the error for the first time, save it to a variable like this: msg = response.read().decode('utf8'). You can then create a new HTTPError instance, with the message, and propagate it.

resp = urllib.request.urlopen(request)
msg = resp.read().decode('utf8')
self.log.exception(msg)
raise HTTPError(resp.url, resp.code, resp.reason, resp.headers, io.BytesIO(bytes(msg, 'utf8')))

回答2:

The error object may read from the network. Network is not seekable -- you can't go back in the general case.

You could replace err with a new HTTPError instance that reads from a buffer (like io.BytesIO()) instead of the network e.g., (not tested):

content = err.read()
self.log.exception(content)
raise HTTPError(err.url, err.code, err.reason, err.headers, io.BytesIO(content))

Though I'm not sure that you should -- handle the error in a single place instead e.g., reraise a more application specific exception or leave the logging to an upstream handler.

来源：https://stackoverflow.com/questions/33660178/cannot-read-urllib-error-message-once-it-is-read

标签

python

python-3.x

error-handling

urllib