How can I convert an incorrectly-saved bytes object back to bytes? (Python/Django)

Question

I've downloaded some web pages with requests and saved the content in a Postgres database (in a text field) using Django's ORM. Here's some pseudocode of what's going on:

import requests

art = Article()  # Django model with a text field named raw_html
page = requests.get("http://example.com")
art.raw_html = page.content  # page.content is bytes, not str
art.save()

I verified that page.content is a bytes object, and I guess I assumed that this object would automatically be decoded upon saving, but it doesn't seem to be... it has been converted to some weird string representation of a bytes object, ostensibly by Django. It looks like this in the interpreter when I call art.raw_html:

'b\'<!DOCTYPE html>\\n<html lang="en" class="pb-page"

And if I call it with print I get this:

b'<!DOCTYPE html>\n<html lang="en" class="pb-page"

And for the life of me I can't re-encode it to a bytes object, even if I trim off the leading b' and trailing '.
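To illustrate why trimming does not help, here is a minimal sketch. It assumes the text field ended up holding the result of calling str() on the bytes object, which is consistent with the output shown above; the sample HTML and the variable names are made up for the example.

```python
# Sample body standing in for page.content (made up for this example).
original = b'<!DOCTYPE html>\n<html lang="en">'

# Assumption: the value saved to the text field is the str() form of the bytes.
stored = str(original)

# Trim the leading b' and the trailing ', then encode the rest.
trimmed = stored[2:-1]
reencoded = trimmed.encode("utf-8")

# The escape sequence is still two literal characters (backslash + n),
# so the result is not the original byte string.
print(reencoded == original)
print(b"\\n" in reencoded)
```

The escapes inside the stored text have to be interpreted as part of a bytes literal, which plain slicing and encoding do not do.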

I feel like there's an easy solution to this and I feel like an idiot... but after lots of experiments and googling, I'm not figuring it out.

Incidentally, if I manually copy what's returned from the print statement (like with my cursor), I can convert the clipboard contents back to a bytes object just fine and then decode it into some readably-formatted html.

Clearly there is a better way. (And yes, going forward I'll stop saving the content like this in the first place.)
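For the "going forward" part, one possible approach is to decode the body to str before assigning it to the text field. The sketch below uses a hypothetical helper, decode_body, and falls back to UTF-8 when no encoding is known; that fallback is an assumption, not something the original post specifies.

```python
def decode_body(content, encoding=None):
    """Hypothetical helper: turn a response body (bytes) into str before saving.

    Falls back to UTF-8 and replaces undecodable bytes; both are assumptions.
    """
    return content.decode(encoding or "utf-8", errors="replace")


# Sample bytes standing in for page.content.
html = decode_body(b"<!DOCTYPE html>\n<html>", "utf-8")
print(html)

# With requests, this might look like:
#   art.raw_html = decode_body(page.content, page.encoding)
# or, letting requests pick the encoding itself:
#   art.raw_html = page.text
```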

Answer 1:

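You can use eval or ast.literal_eval, as shown below. Prefer ast.literal_eval: it only parses Python literals, whereas eval executes arbitrary code, which is risky on content downloaded from the web.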

data = "b'gAAAAABc1arg48DmsOwQEbeiuh-FQoNSRnCOk9OvXXOE2cbBe2A46gmP6SPyymDft1yp5HsoHEzXe0KljbsdwTgPG5jCyhMmaA=='"

eval(data)
b'gAAAAABc1arg48DmsOwQEbeiuh-FQoNSRnCOk9OvXXOE2cbBe2A46gmP6SPyymDft1yp5HsoHEzXe0KljbsdwTgPG5jCyhMmaA=='

Using ast.literal_eval:

import ast
ast.literal_eval(data)
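Applied to the situation in the question, the same idea could be wrapped up roughly as follows. This is only a sketch: recover_bytes is a hypothetical helper, the sample HTML is made up, and decoding as UTF-8 assumes the page was served in that encoding.

```python
import ast


def recover_bytes(stored):
    """Hypothetical helper: parse the str() form of a bytes object back into bytes."""
    value = ast.literal_eval(stored)
    if not isinstance(value, bytes):
        raise ValueError("stored text is not a bytes literal")
    return value


# Stand-in for art.raw_html as it was saved (the str() form of the bytes).
stored = str(b'<!DOCTYPE html>\n<html lang="en" class="pb-page">')

raw = recover_bytes(stored)   # bytes again, with a real newline
html = raw.decode("utf-8")    # assumes the page was UTF-8
print(html)
```

The isinstance check guards against rows that hold some other literal, such as a plain string, so those fail loudly instead of being passed along as if they were bytes.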

Thanks to @juanpa.arrivillaga; I have added this to the answer.

Source: https://stackoverflow.com/questions/51489961/how-can-i-convert-an-incorrectly-saved-bytes-object-back-to-bytes-python-djang

Tags

python

django

python-unicode