Question
I've downloaded some web pages with requests and saved the content in a Postgres database (in a text field) using Django's ORM. Here's some pseudocode of what's going on:
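import requests
art = Article()  # Article is the Django model with a text field named raw_html
page = requests.get("http://example.com")
art.raw_html = page.content  # page.content is a bytes object, not a str
art.save()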
I verified that page.content is a bytes object, and I guess I assumed that this object would automatically be decoded upon saving, but it doesn't seem to be... it has been converted to some weird string representation of a bytes object, ostensibly by Django. It looks like this in the interpreter when I call art.raw_html:
'b\'<!DOCTYPE html>\\n<html lang="en" class="pb-page"
And if I call it with print I get this:
b'<!DOCTYPE html>\n<html lang="en" class="pb-page"
And for the life of me I can't re-encode it to a bytes object, even if I trim off the leading b' and trailing '.
I feel like there's an easy solution to this and I feel like an idiot... but after lots of experiments and googling, I'm not figuring it out.
Incidentally, if I manually copy what's returned from the print statement (like with my cursor), I can convert the clipboard contents back to a bytes object just fine and then decode it into some readably-formatted html.
Clearly there is a better way. (And yes, going forward I'll stop saving the content like this in the first place.)
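Answer 1:
You can use eval or ast.literal_eval, as shown below. Prefer ast.literal_eval: eval will execute any expression found in the string, while ast.literal_eval only parses Python literals.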
data = "b'gAAAAABc1arg48DmsOwQEbeiuh-FQoNSRnCOk9OvXXOE2cbBe2A46gmP6SPyymDft1yp5HsoHEzXe0KljbsdwTgPG5jCyhMmaA=='"
eval(data)
b'gAAAAABc1arg48DmsOwQEbeiuh-FQoNSRnCOk9OvXXOE2cbBe2A46gmP6SPyymDft1yp5HsoHEzXe0KljbsdwTgPG5jCyhMmaA=='
Using ast.literal_eval:
import ast
ast.literal_eval(data)
Thanks to @juanpa.arrivillaga; I've just added this to the answer.
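To tie this back to the HTML case in the question, here is a minimal sketch of the full round trip. It assumes the text in the database is what str() produces for a bytes object, and that the page was UTF-8 encoded; neither is confirmed by the question. The helper name recover_bytes is made up for illustration.

```python
import ast


def recover_bytes(stored: str) -> bytes:
    """Turn text like "b'...'" (the repr of a bytes object) back into bytes.

    recover_bytes is a hypothetical helper name, not part of any library.
    """
    value = ast.literal_eval(stored)  # parses literals only; never runs code
    if not isinstance(value, bytes):
        raise ValueError("stored text is not a bytes literal")
    return value


# Simulate what ended up in the text field: str() applied to a bytes object.
stored = str(b'<!DOCTYPE html>\n<html lang="en" class="pb-page">')

raw = recover_bytes(stored)   # back to a real bytes object
html = raw.decode("utf-8")    # assumption: the page was UTF-8 encoded
```

Note that the whole stored string goes into ast.literal_eval, including the leading b' and the trailing '. Trimming them off first removes exactly the syntax the parser needs to recognise the value as a bytes literal.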
Source: https://stackoverflow.com/questions/51489961/how-can-i-convert-an-incorrectly-saved-bytes-object-back-to-bytes-python-djang