I am extracting emails from Gmail using the following:
def getMsgs():
    try:
        conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    except:
        print 'Faile
If you are using Python 3.6 or later, you can use the email.message.EmailMessage.get_content() method to decode the text automatically. This method supersedes get_payload(), though get_payload() is still available.
Say you have a string s containing this email message (based on the examples in the docs):
Subject: Ayons asperges pour le =?utf-8?q?d=C3=A9jeuner?=
From: =?utf-8?q?Pep=C3=A9?= Le Pew <pepe@example.com>
To: Penelope Pussycat <penelope@example.com>,
    Fabrette Pussycat <fabrette@example.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Salut!
Cela ressemble =C3=A0 un excellent recipie[1] d=C3=A9jeuner.
[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718
--Pep=C3=A9
=20
Non-ASCII characters in the string have been encoded with the quoted-printable encoding, as specified in the Content-Transfer-Encoding header.
Create an email object:
import email
from email import policy
msg = email.message_from_string(s, policy=policy.default)
Setting the policy is required here; otherwise policy.compat32 is used, which returns a legacy Message instance that doesn't have the get_content() method. policy.default will eventually become the default policy, but as of Python 3.7 it's still policy.compat32.
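To see the difference between the two policies, here is a minimal sketch. The one-line message in raw is a made-up sample, not part of the original question:

```python
import email
from email import policy
from email.message import EmailMessage

# Hypothetical sample message, used only for illustration.
raw = (
    "Subject: test\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "d=C3=A9jeuner\n"
)

# Without an explicit policy, compat32 is used and a legacy Message comes back.
legacy = email.message_from_string(raw)
# With policy.default, the parser returns an EmailMessage instead.
modern = email.message_from_string(raw, policy=policy.default)

legacy_has_get_content = hasattr(legacy, "get_content")
modern_is_email_message = isinstance(modern, EmailMessage)
modern_text = modern.get_content()

print(legacy_has_get_content)
print(modern_is_email_message)
print(modern_text)
```

The legacy object can still be decoded with get_payload(decode=True), but that returns bytes that you have to decode yourself.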
The get_content() method handles decoding automatically:
print(msg.get_content())
Salut!
Cela ressemble à un excellent recipie[1] déjeuner.
[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718
--Pepé
If you have a multipart message, get_content() needs to be called on the individual parts, like this:
for part in msg.iter_parts():
    print(part.get_content())
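For nested multipart messages, walking the whole tree may be more convenient than iter_parts(), which only yields direct children. The following is a sketch under assumptions: the message is built locally with EmailMessage so that the example is self-contained, and the addresses and texts are invented.

```python
import email
from email import policy
from email.message import EmailMessage

# Build a small multipart/alternative message so there is something to parse.
original = EmailMessage()
original["Subject"] = "Déjeuner"
original["From"] = "pepe@example.com"
original["To"] = "penelope@example.com"
original.set_content("Salut! Cela ressemble à un excellent déjeuner.\n")
original.add_alternative("<p>Salut! <b>déjeuner</b></p>\n", subtype="html")

# Round-trip through bytes, as if the message had been fetched from a server.
msg = email.message_from_bytes(original.as_bytes(), policy=policy.default)

texts = {}
for part in msg.walk():
    if part.is_multipart():
        continue  # skip containers; only leaf parts carry content
    texts[part.get_content_type()] = part.get_content()

# get_body() returns the best matching part for a preference list, or None.
plain = msg.get_body(preferencelist=("plain",))
plain_text = plain.get_content() if plain is not None else None

print(texts["text/plain"])
print(texts["text/html"])
```

If you only want the readable body, get_body() is usually enough. The walk() loop is useful when you need every part, for example attachments.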
You could/should use the email.parser module to decode mail messages, for example (quick and dirty example!):
from email.parser import FeedParser
f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()
# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())
# Or check for errors
print(rootMessage.defects)
# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so:
rootMessage.get_payload(0).get_payload(decode=True)
With the "decode" parameter of Message.get_payload, the module automatically decodes the content according to its Content-Transfer-Encoding (e.g. quoted-printable, as in your question).
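On Python 3, get_payload(decode=True) returns bytes, so one more step is needed to get text. A minimal sketch follows, using a made-up single-part message. Falling back to UTF-8 when no charset is declared is an assumption of this example, not something the library does for you:

```python
from email.parser import FeedParser

# Hypothetical single-part message, for illustration only.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Cela ressemble =C3=A0 un excellent d=C3=A9jeuner.\n"
)

parser = FeedParser()
parser.feed(raw)
root = parser.close()

# decode=True undoes the transfer encoding and yields raw bytes.
payload = root.get_payload(decode=True)
# The charset comes from the Content-Type header; UTF-8 is this sketch's fallback.
charset = root.get_content_charset() or "utf-8"
text = payload.decode(charset)

print(text)
```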
That's known as quoted-printable encoding. You probably want to use something like quopri.decodestring: http://docs.python.org/library/quopri.html
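A short sketch of that approach, assuming you already have just the encoded body (no headers) and know that the underlying charset is UTF-8:

```python
import quopri

# Quoted-printable body text; the UTF-8 charset is an assumption here.
encoded = b"Cela ressemble =C3=A0 un excellent d=C3=A9jeuner."

decoded_bytes = quopri.decodestring(encoded)  # undo the =XX escapes
decoded_text = decoded_bytes.decode("utf-8")  # bytes -> str

print(decoded_text)
```

This only reverses the transfer encoding. Unlike the email package, quopri knows nothing about headers or charsets, so the final decode step is up to you.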