Python email quoted-printable encoding problem

隐瞒了意图╮ 2020-12-04 00:12

I am extracting emails from Gmail using the following:

def getMsgs():
    try:
        conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    except:
        print 'Faile


        
3 Answers
  • 2020-12-04 00:24

    If you are using Python 3.6 or later, you can use the email.message.EmailMessage.get_content() method to decode the text automatically. This method supersedes get_payload(), though get_payload() is still available.

    Say you have a string s containing this email message (based on the examples in the docs):

    Subject: Ayons asperges pour le =?utf-8?q?d=C3=A9jeuner?=
    From: =?utf-8?q?Pep=C3=A9?= Le Pew <pepe@example.com>
    To: Penelope Pussycat <penelope@example.com>,
     Fabrette Pussycat <fabrette@example.com>
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable
    MIME-Version: 1.0
    
    Salut!
    
    Cela ressemble =C3=A0 un excellent recipie[1] d=C3=A9jeuner.
    
    [1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718
    
    --Pep=C3=A9
    

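    Non-ASCII characters in the body have been encoded with quoted-printable, as specified in the Content-Transfer-Encoding header: each byte of the UTF-8 encoding is written as =XX, so é (UTF-8 bytes C3 A9) becomes =C3=A9. The non-ASCII characters in the Subject and From headers use RFC 2047 encoded words (the =?utf-8?q?...?= form) instead.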
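With policy.default (used below) the parser decodes these encoded-word headers for you. If you stay on the legacy compat32 API, a minimal sketch using the standard library's email.header.decode_header() and make_header() turns the Subject header above back into readable text; the variable names are only for illustration:

```python
from email.header import decode_header, make_header

raw_subject = "Ayons asperges pour le =?utf-8?q?d=C3=A9jeuner?="

# decode_header() splits the value into (bytes, charset) chunks: the plain
# ASCII prefix comes back with charset None, the encoded word with "utf-8".
chunks = decode_header(raw_subject)

# make_header() reassembles the chunks; str() yields the decoded text.
subject = str(make_header(chunks))
print(subject)
```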

    Create an email object:

    import email
    from email import policy
    
    msg = email.message_from_string(s, policy=policy.default)
    

    Setting the policy is required here; otherwise policy.compat32 is used, which returns a legacy Message instance that doesn't have the get_content() method. policy.default is intended to become the default policy eventually, but as of Python 3.7 the default is still policy.compat32.
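A quick way to see the difference is to parse the same message with and without the policy and check which class comes back. This is a sketch using a made-up, minimal single-part message:

```python
import email
from email import policy

# Hypothetical minimal message with a quoted-printable body.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "MIME-Version: 1.0\n"
    "\n"
    "d=C3=A9jeuner\n"
)

legacy = email.message_from_string(raw)  # no policy given: compat32
modern = email.message_from_string(raw, policy=policy.default)

# compat32 yields email.message.Message, which has no get_content();
# policy.default yields email.message.EmailMessage, which does.
print(type(legacy).__name__, hasattr(legacy, "get_content"))
print(type(modern).__name__, hasattr(modern, "get_content"))
print(modern.get_content())
```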

    The get_content() method handles decoding automatically:

    print(msg.get_content())
    
    Salut!
    
    Cela ressemble à un excellent recipie[1] déjeuner.
    
    [1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718
    
    --Pepé
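For a copy-and-paste test, here is a minimal sketch that puts the sample message above into a Python string (with the second To: recipient dropped for brevity) and decodes both a header and the body. With policy.default, msg["Subject"] comes back already decoded from its RFC 2047 form:

```python
import email
from email import policy

# The sample message, written as a string with explicit "\n" line endings.
s = (
    "Subject: Ayons asperges pour le =?utf-8?q?d=C3=A9jeuner?=\n"
    "From: =?utf-8?q?Pep=C3=A9?= Le Pew <pepe@example.com>\n"
    "To: Penelope Pussycat <penelope@example.com>\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "MIME-Version: 1.0\n"
    "\n"
    "Salut!\n"
    "\n"
    "Cela ressemble =C3=A0 un excellent recipie[1] d=C3=A9jeuner.\n"
    "\n"
    "[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718\n"
    "\n"
    "--Pep=C3=A9\n"
)

msg = email.message_from_string(s, policy=policy.default)

subject = msg["Subject"]   # encoded word decoded by the header parser
body = msg.get_content()   # quoted-printable undone, charset applied
print(subject)
print(body)
```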
    

    If you have a multipart message, get_content() needs to be called on the individual parts. iter_parts() yields the direct subparts:

    for part in msg.iter_parts():
        print(part.get_content())
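Because iter_parts() only looks one level down, a nested structure (for example multipart/mixed containing multipart/alternative) needs a recursive walk. A minimal sketch, assuming you only want the leaf parts: it builds a hypothetical multipart/alternative message with quoted-printable parts, parses it back, and uses msg.walk() while skipping the multipart containers, which have no content of their own to decode. If you only need the preferred body, msg.get_body(preferencelist=('plain', 'html')) is another option.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Build a hypothetical multipart/alternative message, forcing
# quoted-printable so the parsed parts really need decoding.
outgoing = EmailMessage()
outgoing["Subject"] = "Test"
outgoing.set_content("Cela ressemble à un excellent déjeuner.\n",
                     cte="quoted-printable")
outgoing.add_alternative("<p>Cela ressemble à un excellent déjeuner.</p>\n",
                         subtype="html", cte="quoted-printable")
raw = outgoing.as_bytes()

msg = BytesParser(policy=policy.default).parsebytes(raw)

# walk() visits every part depth-first, containers included;
# only the non-multipart (leaf) parts carry decodable content.
texts = []
for part in msg.walk():
    if part.is_multipart():
        continue
    texts.append((part.get_content_type(), part.get_content()))

for ctype, text in texts:
    print(ctype, repr(text))
```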
    
  • 2020-12-04 00:35

    You should use the email.parser module to decode mail messages. A quick-and-dirty example:

    from email.parser import FeedParser
    f = FeedParser()
    f.feed("<insert mail message here, including all headers>")
    rootMessage = f.close()
    
    # Now you can access the message and its submessages (if it's multipart)
    print(rootMessage.is_multipart())
    
    # Or check for parsing errors
    print(rootMessage.defects)
    
    # If it's a multipart message, you can get the first submessage and then
    # its payload (i.e. content) like so:
    if rootMessage.is_multipart():
        print(rootMessage.get_payload(0).get_payload(decode=True))
    

    With decode=True, Message.get_payload() undoes the part's Content-Transfer-Encoding (quoted-printable as in your question, or base64) and returns bytes; decode those bytes with the part's charset to get a str.
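A minimal sketch of the full round trip with the legacy API, using a made-up single-part message: get_payload(decode=True) removes the quoted-printable layer, and the bytes are then decoded with the charset from the Content-Type header. The "us-ascii" fallback for a missing charset parameter is an assumption of this sketch (it matches the MIME default for text parts).

```python
from email.parser import FeedParser

# Hypothetical single-part message with a quoted-printable body.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "MIME-Version: 1.0\n"
    "\n"
    "Cela ressemble =C3=A0 un excellent d=C3=A9jeuner.\n"
)

parser = FeedParser()  # legacy compat32 policy, as in the answer above
parser.feed(raw)
msg = parser.close()

payload = msg.get_payload(decode=True)              # bytes, QP removed
charset = msg.get_content_charset() or "us-ascii"   # assumed fallback
text = payload.decode(charset, errors="replace")
print(payload)
print(text)
```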

  • 2020-12-04 00:50

    That's known as quoted-printable encoding. You probably want something like quopri.decodestring() (http://docs.python.org/library/quopri.html). It returns bytes, which you still need to decode with the charset given in the Content-Type header.
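A minimal sketch with quopri on a made-up body fragment. It also shows a soft line break (a trailing "=" that joins two lines), a common source of stray "=" characters when QP text is not decoded. Decoding the result as UTF-8 is an assumption here; in real code, take the charset from the message's Content-Type header.

```python
import quopri

# Made-up QP body: "=C3=A0" is UTF-8 for "à", "=C3=A9" for "é",
# and "excel=\n" is a soft line break that joins "excel" and "lent".
encoded = b"Cela ressemble =C3=A0 un excel=\nlent d=C3=A9jeuner.\n"

decoded_bytes = quopri.decodestring(encoded)  # undo the QP transfer encoding
text = decoded_bytes.decode("utf-8")          # assumed charset
print(text)
```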
