Python email quoted-printable encoding problem

后端未结

关注

 3  826

I am extracting emails from Gmail using the following:

def getMsgs():
 try:
    conn = imaplib.IMAP4_SSL(\"imap.gmail.com\", 993)
  except:
    print \'Faile


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2020-12-04 00:24
              
            
            
                                                                       
If you are using Python3.6 or later, you can use the email.message.Message.get_content() method to decode the text automatically.  This method supersedes get_payload(), though get_payload() is still available.

Say you have a string s containing this email message (based on the examples in the docs):

Subject: Ayons asperges pour le =?utf-8?q?d=C3=A9jeuner?=
From: =?utf-8?q?Pep=C3=A9?= Le Pew <pepe@example.com>
To: Penelope Pussycat <penelope@example.com>,
 Fabrette Pussycat <fabrette@example.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

    Salut!

    Cela ressemble =C3=A0 un excellent recipie[1] d=C3=A9jeuner.

    [1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718

    --Pep=C3=A9
   =20


Non-ascii characters in the string have been encoded with the quoted-printable encoding, as specified in the Content-Transfer-Encoding header.

Create an email object:

import email
from email import policy

msg = email.message_from_string(s, policy=policy.default)


Setting the policy is required here; otherwise policy.compat32 is used, which returns a legacy Message instance that doesn't have the get_content method.  policy.default will eventually become the default policy, but as of Python3.7 it's still policy.compat32.

The get_content() method handles decoding automatically:

print(msg.get_content())

Salut!

Cela ressemble à un excellent recipie[1] déjeuner.

[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718

--Pepé


If you have a multipart message, get_content() needs to be called on the individual parts, like this:

for part in message.iter_parts():
    print(part.get_content())

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  臣服心动        
                
              
                            
                2020-12-04 00:35
              
            
            
                                                                       
You could/should use the email.parser module to decode mail messages, for example (quick and dirty example!):

from email.parser import FeedParser
f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print rootMessage.is_multipart()

# Or check for errors
print rootMessage.defects

# If it's a multipart message, you can get the first submessage and then its payload
# (i.e. content) like so:
rootMessage.get_payload(0).get_payload(decode=True)


Using the "decode" parameter of Message.get_payload, the module automatically decodes the content, depending on its encoding (e.g. quoted printables as in your question).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  迷失自我        
                
              
                            
                2020-12-04 00:50
              
            
            
                                                                       
That's known as quoted-printable encoding. You probably want to use something like quopri.decodestring - http://docs.python.org/library/quopri.html
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复