Email body is a string sometimes and a list sometimes. Why?

萝らか妹 submitted on 2019-12-21 04:35:11

Question


My application is written in Python. I run a script on each email received by Postfix and do something with the email content; Procmail is responsible for running the script with the email as its input. The problem started when I converted the input message (possibly plain text) to an email message object, because the latter comes in handy. I am using email.message_from_string (where email is the standard email module that ships with Python).

import email
message = email.message_from_string(original_mail_content)
message_body = message.get_payload()

message_body is sometimes a list ([email.message.Message instance, email.message.Message instance]) and sometimes a string (the actual body content of the incoming email). Why is that? I also noticed the following while browsing the email.message.Message.get_payload() docstring:
""" The payload will either be a list object or a string. If you mutate the list object, you modify the message's payload in place....."""

So how can I write a generic method that gets the body of an email in Python? Please help me out.


Answer 1:


Well, the other answers are correct and you should read the docs, but here is an example of a generic approach:

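def get_first_text_part(msg):
    """Return the payload of the first text part, or None if there is none."""
    maintype = msg.get_content_maintype()
    if maintype == 'multipart':
        # Only the direct children are inspected; nested multiparts are not searched.
        for part in msg.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        # Single-part message: the payload is the body string itself.
        return msg.get_payload()
    return None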

This is fragile: the parts may themselves be multipart, in which case nothing inside them is found, and the function only returns the first text part, which may not be the one you want. Still, it is something you can play with.
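The nesting problem mentioned above can be handled by recursing into each part. The following is only a minimal sketch, assuming Python 3; the name first_text_part and the UTF-8 fallback for parts that declare no charset are assumptions made for illustration, not something taken from the answer.

```python
import email


def first_text_part(msg):
    """Return the first text/* body as str, searching nested multiparts too."""
    if msg.is_multipart():
        # For a multipart message, get_payload() is a list of sub-messages.
        for part in msg.get_payload():
            found = first_text_part(part)
            if found is not None:
                return found
        return None
    if msg.get_content_maintype() == "text":
        # decode=True undoes base64 / quoted-printable and returns bytes.
        raw = msg.get_payload(decode=True)
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = msg.get_content_charset() or "utf-8"
        return raw.decode(charset, errors="replace")
    return None
```

It still returns only the first text part it meets, so an HTML alternative listed before the plain-text one would win.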




Answer 2:


As crazy as it might seem, the reason for the sometimes-string, sometimes-list semantics is given in the documentation. Basically, the payload of a multipart message is returned as a list of its parts, while the payload of a single-part message is returned as a string.
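To see the difference, one can parse a single-part and a multipart message and inspect the payload type. This is a small sketch, assuming Python 3; both sample messages are made up for illustration.

```python
import email

# Hypothetical sample messages, invented for this illustration.
PLAIN = "Subject: hi\nContent-Type: text/plain\n\nhello\n"
MULTI = (
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XX--\n"
)

plain_msg = email.message_from_string(PLAIN)
multi_msg = email.message_from_string(MULTI)

# is_multipart() tells you in advance which kind of payload to expect.
print(plain_msg.is_multipart(), type(plain_msg.get_payload()).__name__)  # False str
print(multi_msg.is_multipart(), type(multi_msg.get_payload()).__name__)  # True list
```

Checking is_multipart() before calling get_payload() avoids having to test the return type afterwards.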




Answer 3:


Rather than simply looking for a sub-part, use walk() to iterate through the message contents:

def walkMsg(msg):
  for part in msg.walk():
    # Multipart containers are only glue; their decoded payload is None.
    if part.get_content_maintype() == "multipart":
      continue
    yield part.get_payload(decode=True)

The walk() method returns an iterator that you can loop over (it is implemented as a generator). If the message is not a container of parts (i.e. it has no attachments or alternatives), the iterator yields a single element: the message itself.
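The behaviour of walk() on both kinds of message can be checked with a short sketch like the one below, assuming Python 3; the sample messages are invented for illustration.

```python
import email

# Hypothetical single-part message: walk() yields only the message itself.
single = email.message_from_string("Subject: hi\n\nhello\n")
single_parts = list(single.walk())
print(len(single_parts), single_parts[0] is single)  # 1 True

# Hypothetical multipart message: the container comes first, then its parts.
multi = email.message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XX--\n"
)
multi_types = [part.get_content_type() for part in multi.walk()]
print(multi_types)  # ['multipart/alternative', 'text/plain', 'text/html']
```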

You want to skip any 'multipart' parts as they are just glue.

The function above yields the decoded payload of every part it does not skip, attachments included. You may want to restrict it to the text parts if those hold the information you are after.
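Restricting the walk to text parts could look roughly like this. It is a sketch, assuming Python 3; the name iter_text_parts, the rule that any part with a filename counts as an attachment, and the UTF-8 fallback are all assumptions made for illustration.

```python
import email


def iter_text_parts(msg):
    """Yield (content_type, text) for each text/* part that is not an attachment."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and binary parts alike
        if part.get_filename():
            continue  # assumption: a part carrying a filename is an attachment
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        yield part.get_content_type(), raw.decode(charset, errors="replace")
```

A caller can then pick the first text/plain entry, or fall back to text/html when no plain-text part exists.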

Note that as of Python 2.5, the methods get_type(), get_main_type(), and get_subtype() have been removed; use get_content_type(), get_content_maintype(), and get_content_subtype() instead. See http://docs.python.org/library/email.message.html#email.message.Message.walk




Answer 4:


The message might be MIME multipart.

See http://docs.python.org/library/email.parser.html#additional-notes



Source: https://stackoverflow.com/questions/594545/email-body-is-a-string-sometimes-and-a-list-sometimes-why
