How to read text inside body of mail using javax.mail

大憨熊 提交于 2019-11-27 11:47:08
Austin D

This answer extends yurin's answer. The issue he brought up was that the content of a MimeMultipart may itself be another MimeMultipart. The getTextFromMimeMultipart() method below recurses in such cases on the content until the message body has been fully parsed.

private String getTextFromMessage(Message message) throws MessagingException, IOException {
    String result = "";
    if (message.isMimeType("text/plain")) {
        result = message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        result = getTextFromMimeMultipart(mimeMultipart);
    }
    return result;
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart)  throws MessagingException, IOException{
    String result = "";
    int count = mimeMultipart.getCount();
    for (int i = 0; i < count; i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        if (bodyPart.isMimeType("text/plain")) {
            result = result + "\n" + bodyPart.getContent();
            break; // without break same text appears twice in my tests
        } else if (bodyPart.isMimeType("text/html")) {
            String html = (String) bodyPart.getContent();
            result = result + "\n" + org.jsoup.Jsoup.parse(html).text();
        } else if (bodyPart.getContent() instanceof MimeMultipart){
            result = result + getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
        }
    }
    return result;
}
hendalst

This answer extends Austin's answer to correct the orginal issue with treatment of multipart/alternative (// without break same text appears twice in my tests).

The text appears twice because for multipart/alternative, the user agent is expected to choose only one part.

From RFC2046:

The "multipart/alternative" type is syntactically identical to "multipart/mixed", but the semantics are different. In particular, each of the body parts is an "alternative" version of the same information.

Systems should recognize that the content of the various parts are interchangeable. Systems should choose the "best" type based on the local environment and references, in some cases even through user interaction. As with "multipart/mixed", the order of body parts is significant. In this case, the alternatives appear in an order of increasing faithfulness to the original content. In general, the best choice is the LAST part of a type supported by the recipient system's local environment.

Same example with treatment for alternatives:

private String getTextFromMessage(Message message) throws IOException, MessagingException {
    String result = "";
    if (message.isMimeType("text/plain")) {
        result = message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        result = getTextFromMimeMultipart(mimeMultipart);
    }
    return result;
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart) throws IOException, MessagingException {

    int count = mimeMultipart.getCount();
    if (count == 0)
        throw new MessagingException("Multipart with no body parts not supported.");
    boolean multipartAlt = new ContentType(mimeMultipart.getContentType()).match("multipart/alternative");
    if (multipartAlt)
        // alternatives appear in an order of increasing 
        // faithfulness to the original content. Customize as req'd.
        return getTextFromBodyPart(mimeMultipart.getBodyPart(count - 1));
    String result = "";
    for (int i = 0; i < count; i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        result += getTextFromBodyPart(bodyPart);
    }
    return result;
}

private String getTextFromBodyPart(
        BodyPart bodyPart) throws IOException, MessagingException {

    String result = "";
    if (bodyPart.isMimeType("text/plain")) {
        result = (String) bodyPart.getContent();
    } else if (bodyPart.isMimeType("text/html")) {
        String html = (String) bodyPart.getContent();
        result = org.jsoup.Jsoup.parse(html).text();
    } else if (bodyPart.getContent() instanceof MimeMultipart){
        result = getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
    }
    return result;
}

Note that this is a very simple example. It misses many cases and should not be used in production in it's current format.

Below is method that will takes text from message in case bodyParts are text and html.

  import javax.mail.BodyPart;
  import javax.mail.Message;
  import javax.mail.internet.MimeMultipart;
  import org.jsoup.Jsoup;

  ....    
  private String getTextFromMessage(Message message) throws Exception {
    if (message.isMimeType("text/plain")){
        return message.getContent().toString();
    }else if (message.isMimeType("multipart/*")) {
        String result = "";
        MimeMultipart mimeMultipart = (MimeMultipart)message.getContent();
        int count = mimeMultipart.getCount();
        for (int i = 0; i < count; i ++){
            BodyPart bodyPart = mimeMultipart.getBodyPart(i);
            if (bodyPart.isMimeType("text/plain")){
                result = result + "\n" + bodyPart.getContent();
                break;  //without break same text appears twice in my tests
            } else if (bodyPart.isMimeType("text/html")){
                String html = (String) bodyPart.getContent();
                result = result + "\n" + Jsoup.parse(html).text();

            }
        }
        return result;
    }
    return "";
}

Update. There is a case, that bodyPart itself can be of type multipart. (I met such email after have written this answer.) In this case you will need rewrite above method with recursion.

I don't think so, otherwise what would happen if a Part's mime type is image/jpeg? The API returns an Object because internally it tries to give you something useful, provided you know what is expected to be. For general purpose software, it's intended to be used like this:

if (part.isMimeType("text/plain")) {
   ...
} else if (part.isMimeType("multipart/*")) {
   ...
} else if (part.isMimeType("message/rfc822")) {
   ...
} else {
   ...
}

You also have the raw (actually not so raw, see the Javadoc) Part.getInputStream(), but I think it's unsafe to assume that each and every message you receive is a text-based one - unless you are writing a very specific application and you have control over the input source.

JAVAGeek

If you want to get text always then you can skip other types like 'multipart' etc...

  Object body = message.getContent(); 
    if(body instanceof String){
    // hey it's a text
    }

Don't reinvent the wheel! You can simply use Apache Commons Email (see here)

Kotlin example:

fun readHtmlContent(message: MimeMessage) = 
        MimeMessageParser(message).parse().htmlContent

If email does not have html content, but it has plain content (you can check that by hasPlainContent and hasHtmlContent methods) then you should use this code:

fun readPlainContent(message: MimeMessage) = 
        MimeMessageParser(message).parse().plainContent

Java example:

String readHtmlContent(MimeMessage message) throws Exception {
    return new MimeMessageParser(message).parse().getHtmlContent();
}

String readPlainContent(MimeMessage message) throws Exception {
    return new MimeMessageParser(message).parse().getPlainContent();
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!