How to handle multipart/alternative mail with JavaMail?

忘掉有多难 2021-01-31 12:22

I wrote an application which gets all emails from an inbox, filters the emails which contain a specific string and then puts those emails in an ArrayList.

After the ema

3 Answers
  • 2021-01-31 12:41

    I found reading e-mail with the JavaMail library much more difficult than expected. I don't blame the JavaMail API; rather, I blame my poor understanding of RFC 5322, the official definition of the Internet e-mail message format.

    As a thought experiment: Consider how complicated an e-mail message can become in the real world. It is possible to "infinitely" embed messages within messages. Each message itself may have multiple attachments (binary or human-readable text). Now imagine how complicated this structure becomes in the JavaMail API after parsing.

    A few tips that may help when traversing e-mail with JavaMail:

    • Message and BodyPart both implement Part.
    • MimeMessage and MimeBodyPart both implement MimePart.
    • Where possible, treat everything as a Part or MimePart. This will allow generic traversal methods to be built more easily.
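
    To illustrate the last tip, here is a minimal sketch of a generic traversal that takes only a Part and returns an outline of the message structure, one line per part. The class PartOutline and its methods are hypothetical names of mine, not part of JavaMail. I am assuming the javax.mail package names of JavaMail 1.x (Jakarta Mail 2.x uses jakarta.mail instead) and that the library is on the classpath. The sample message is built in memory so the sketch runs without a mail server.

```java
import java.io.IOException;
import java.util.Properties;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

// Hypothetical helper, not part of JavaMail.
class PartOutline {
    // Returns one line per Part, indented by nesting depth.
    static String outline(Part root) throws MessagingException, IOException {
        StringBuilder out = new StringBuilder();
        append(root, 0, out);
        return out.toString();
    }

    // Works for Message and BodyPart alike because both are Parts.
    private static void append(Part part, int depth, StringBuilder out)
            throws MessagingException, IOException {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        // The header value may be folded over several lines; collapse the whitespace.
        String type = part.getContentType().replaceAll("\\s+", " ");
        out.append(part.getClass().getSimpleName()).append(": ").append(type).append('\n');
        if (part.isMimeType("multipart/*")) {
            Multipart children = (Multipart) part.getContent();
            for (int i = 0; i < children.getCount(); i++) {
                append(children.getBodyPart(i), depth + 1, out);
            }
        }
    }

    // Builds a multipart/alternative message in memory (illustration only).
    static MimeMessage sampleMessage() throws MessagingException {
        MimeBodyPart plain = new MimeBodyPart();
        plain.setText("hello", "UTF-8");
        MimeBodyPart html = new MimeBodyPart();
        html.setText("<p>hello</p>", "UTF-8", "html");
        MimeMultipart alternative = new MimeMultipart("alternative");
        alternative.addBodyPart(plain);
        alternative.addBodyPart(html);
        MimeMessage message = new MimeMessage(Session.getInstance(new Properties()));
        message.setContent(alternative);
        message.saveChanges(); // writes the Content-Type headers that isMimeType() reads
        return message;
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(sampleMessage()));
    }
}
```

    For a message fetched from a real folder, passing the Message itself to outline() should work the same way, since the method never needs anything beyond the Part interface.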

    These Part methods will help to traverse:

    • String getContentType(): The value starts with the MIME type, followed by any parameters (such as charset or boundary). You may be tempted to cut or match on this string to get the MIME type, but don't. It is better to use this method only for inspection in the debugger.
      • Oddly, Part has no method that returns the bare MIME type. Instead, use boolean isMimeType(String) to match. Read the docs carefully to learn about its wildcards, such as "multipart/*".
    • Object getContent(): Might be instanceof:
      • Multipart -- container for more Parts
        • Cast to Multipart, then iterate by zero-based index with int getCount() and BodyPart getBodyPart(int)
          • Note: BodyPart implements Part
        • In my experience, Microsoft Exchange servers regularly provide two copies of the body text: plain text and HTML.
          • To match plain text, try: Part.isMimeType("text/plain")
          • To match HTML, try: Part.isMimeType("text/html")
      • Message (implements Part) -- embedded or attached e-mail
      • String (just the body text -- plain text or HTML)
        • See note above about Microsoft Exchange servers.
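      • InputStream (probably an attachment, for example one sent with BASE64 encoding; JavaMail returns a stream when it has no handler for the content type)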
    • String getDisposition(): Value may be null
      • If Part.ATTACHMENT.equalsIgnoreCase(getDisposition()) is true, call getInputStream() to read the bytes of the attachment.
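
    Putting the methods above together, a recursive walk might look like the sketch below. It collects the first text/plain body, the first text/html body, and every part marked as an attachment, so the caller can decide which alternative to use. PartWalker and Found are hypothetical names of mine, not JavaMail classes. I am again assuming the javax.mail packages of JavaMail 1.x on the classpath, and the sample message is built in memory purely for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

// Hypothetical helper, not part of JavaMail.
class PartWalker {
    // What one walk collects.
    static class Found {
        String plainText;                                  // first text/plain body, or null
        String html;                                       // first text/html body, or null
        final List<Part> attachments = new ArrayList<>();  // read via getInputStream()
    }

    static Found walk(Part root) throws MessagingException, IOException {
        Found found = new Found();
        visit(root, found);
        return found;
    }

    private static void visit(Part part, Found found) throws MessagingException, IOException {
        // getDisposition() may return null, so the constant goes on the left.
        if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
            found.attachments.add(part);
        } else if (part.isMimeType("multipart/*")) {
            Multipart children = (Multipart) part.getContent();
            for (int i = 0; i < children.getCount(); i++) {
                visit(children.getBodyPart(i), found);
            }
        } else if (part.isMimeType("message/rfc822")) {
            Object embedded = part.getContent(); // an embedded e-mail is itself a Part
            if (embedded instanceof Part) {
                visit((Part) embedded, found);
            }
        } else if (part.isMimeType("text/plain") || part.isMimeType("text/html")) {
            Object content = part.getContent();
            if (content instanceof String) {
                if (part.isMimeType("text/html")) {
                    if (found.html == null) found.html = (String) content;
                } else if (found.plainText == null) {
                    found.plainText = (String) content;
                }
            }
        }
    }

    // Builds multipart/mixed { multipart/alternative { plain, html }, attachment } in memory.
    static MimeMessage sampleMessage() throws MessagingException {
        MimeBodyPart plain = new MimeBodyPart();
        plain.setText("hello", "UTF-8");
        MimeBodyPart html = new MimeBodyPart();
        html.setText("<p>hello</p>", "UTF-8", "html");
        MimeMultipart alternative = new MimeMultipart("alternative");
        alternative.addBodyPart(plain);
        alternative.addBodyPart(html);
        MimeBodyPart body = new MimeBodyPart();
        body.setContent(alternative);
        MimeBodyPart attachment = new MimeBodyPart();
        attachment.setText("a,b", "UTF-8");
        attachment.setDisposition(Part.ATTACHMENT);
        attachment.setFileName("data.csv");
        MimeMultipart mixed = new MimeMultipart("mixed");
        mixed.addBodyPart(body);
        mixed.addBodyPart(attachment);
        MimeMessage message = new MimeMessage(Session.getInstance(new Properties()));
        message.setContent(mixed);
        message.saveChanges(); // writes the Content-Type headers that isMimeType() reads
        return message;
    }

    public static void main(String[] args) throws Exception {
        Found found = walk(sampleMessage());
        System.out.println("plain: " + found.plainText);
        System.out.println("html: " + found.html);
        System.out.println("attachments: " + found.attachments.size());
    }
}
```

    The disposition check comes first on purpose: a text/plain part marked as an attachment is then kept as an attachment rather than mistaken for the body. Inline parts of other types (images, for instance) are simply skipped in this sketch.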

    Finally, I found that the official Javadocs exclude everything in the com.sun.mail package (and possibly more). If you need these, read the code directly, or generate the unfiltered Javadocs by downloading the source and running mvn javadoc:javadoc in the mail module of the project.

  • 2021-01-31 12:41

    Did you find these JavaMail FAQ entries?

    • How do I read a message with an attachment and save the attachment?
    • How do I tell if a message has attachments?
    • How do I find the main message body in a message that has attachments?
  • 2021-01-31 12:50

    Following up on Kevin's helpful advice: analyzing the Java object types of your email content by their canonical names (or simple names) can be helpful too. For example, in one inbox I'm looking at right now, 399 of 486 messages are Strings and 87 are MimeMultipart. This suggests that, for my typical email, a strategy that uses instanceof to peel off Strings first is best.

    Of the Strings, 394 are text/plain and 5 are text/html. This will not be the case for most people; it reflects the particular email feeds into this inbox.

    But wait - there's more!!! :-) The HTML sneaks in there nevertheless: of the 87 Multiparts, 70 are multipart/alternative. No guarantees, but most (if not all) of these are TEXT + HTML.

    Incidentally, of the other 17 Multiparts, 15 are multipart/mixed and 2 are multipart/signed.
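
    A tally like the one above could be gathered with something along these lines. This is only a sketch: ContentTally is a hypothetical name of mine, the javax.mail packages of JavaMail 1.x are assumed to be on the classpath, and the two sample messages are built in memory rather than read from a real inbox. It parses the header with javax.mail.internet.ContentType to get the base MIME type, which throws a ParseException (a MessagingException) on a malformed header.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.internet.ContentType;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

// Hypothetical helper, not part of JavaMail.
class ContentTally {
    // Counts messages per "<Java simple class name> <MIME type>" key.
    static Map<String, Integer> tally(Message[] messages)
            throws MessagingException, IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (Message message : messages) {
            String javaType = message.getContent().getClass().getSimpleName();
            // Parse the header to drop parameters such as charset or boundary.
            String mimeType = new ContentType(message.getContentType())
                    .getBaseType().toLowerCase(Locale.ROOT);
            counts.merge(javaType + " " + mimeType, 1, Integer::sum);
        }
        return counts;
    }

    // One plain-text message and one multipart/alternative message (illustration only).
    static Message[] sampleMessages() throws MessagingException {
        Session session = Session.getInstance(new Properties());

        MimeMessage text = new MimeMessage(session);
        text.setText("hi");
        text.saveChanges();

        MimeBodyPart plain = new MimeBodyPart();
        plain.setText("hello", "UTF-8");
        MimeBodyPart html = new MimeBodyPart();
        html.setText("<p>hello</p>", "UTF-8", "html");
        MimeMultipart alternative = new MimeMultipart("alternative");
        alternative.addBodyPart(plain);
        alternative.addBodyPart(html);
        MimeMessage multi = new MimeMessage(session);
        multi.setContent(alternative);
        multi.saveChanges(); // writes the Content-Type headers read by tally()

        return new Message[] { text, multi };
    }

    public static void main(String[] args) throws Exception {
        tally(sampleMessages()).forEach((key, count) -> System.out.println(count + "  " + key));
    }
}
```

    With a real inbox, the array returned by Folder.getMessages() could be passed to tally() in place of the sample messages. Note that getContent() loads each message's content, so this can be slow on a large remote folder.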

    My use case with this inbox (and one other) is primarily to aggregate and analyze known mailing list content. I can't ignore any of the messages, but an analysis of this sort helps me make my processing more efficient.
