Proper way to decode incoming email subject (UTF-8)

小蘑菇 2020-12-16 16:28

I'm trying to pipe my incoming mails to a PHP script so I can store them in a database and other things. I'm using the class MIME E-mail message parser (registration requi

6 Answers
  • 2020-12-16 16:39

    Use PHP's native function:

    <?php
    $decoded = mb_decode_mimeheader($text);
    ?>
    

    This function handles UTF-8 as well as ISO-8859-1 strings; I have tested it.
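
As a minimal sketch of the call above, assuming the mbstring extension is loaded: `mb_decode_mimeheader()` returns its result in the mbstring internal encoding, so it may help to pin that to UTF-8 first. The sample subject is the one used in another answer on this page; the variable names are only illustrative.

```php
<?php
// The decoded text comes back in the mbstring internal encoding,
// so set it explicitly instead of relying on the server default.
mb_internal_encoding('UTF-8');

// RFC 2047 encoded-word: =?<charset>?<encoding>?<data>?=
$raw = '=?UTF-8?B?2KLYstmF2KfbjNi0?=';

$subject = mb_decode_mimeheader($raw);

echo $subject, PHP_EOL;
```

If the rest of the application works in a different encoding, that encoding would go into `mb_internal_encoding()` instead.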

  • 2020-12-16 16:41

    Use the PHP function imap_utf8() (it requires the IMAP extension):

    <?php
    $decoded = imap_utf8($text);
    ?>
    
  • 2020-12-16 16:56

    Would the imap_mime_header_decode() function help here?

    Found myself in a similar situation today.

    http://www.php.net/manual/en/function.imap-mime-header-decode.php

  • 2020-12-16 16:57

    This question is almost a year old, but I found it while facing a similar problem.

    I'm unsure why you're getting odd characters, but perhaps you are displaying them somewhere that does not support your charset.

    Here's some code I wrote that should handle everything except charset conversion, a large problem that libraries such as PHP's mbstring handle much better.

    class mail {
        /**
         * If you change one of these, please check the other for fixes as well
         *
         * @const Pattern to match RFC 2047 charset encodings in mail headers
         */
        const rfc2047header = '/=\?([^ ?]+)\?([BQbq])\?([^ ?]+)\?=/';
    
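        // The second encoded-word is matched with a lookahead so that runs of three or more
        // encoded-words are joined too; a "$2" in the replacement then expands to an empty string.
        const rfc2047header_spaces = '/(=\?[^ ?]+\?[BQbq]\?[^ ?]+\?=)\s+(?==\?[^ ?]+\?[BQbq]\?[^ ?]+\?=)/';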
    
        /**
         * http://www.rfc-archive.org/getrfc.php?rfc=2047
         *
         * =?<charset>?<encoding>?<data>?=
         *
         * @param string $header
         */
        public static function is_encoded_header($header) {
            // e.g. =?utf-8?q?Re=3a=20Support=3a=204D09EE9A=20=2d=20Re=3a=20Support=3a=204D078032=20=2d=20Wordpress=20Plugin?=
            // e.g. =?utf-8?q?Wordpress=20Plugin?=
            return preg_match(self::rfc2047header, $header) !== 0;
        }
    
        public static function header_charsets($header) {
            $matches = null;
            if (!preg_match_all(self::rfc2047header, $header, $matches, PREG_PATTERN_ORDER)) {
                return array();
            }
            return array_map('strtoupper', $matches[1]);
        }
    
        public static function decode_header($header) {
            $matches = null;
    
            /* Repair instances where two encodings are together and separated by a space (strip the spaces) */
            $header = preg_replace(self::rfc2047header_spaces, "$1$2", $header);
    
            /* Now see if any encodings exist and match them */
            if (!preg_match_all(self::rfc2047header, $header, $matches, PREG_SET_ORDER)) {
                return $header;
            }
            foreach ($matches as $header_match) {
                list($match, $charset, $encoding, $data) = $header_match;
                $encoding = strtoupper($encoding);
                switch ($encoding) {
                    case 'B':
                        $data = base64_decode($data);
                        break;
                    case 'Q':
                        $data = quoted_printable_decode(str_replace("_", " ", $data));
                        break;
                    default:
                        throw new Exception("preg_match_all is busted: didn't find B or Q in encoding $header");
                }
                // This part needs to handle every charset
                switch (strtoupper($charset)) {
                    case "UTF-8":
                        break;
                    default:
                        /* Here's where you should handle other character sets! */
                        throw new Exception("Unknown charset in header - time to write some code.");
                }
                $header = str_replace($match, $data, $header);
            }
            return $header;
        }
    }
    

    When run through a script and displayed in a browser using UTF-8, the result is:

    آزمایش

    You would run it like so:

    $decoded = mail::decode_header("=?UTF-8?B?2KLYstmF2KfbjNi0?=");
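
The class above throws for every charset other than UTF-8. One way to fill that gap, along the lines the answer itself suggests, is mbstring. The following is a minimal sketch, assuming the mbstring extension is available; `convert_to_utf8()` is a hypothetical helper name and not part of the class above.

```php
<?php
/**
 * Convert already-decoded header bytes from $charset to UTF-8.
 * Hypothetical helper for illustration only.
 */
function convert_to_utf8(string $data, string $charset): string
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $data; // already UTF-8, nothing to do
    }
    try {
        // PHP 7 emits a warning and returns false for an unknown charset name
        $converted = @mb_convert_encoding($data, 'UTF-8', $charset);
    } catch (ValueError $e) {
        // PHP 8 throws ValueError for an unknown charset name
        $converted = false;
    }
    if ($converted === false) {
        throw new Exception("Unsupported charset in header: $charset");
    }
    return $converted;
}

// "Café" in ISO-8859-1 is the bytes 43 61 66 E9
echo convert_to_utf8("Caf\xE9", 'ISO-8859-1'), PHP_EOL;
```

Inside `decode_header()`, the `default:` branch of the charset switch could then call `$data = convert_to_utf8($data, $charset);` in place of throwing immediately.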
    
  • Just to add yet one more way to do this (or if you don't have the mbstring extension installed but do have iconv):

    $decoded = iconv_mime_decode($str, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
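
To make the iconv call above concrete, here is a short sketch that assumes the iconv extension is loaded. `decode_subject()` is a hypothetical wrapper name, and the two sample headers are illustrative; falling back to the raw value when decoding fails is a design choice of this sketch, not something iconv does by itself.

```php
<?php
/**
 * Decode an RFC 2047 header value to UTF-8 with iconv.
 * Hypothetical wrapper for illustration only.
 */
function decode_subject(string $raw): string
{
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // iconv_mime_decode() returns false on failure; keep the raw value in that case
    return $decoded === false ? $raw : $decoded;
}

// Base64 ("B") encoded UTF-8 subject
echo decode_subject('=?UTF-8?B?2KLYstmF2KfbjNi0?='), PHP_EOL;

// Quoted-printable ("Q") encoded ISO-8859-1 subject, converted to UTF-8
echo decode_subject('=?ISO-8859-1?Q?Caf=E9?='), PHP_EOL;
```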
    
  • 2020-12-16 17:03

    You can use the mb_decode_mimeheader() function to decode your string.
