I have an email subject of the form:
=?utf-8?B?T3.....?=
The body of the email is utf-8 base64 encoded - and has decoded fine. I am current
Check out RFC2047. The 'B' means that the part between the last two '?'s is base64-encoded. The 'utf-8' naturally means that the decoded data should be interpreted as UTF-8.
MIME::Words from MIME-tools work well too for this. I ran into some issue with Encode and found MIME::Words succeeded on some strings where Encode did not.
use MIME::Words qw(:all);
$decoded = decode_mimewords(
'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
);
This is a standard extension for charset labeling of headers, specified in RFC2047.
I think that the Encode module handles that with the MIME-Header
encoding, so try this:
use Encode qw(decode);
my $decoded = decode("MIME-Header", $encoded);
The encoded-word tokens (as per RFC 2047) can occur in values of some headers. They are parsed as follows:
=?<charset>?<encoding>?<data>?=
Charset is UTF-8 in this case, the encoding is B
which means base64 (the other option is Q
which means Quoted Printable).
To read it, first decode the base64, then treat it as UTF-8 characters.
Also read the various Internet Mail RFCs for more detail, mainly RFC 2047.
Since you are using Perl, Encode::MIME::Header could be of use:
SYNOPSIS
use Encode qw/encode decode/; $utf8 = decode('MIME-Header', $header); $header = encode('MIME-Header', $utf8);
ABSTRACT
This module implements RFC 2047 Mime Header Encoding. There are 3 variant encoding names; MIME-Header, MIME-B and MIME-Q. The difference is described below
decode() encode() MIME-Header Both B and Q =?UTF-8?B?....?= MIME-B B only; Q croaks =?UTF-8?B?....?= MIME-Q Q only; B croaks =?UTF-8?Q?....?=