Question
We have a custom PHP email marketing app with an interesting problem: if the subject line of a message contains a word with accented characters, the space between that word and the following one is "swallowed". For example, the phrase
Ángel Ríos escucha y sorprende
is shown (by at least Gmail and Lotus Notes) as
ÁngelRíos escucha y sorprende
The relevant line in the message source is:
Subject: =?ISO-8859-1?Q?=C1ngel?= =?ISO-8859-1?Q?R=EDos?= escucha y sorprende
(semi-full headers):
Delivered-To: me@gmail.com
Received: {elided}
Return-Path: <return@path>
Received: {elided}
Received: (qmail 23734 invoked by uid 48); 18 Aug 2009 13:51:14 -0000
Date: 18 Aug 2009 13:51:14 -0000
To: "Adriano" <me@gmail.com>
Subject: =?ISO-8859-1?Q?=C1ngel?= =?ISO-8859-1?Q?R=EDos?= escucha y sorprende
MIME-Version: 1.0
From: {elided}
X-Mailer: PHP
X-Lista: 1290
X-ID: 48163
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Message-ID: <kokrte.rpq06m@example.com>
EDIT:
The app uses an old version of Html Mime Mail to prepare messages; I'll try upgrading to a newer version. In the meantime, this is the function that encodes the subject:
/**
 * Function to encode a header if necessary
 * according to RFC2047
 */
function _encodeHeader($input, $charset = 'ISO-8859-1')
{
    preg_match_all('/(\w*[\x80-\xFF]+\w*)/', $input, $matches);
    foreach ($matches[1] as $value) {
        $replacement = preg_replace('/([\x80-\xFF])/e', '"=" . strtoupper(dechex(ord("\1")))', $value);
        $input = str_replace($value, '=?' . $charset . '?Q?' . $replacement . '?=', $input);
    }
    return $input;
}
And here is the code where the subject is encoded:
if (!empty($this->headers['Subject'])) {
    $subject = $this->_encodeHeader($this->headers['Subject'],
                                    $this->build_params['head_charset']);
    unset($this->headers['Subject']);
}
Wrap-up
The problem was indeed that the program wasn't encoding the space between two adjacent encoded words. The accepted answer solved my problem after a slight modification (mentioned in the comments to that answer), because the installed version of PHP didn't support a particular implementation detail.
Final answer
Although the accepted answer did solve the problem, we found that, when run over many thousands of emails, it was consuming all the available memory on the server. I checked the website of the original developer of this email framework and found that the function had been updated to the following:
function _encodeHeader($input, $charset = 'ISO-8859-1') {
    preg_match_all('/(\w*[\x80-\xFF]+\w*)/', $input, $matches);
    foreach ($matches[1] as $value) {
        $replacement = preg_replace('/([\x80-\xFF])/e', '"=" . strtoupper(dechex(ord("\1")))', $value);
        $input = str_replace($value, $replacement, $input);
    }
    if (!empty($matches[1])) {
        $input = str_replace(' ', '=20', $input);
        $input = '=?' . $charset . '?Q?' . $input . '?=';
    }
    return $input;
}
This neatly solved the problem and stayed under the memory limit.
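For newer PHP versions, where the /e modifier is no longer available, the same idea (encode the whole subject as one encoded-word) might be sketched roughly as below. This is only an illustration, not the library's code: the function name encodeHeaderQ is made up, the input is assumed to already be in the given single-byte charset, and it does not split long subjects to respect the 75-character limit that RFC 2047 places on an encoded-word.

```php
<?php
// Hypothetical helper, not part of Html Mime Mail.
// Assumes $input holds bytes in $charset (e.g. ISO-8859-1).
function encodeHeaderQ(string $input, string $charset = 'ISO-8859-1'): string
{
    // Pure ASCII subjects are sent unchanged.
    if (!preg_match('/[\x80-\xFF]/', $input)) {
        return $input;
    }

    // Replace every byte outside a conservative safe set with =XX.
    // This covers "=", "?" and "_" as well as the accented bytes.
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9!*+\-\/ ]/',
        function (array $m): string {
            return sprintf('=%02X', ord($m[0]));
        },
        $input
    );

    // In Q encoding a space inside an encoded-word is written as "_".
    $encoded = str_replace(' ', '_', $encoded);

    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// Usage with the subject from the question (ISO-8859-1 bytes):
echo encodeHeaderQ("\xC1ngel R\xEDos escucha y sorprende"), "\n";
// prints =?ISO-8859-1?Q?=C1ngel_R=EDos_escucha_y_sorprende?=
```

Because the spaces live inside the single encoded-word, there is no whitespace between adjacent encoded-words for a mail client to drop.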
Answer 1:
You need to encode the space in between as well (see RFC 2047):
(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=) (ab)
White space between adjacent 'encoded-word's is not displayed.
[…]
(=?ISO-8859-1?Q?a_b?=) (a b)
In order to cause a SPACE to be displayed within a portion of encoded text, the SPACE MUST be encoded as part of the 'encoded-word'.
(=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=) (a b)
In order to cause a SPACE to be displayed between two strings of encoded text, the SPACE MAY be encoded as part of one of the 'encoded-word's.
So this should do it:
Subject: =?ISO-8859-1?Q?=C1ngel=20R=EDos?= escucha y sorprende
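To see why the original header loses its space, here is a minimal sketch of how a mail client might apply the RFC 2047 rule quoted above. The function decodeQHeader is a made-up helper that only understands Q-encoded words and relies on the iconv extension; real clients are more thorough.

```php
<?php
// Hypothetical helper for illustration: decodes Q-encoded words only.
function decodeQHeader(string $header): string
{
    // RFC 2047: whitespace between two adjacent encoded-words is ignored.
    $header = preg_replace('/(\?=)\s+(?==\?)/', '$1', $header);

    // Decode each =?charset?Q?text?= word and convert it to UTF-8.
    return preg_replace_callback(
        '/=\?([^?]+)\?Q\?([^?]*)\?=/i',
        function (array $m): string {
            // "_" stands for a space; =XX is a hex-encoded byte.
            $bytes = quoted_printable_decode(str_replace('_', ' ', $m[2]));
            return iconv($m[1], 'UTF-8', $bytes);
        },
        $header
    );
}

// Header produced by the original code: the space is dropped.
echo decodeQHeader('=?ISO-8859-1?Q?=C1ngel?= =?ISO-8859-1?Q?R=EDos?= escucha y sorprende'), "\n";
// prints ÁngelRíos escucha y sorprende

// Space encoded inside the encoded-word: it survives.
echo decodeQHeader('=?ISO-8859-1?Q?=C1ngel=20R=EDos?= escucha y sorprende'), "\n";
// prints Ángel Ríos escucha y sorprende
```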
Edit: Try this function:
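function _encodeHeader($str, $charset='ISO-8859-1')
{
    // Split into words and whitespace runs, keeping the whitespace as separate items.
    $words = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    // Q-encode a single byte: a space becomes "_", anything else becomes "=XX".
    // (An anonymous function is used here because create_function() was removed in PHP 8.0.)
    $func = function ($match) {
        return $match[0] === " " ? "_" : sprintf("=%02X", ord($match[0]));
    };
    $encoded = false;
    foreach ($words as $key => &$word) {
        if (!ctype_space($word)) {
            $tmp = preg_replace_callback('/[^\x21-\x3C\x3E-\x5E\x60-\x7E]/', $func, $word);
            if ($tmp !== $word) {
                if (!$encoded) {
                    // First word of a run that needs encoding: open the encoded-word.
                    $word = '=?'.$charset.'?Q?'.$tmp;
                } else {
                    // Continue the run and pull the preceding whitespace into it.
                    $word = $tmp;
                    if ($key > 0) {
                        $words[$key-1] = preg_replace_callback('/[^\x21-\x3C\x3E-\x5E\x60-\x7E]/', $func, $words[$key-1]);
                    }
                }
                $encoded = true;
            } else {
                if ($encoded) {
                    // Plain word after an encoded run: close the run on the last encoded word.
                    $words[$key-2] .= '?=';
                }
                $encoded = false;
            }
        }
    }
    unset($word); // break the reference left behind by foreach
    if ($encoded) {
        $words[$key] .= '?=';
    }
    return implode('', $words);
}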
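Answer 2:
A literal "?" in the subject must be encoded as well, because "?" delimits the parts of an encoded-word. Add
    $input = str_replace('?', '=3F', $input);
to this fragment of the updated function, before the wrapper is added:
if (!empty($matches[1])) {
    $input = str_replace('?', '=3F', $input);
    $input = str_replace(' ', '=20', $input);
    $input = '=?' . $charset . '?Q?' . $input . '?=';
}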
Answer 3:
Look into mbstring and UTF-8 conversion. UTF-8 can represent the accented and other special characters used in non-English languages.
Converting the subject string to UTF-8, and making sure the email is declared and sent as UTF-8, should make the subject lines render correctly.
At least it did for us when we had a similar issue sending email.
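A minimal sketch of that approach, assuming the subject arrives as ISO-8859-1 bytes as in the question and that the mbstring extension is loaded. The variable names are illustrative, and the choice of Base64 ("B") encoding for the header is an assumption of this sketch rather than something we were required to use.

```php
<?php
// Assumption: the subject arrives as ISO-8859-1 bytes, as in the question.
$latin1Subject = "\xC1ngel R\xEDos escucha y sorprende";

// Convert to UTF-8 (requires the mbstring extension).
$utf8Subject = mb_convert_encoding($latin1Subject, 'UTF-8', 'ISO-8859-1');

// Wrap the whole subject in a single Base64 ("B") encoded-word, so no
// whitespace ends up between adjacent encoded-words.
$header = 'Subject: =?UTF-8?B?' . base64_encode($utf8Subject) . '?=';

echo $header, "\n";
```

Note that RFC 2047 limits an encoded-word to 75 characters, so longer subjects have to be split into several encoded-words, and the body's Content-Type charset should be changed to UTF-8 to match.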
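Answer 4:
It looks like you would do better to send the whole subject as a single encoded-word, with the spaces inside it written as "_" (an encoded-word may not contain literal spaces):
Subject: =?ISO-8859-1?Q?=C1ngel_R=EDos_escucha_y_sorprende?=
The problem appears at the ?= boundary, where one encoded-word ends and the next begins.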
Source: https://stackoverflow.com/questions/1294066/accented-words-in-email-subject-break-spacing-how-do-i-stop-this