PHP's wordwrap() function doesn't work correctly for multi-byte strings like UTF-8.
There are a few examples of mb safe functions in the comments, but with some di
Here is the multibyte wordwrap function I wrote, taking inspiration from others found on the internet.
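function mb_wordwrap($long_str, $width = 75, $break = "\n", $cut = false) {
    // Decode HTML entities so that each one counts as a single character.
    $long_str = html_entity_decode($long_str, ENT_COMPAT, 'UTF-8');
    // Reserve room for the break string at the end of the line.
    $width -= mb_strlen($break);
    if ($cut) {
        // Hard cut at $width characters, even in the middle of a word.
        $short_str = trim(mb_substr($long_str, 0, $width));
    }
    else {
        // Keep at most $width characters, stopping at a whitespace boundary.
        // The "u" modifier counts UTF-8 characters rather than bytes; "s" lets "." match newlines.
        $short_str = preg_replace('/^(.{1,'.$width.'})(?:\s.*|$)/us', '$1', $long_str);
        // No whitespace within $width characters: fall back to a hard cut.
        if (mb_strlen($short_str) > $width) {
            $short_str = mb_substr($short_str, 0, $width);
        }
    }
    // Append the break only if the string was actually shortened.
    // Note: only the first line is returned, not the whole wrapped text.
    if (mb_strlen($long_str) != mb_strlen($short_str)) {
        $short_str .= $break;
    }
    return $short_str;
}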
Don't forget to configure PHP to use UTF-8:
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
I hope this will help. Guillaume
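Since the function above returns only the first line, here is a rough sketch of how the same idea could be extended to wrap a whole string. The name mb_wordwrap_all() is made up for this example and is not part of PHP or of the code above. It assumes UTF-8 input and collapses runs of whitespace into a single space, which the native wordwrap() does not do, so treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Hypothetical helper (not a PHP built-in): wraps a UTF-8 string by character count.
function mb_wordwrap_all(string $str, int $width = 75, string $break = "\n", bool $cut = false): string
{
    $width = max(1, $width); // guard against a zero or negative width
    $lines = [];
    // Wrap each existing line separately so the original newlines are kept.
    foreach (explode("\n", $str) as $paragraph) {
        $line = '';
        // Split on whitespace; the "u" modifier makes the pattern UTF-8 aware.
        foreach (preg_split('/\s+/u', $paragraph, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            // With $cut, split words longer than $width into $width-character chunks.
            while ($cut && mb_strlen($word, 'UTF-8') > $width) {
                if ($line !== '') {
                    $lines[] = $line;
                    $line = '';
                }
                $lines[] = mb_substr($word, 0, $width, 'UTF-8');
                $word = mb_substr($word, $width, null, 'UTF-8');
            }
            if ($line === '') {
                $line = $word;
            } elseif (mb_strlen($line, 'UTF-8') + 1 + mb_strlen($word, 'UTF-8') <= $width) {
                // The word still fits on the current line (plus one space).
                $line .= ' ' . $word;
            } else {
                // Start a new line with this word.
                $lines[] = $line;
                $line = $word;
            }
        }
        $lines[] = $line;
    }
    return implode($break, $lines);
}
```

For example, mb_wordwrap_all("héllo wörld", 5) puts each word on its own line, because every word is measured as 5 characters rather than 6 bytes.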
Here's my own attempt at a function that passed a few of my own tests, though I can't promise it's 100% perfect, so please post a better one if you see a problem.
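/**
 * Multi-byte safe version of wordwrap().
 * Seems to me like wordwrap() is only broken on UTF-8 strings when $cut = true.
 * @return string
 */
function wrap($str, $len = 75, $break = " ", $cut = true) {
    $len = (int) $len;
    // Compare against '' instead of using empty(), which would also reject the string "0".
    if ((string) $str === '') {
        return "";
    }
    if (!$cut) {
        // Without cutting, defer to the native function.
        return wordwrap($str, $len, $break);
    }
    // Insert $break after every run of $len characters that contains none of the $break characters.
    // The "u" modifier makes the quantifier count UTF-8 characters instead of bytes.
    $pattern = '/([^'.preg_quote($break, '/').']{'.$len.'})/u';
    return preg_replace($pattern, '${1}'.$break, $str);
}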
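To check the remark in the docblock above, here is a small test of the native wordwrap() on two-byte characters. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded. As far as I can tell, without $cut the characters stay intact but the width is still measured in bytes, so lines can break earlier than expected; with $cut = true the result can contain broken characters.

```php
<?php
$text = "ééé ééé"; // 7 characters, but 13 bytes in UTF-8

// $cut = false: breaks happen only at spaces, so no character is split,
// but the string is 13 bytes long, so it is broken even though it has only 7 characters.
$soft = wordwrap($text, 7, "\n", false);

// $cut = true: the cut is made after 3 bytes, in the middle of the second "é".
$hard = wordwrap("ééééé", 3, "\n", true);

var_dump(mb_check_encoding($soft, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($hard, 'UTF-8')); // bool(false)
```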
function mb_wordwrap($str, $width = 74, $break = "\r\n", $cut = false)
{
    return preg_replace(
        '~(?P<str>.{' . $width . ',}?' . ($cut ? '(?(?!.+\s+)\s*|\s+)' : '\s+') . ')(?=\S+)~mus',
        '$1' . $break,
        $str
    );
}