Multi-byte safe wordwrap() function for UTF-8

前端未结

关注

 9  976

太阳男子

PHP\'s wordwrap() function doesn\'t work correctly for multi-byte strings like UTF-8.

There are a few examples of mb safe functions in the comments, but with some di

相关标签:

9条回答

囚心锁ツ

2020-12-01 13:41

Here is the multibyte wordwrap function i have coded taking inspiration from of others found on the internet.

function mb_wordwrap($long_str, $width = 75, $break = "\n", $cut = false) {
    $long_str = html_entity_decode($long_str, ENT_COMPAT, 'UTF-8');
    $width -= mb_strlen($break);
    if ($cut) {
        $short_str = mb_substr($long_str, 0, $width);
        $short_str = trim($short_str);
    }
    else {
        $short_str = preg_replace('/^(.{1,'.$width.'})(?:\s.*|$)/', '$1', $long_str);
        if (mb_strlen($short_str) > $width) {
            $short_str = mb_substr($short_str, 0, $width);
        }
    }
    if (mb_strlen($long_str) != mb_strlen($short_str)) {
        $short_str .= $break;
    }
    return $short_str;
}

Dont' forget to configure PHP for using UTF-8 with :

ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

I hope this will help. Guillaume

0 讨论(0)

花落未央

2020-12-01 13:42

Here's my own attempt at a function that passed a few of my own tests, though I can't promise it's 100% perfect, so please post a better one if you see a problem.

/**
 * Multi-byte safe version of wordwrap()
 * Seems to me like wordwrap() is only broken on UTF-8 strings when $cut = true
 * @return string
 */
function wrap($str, $len = 75, $break = " ", $cut = true) { 
    $len = (int) $len;

    if (empty($str))
        return ""; 

    $pattern = "";

    if ($cut)
        $pattern = '/([^'.preg_quote($break).']{'.$len.'})/u'; 
    else
        return wordwrap($str, $len, $break);

    return preg_replace($pattern, "\${1}".$break, $str); 
}

0 讨论(0)

逝去的感伤

2020-12-01 13:45

function mb_wordwrap($str, $width = 74, $break = "\r\n", $cut = false)
        {
            return preg_replace(
                '~(?P<str>.{' . $width . ',}?' . ($cut ? '(?(?!.+\s+)\s*|\s+)' : '\s+') . ')(?=\S+)~mus',
                '$1' . $break,
                $str
            );
        }

0 讨论(0)

上一页 1 2