Multi-byte safe wordwrap() function for UTF-8

太阳男子 2020-12-01 13:17

PHP's wordwrap() function doesn't work correctly for strings in multi-byte encodings such as UTF-8.
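
To make the problem concrete, here is a minimal sketch. It assumes the mbstring extension is loaded, and the sample string is made up for illustration. wordwrap() measures width in bytes, so with $cut = true it can split a two-byte character in half and leave invalid UTF-8 behind:

```php
<?php
// Five "é" characters written as raw UTF-8 bytes: 5 characters, 10 bytes.
$text = str_repeat("\xC3\xA9", 5);

// wordwrap() counts bytes, so a width of 3 cuts after every third byte,
// which lands in the middle of a two-byte character.
$wrapped = wordwrap($text, 3, "\n", true);

var_dump(strlen($text));                        // int(10)
var_dump(mb_strlen($text, 'UTF-8'));            // int(5)
var_dump(mb_check_encoding($wrapped, 'UTF-8')); // bool(false)
```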

There are a few examples of mb safe functions in the comments, but with some di

9 Answers
  • 2020-12-01 13:41

    Here is the multibyte wordwrap function I coded, taking inspiration from others found on the internet.

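    // Returns only the first line-sized segment of $long_str (at most $width
    // characters including $break); it does not wrap the whole string.
    function mb_wordwrap($long_str, $width = 75, $break = "\n", $cut = false) {
        $long_str = html_entity_decode($long_str, ENT_COMPAT, 'UTF-8');
        // Reserve room for the break string.
        $width -= mb_strlen($break);
        if ($cut) {
            // Hard cut at $width characters.
            $short_str = mb_substr($long_str, 0, $width);
            $short_str = trim($short_str);
        }
        else {
            // Cut at the last whitespace within $width characters.
            // "u" makes "." match whole UTF-8 characters instead of bytes; "s" lets it span newlines.
            $short_str = preg_replace('/^(.{1,'.$width.'})(?:\s.*|$)/us', '$1', $long_str);
            // No whitespace within range: fall back to a hard cut.
            if (mb_strlen($short_str) > $width) {
                $short_str = mb_substr($short_str, 0, $width);
            }
        }
        // Append the break only if something was cut off.
        if (mb_strlen($long_str) != mb_strlen($short_str)) {
            $short_str .= $break;
        }
        return $short_str;
    }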
    

    Don't forget to configure PHP to use UTF-8:

    ini_set('default_charset', 'UTF-8');
    mb_internal_encoding('UTF-8');
    mb_regex_encoding('UTF-8');
    

    I hope this will help. Guillaume
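
    Since the function above returns only the leading segment, here is a rough sketch of wrapping a whole string by words while counting characters with mb_strlen(). The name mb_wordwrap_sketch is hypothetical and not part of PHP or of the answer above. The sketch assumes PHP 7+, the mbstring extension, and that words are separated by single spaces and paragraphs by "\n":

    ```php
    <?php
    // Hypothetical helper (not a PHP built-in): wraps $str so that each line
    // holds at most $width characters, measured with mb_strlen().
    function mb_wordwrap_sketch(string $str, int $width = 75, string $break = "\n", bool $cut = false): string
    {
        $width = max(1, $width); // guard against zero or negative widths
        $lines = [];
        // Existing "\n" characters are treated as hard breaks and re-joined with $break.
        foreach (explode("\n", $str) as $paragraph) {
            $line = '';
            foreach (explode(' ', $paragraph) as $word) {
                // With $cut enabled, slice words longer than $width into pieces.
                while ($cut && mb_strlen($word, 'UTF-8') > $width) {
                    if ($line !== '') {
                        $lines[] = $line;
                        $line = '';
                    }
                    $lines[] = mb_substr($word, 0, $width, 'UTF-8');
                    $word = mb_substr($word, $width, null, 'UTF-8');
                }
                $candidate = $line === '' ? $word : $line . ' ' . $word;
                if ($line !== '' && mb_strlen($candidate, 'UTF-8') > $width) {
                    $lines[] = $line; // current line is full, start a new one
                    $line = $word;
                } else {
                    $line = $candidate;
                }
            }
            $lines[] = $line;
        }
        return implode($break, $lines);
    }
    ```

    Called as mb_wordwrap_sketch($text, 7) on "äää äää äää", this keeps the first two words on one line, because that line is 7 characters long even though it occupies 13 bytes.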

  • 2020-12-01 13:42

    Here's my own attempt at a function that passed a few of my own tests, though I can't promise it's 100% perfect, so please post a better one if you see a problem.

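    /**
     * Multi-byte safe version of wordwrap()
     * Seems to me like wordwrap() is only broken on UTF-8 strings when $cut = true
     * @return string
     */
    function wrap($str, $len = 75, $break = " ", $cut = true) {
        $len = (int) $len;

        // Compare against '' explicitly: empty("0") is true and would drop the string "0".
        if ((string) $str === '')
            return "";

        if (!$cut) {
            // Note: wordwrap() still measures $len in bytes, not characters.
            return wordwrap($str, $len, $break);
        }

        // Match runs of $len characters that contain none of the characters in $break.
        // The "u" modifier makes the count character-based; "/" is quoted because it is the delimiter.
        $pattern = '/([^'.preg_quote($break, '/').']{'.$len.'})/u';

        // Insert $break after every such run.
        return preg_replace($pattern, "\${1}".$break, $str);
    }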
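
    One caveat about the $cut = false fallback, shown as a small sketch with made-up sample strings: wordwrap() never splits a character in that mode, but it still measures the width in bytes, so multi-byte text wraps earlier than ASCII text with the same number of characters.

    ```php
    <?php
    // "äää" written as raw UTF-8 bytes: 3 characters, 6 bytes.
    $word  = str_repeat("\xC3\xA4", 3);
    $multi = "$word $word $word"; // 11 characters, 20 bytes
    $ascii = 'aaa aaa aaa';       // 11 characters, 11 bytes

    // Same character count and same width, but different line breaks.
    $wrappedAscii = wordwrap($ascii, 7, "\n"); // "aaa aaa\naaa"
    $wrappedMulti = wordwrap($multi, 7, "\n"); // breaks at both spaces

    var_dump(substr_count($wrappedAscii, "\n"));         // int(1)
    var_dump(substr_count($wrappedMulti, "\n"));         // int(2)
    var_dump(mb_check_encoding($wrappedMulti, 'UTF-8')); // bool(true)
    ```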
    
  • 2020-12-01 13:45

    function mb_wordwrap($str, $width = 74, $break = "\r\n", $cut = false)
    {
        return preg_replace(
            '~(?P<str>.{' . $width . ',}?' . ($cut ? '(?(?!.+\s+)\s*|\s+)' : '\s+') . ')(?=\S+)~mus',
            '$1' . $break,
            $str
        );
    }
    