How to split Chinese characters in PHP?

南笙 2021-01-03 13:13

I need some help regarding how to split Chinese characters mixed with English words and numbers in PHP.

For example, if I read

FrontPage 2000中文版應用大全
         


        
3 Answers
  • 2021-01-03 13:55

    With this code you can make Chinese text (UTF-8) wrap at the end of a line so that it stays readable:

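    function add_wrap_points($str) {
        // Tokenize: runs of ASCII letters/digits/underscores, or any single character.
        // An explicit class is used because \w may also match Han characters under the u modifier.
        preg_match_all('/[A-Za-z0-9_]+|./us', $str, $matches);
        $out = '';
        foreach ($matches[0] as $val) {
            // Append a zero-width space (U+200B) after each token as a line-break opportunity.
            $out .= $val . "\xE2\x80\x8B";
        }
        return $out;
    }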
    
  • 2021-01-03 14:01

    Assuming you are using UTF-8 (or can convert the input to UTF-8 with iconv or a similar tool), use the u modifier (doc: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php ):

    <?php
        $s = "FrontPage 2000中文版應用大全";
        // preg_match_all() returns the number of matches; print_r() prints that count.
        print_r(preg_match_all('/./u', $s, $matches));
        echo "\n";
        print_r($matches);
    ?>
    

    will give

    21
    Array
    (
        [0] => Array
            (
                [0] => F
                [1] => r
                [2] => o
                [3] => n
                [4] => t
                [5] => P
                [6] => a
                [7] => g
                [8] => e
                [9] =>  
                [10] => 2
                [11] => 0
                [12] => 0
                [13] => 0
                [14] => 中
                [15] => 文
                [16] => 版
                [17] => 應
                [18] => 用
                [19] => 大
                [20] => 全
            )
    
    )
    

    Note that the source file itself must also be saved as UTF-8 so that $s actually contains those characters.
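If the input arrives in a legacy encoding rather than UTF-8, the u modifier cannot be used on it directly, so it has to be converted first. The following is only a minimal sketch of that step, assuming the mbstring extension is available; `to_utf8` is a hypothetical helper name and Big5 is just an example source encoding, neither comes from the answer.

```php
<?php
// Hypothetical helper (not from the answer): return $s as UTF-8,
// converting from $from only when $s is not already valid UTF-8.
function to_utf8(string $s, string $from): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $from);
}

// Simulate legacy input by encoding part of the sample text as Big5.
$legacy = mb_convert_encoding("中文版", 'BIG-5', 'UTF-8');

// Convert back to UTF-8, then split into single characters as above.
$s = to_utf8($legacy, 'BIG-5');
preg_match_all('/./u', $s, $matches);
print_r($matches[0]);
```

The validity check is a heuristic: a string in another encoding can occasionally happen to be valid UTF-8, so when the source encoding is known for certain it is safer to convert unconditionally.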

    The following matches each run of alphanumeric characters as a single group:

    <?php
    $s = "FrontPage 2000中文版應用大全";
    print_r(preg_match_all('/(\w+)|(.)/u', $s, $matches));
    echo "\n";
    print_r($matches[0]);
    ?>
    

    result:

    10
    Array
    (
        [0] => FrontPage
        [1] =>  
        [2] => 2000
        [3] => 中
        [4] => 文
        [5] => 版
        [6] => 應
        [7] => 用
        [8] => 大
        [9] => 全
    )
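One caveat on the `\w` pattern: on PHP builds where the u modifier also enables Unicode character properties (PCRE_UCP), `\w` can match Han characters as well, so the Chinese text may come back as one long token instead of single characters. A possible workaround, sketched here with a hypothetical `split_mixed` function, is to spell out the ASCII class explicitly.

```php
<?php
// Sketch: group ASCII letters, digits and underscores into one token each;
// every other character (spaces, Han characters, ...) becomes its own token.
function split_mixed(string $s): array
{
    // u: treat pattern and subject as UTF-8; s: let "." match newlines too.
    preg_match_all('/[A-Za-z0-9_]+|./us', $s, $matches);
    return $matches[0];
}

print_r(split_mixed("FrontPage 2000中文版應用大全"));
```

With the sample string this should give the same ten tokens listed above, regardless of how `\w` behaves on the installed PHP version.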
    
  • 2021-01-03 14:14
        /**
         * Reference: http://www.regular-expressions.info/unicode.html
         * Korean: Hangul
         * CJK: Han
         * Japanese: Hiragana, Katakana
         * Flag u required
         */
    
        preg_match_all(
            '/\p{Hangul}|\p{Hiragana}|\p{Han}|\p{Katakana}|(\p{Latin}+)|(\p{Cyrillic}+)/u',
            $str,
            $result
        );
    

    The pattern above also works on PHP 7.0.
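Note that digits and spaces belong to the Common script, so the script-based pattern skips them: for the sample string, "2000" would not appear in the result. If numbers should be kept, one option is an extra `[0-9]+` branch. The sketch below assumes that is what is wanted; the added branch and the function name `split_by_script` are not part of the pattern in the answer.

```php
<?php
// Sketch: CJK characters one by one, Latin/Cyrillic words as runs,
// plus an added [0-9]+ branch (an assumption) so ASCII numbers are kept.
function split_by_script(string $s): array
{
    $pattern = '/\p{Hangul}|\p{Hiragana}|\p{Han}|\p{Katakana}'
             . '|\p{Latin}+|\p{Cyrillic}+|[0-9]+/u';
    preg_match_all($pattern, $s, $result);
    return $result[0];
}

print_r(split_by_script("FrontPage 2000中文版應用大全"));
```

Spaces and punctuation are still dropped by this pattern, which is usually fine when the goal is a list of words and characters rather than a lossless split.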

    The following solution, by contrast, does not work. I regret having upvoted a non-working solution:

    <?php
        $s = "FrontPage 2000中文版應用大全";
        print_r(preg_match_all('/(\w+)|(.)/u', $s, $matches));
        echo "\n";
        print_r($matches[0]);
    ?>
    