How to split Chinese characters in PHP?

I need some help splitting text that mixes Chinese characters with English words and numbers in PHP.

For example, if I read

FrontPage 2000中文版應用大全

3 Answers

醉梦人生 (2021-01-03 13:55)

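With this code you can make Chinese (UTF-8) text wrap at the end of a line so that it stays readable: it inserts a zero-width space after every English word, number, or other single character, which gives the browser a place to break the line.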

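// Hypothetical wrapper function; the original snippet ended with "return $out;".
function add_zero_width_spaces($str)
{
    // ASCII words/numbers form one token; every other character is its own token.
    // [A-Za-z0-9_] is used instead of \w because, with the u modifier, \w can also
    // match Chinese characters and glue them into one unbreakable token.
    // The s modifier lets "." match line breaks as well.
    preg_match_all('/[A-Za-z0-9_]+|./su', $str, $matches);

    $out = '';
    foreach ($matches[0] as $val) {
        $out .= $val . "&#8203;"; // add Zero-Width Space (HTML entity) as a line-break opportunity
    }
    return $out;
}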
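
A more compact take on the same idea, as a sketch: instead of building the string in a loop, the tokens can be joined with implode(). This assumes PHP 7.0+ (for the "\u{200B}" escape) and a page served as UTF-8; insertBreakOpportunities() is a hypothetical name. Unlike the loop above, it puts the zero-width space only between tokens, not after the last one.

```php
<?php
// Sketch: join tokens with U+200B (zero-width space) so browsers can break long CJK lines.
// insertBreakOpportunities() is a hypothetical helper name. Requires PHP 7.0+ for "\u{...}".
function insertBreakOpportunities(string $text): string
{
    // ASCII words/numbers stay together; every other character (including "\n",
    // thanks to the s modifier) becomes its own token.
    preg_match_all('/[A-Za-z0-9_]+|./su', $text, $matches);
    return implode("\u{200B}", $matches[0]);
}

echo insertBreakOpportunities("FrontPage 2000中文版應用大全"), "\n";
```

If you prefer HTML entities, pass "&#8203;" to implode() instead of "\u{200B}".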

Happy的楠姐 (2021-01-03 14:01)

Assuming your text is UTF-8 (or you can convert it to UTF-8 with iconv or another tool), you can use the u modifier (doc: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php ):
<?php
    $s = "FrontPage 2000中文版應用大全";
    print_r(preg_match_all('/./u', $s, $matches));
    echo "\n";
    print_r($matches);
?>

will give
21
Array
(
    [0] => Array
        (
            [0] => F
            [1] => r
            [2] => o
            [3] => n
            [4] => t
            [5] => P
            [6] => a
            [7] => g
            [8] => e
            [9] =>  
            [10] => 2
            [11] => 0
            [12] => 0
            [13] => 0
            [14] => 中
            [15] => 文
            [16] => 版
            [17] => 應
            [18] => 用
            [19] => 大
            [20] => 全
        )

)
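
If you only need the individual characters, here is a sketch using preg_split() with an empty pattern; unlike '/./u' without the s modifier, it also keeps line breaks. On PHP 7.4+ with the mbstring extension, mb_str_split($s) is a built-in alternative.

```php
<?php
// Sketch: split a UTF-8 string into single characters.
// The empty pattern with the u modifier splits between code points;
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$s = "FrontPage 2000中文版應用大全";
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

echo count($chars), "\n"; // 21
print_r($chars);
```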

Note that my source file is also saved as UTF-8; otherwise $s would not contain those characters as UTF-8.
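
Since the u modifier only works on valid UTF-8 (with invalid input, preg_match_all() returns false instead of a count), it can help to convert and validate the text first. A minimal sketch, assuming you know the source encoding of your data (for traditional Chinese text this might be something like 'BIG5', but check what your data actually uses); toUtf8() is a hypothetical helper built on iconv().

```php
<?php
// Sketch: convert text to UTF-8 (if needed) and verify it before using /u patterns.
// toUtf8() is a hypothetical helper; pass the encoding your data is really stored in.
function toUtf8(string $text, string $sourceEncoding): string
{
    $utf8 = strcasecmp($sourceEncoding, 'UTF-8') === 0
        ? $text
        : iconv($sourceEncoding, 'UTF-8', $text);

    // preg_match('//u', ...) returns 1 for valid UTF-8 and false for invalid input.
    if ($utf8 === false || preg_match('//u', $utf8) !== 1) {
        throw new RuntimeException("Input is not valid $sourceEncoding text");
    }
    return $utf8;
}

$latin1 = "caf\xE9"; // "café" stored as ISO-8859-1, i.e. not valid UTF-8
var_dump(preg_match_all('/./u', $latin1, $m)); // bool(false)

$utf8 = toUtf8($latin1, 'ISO-8859-1');
var_dump(preg_match_all('/./u', $utf8, $m));   // int(4)
```
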
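The following will match alphanumeric characters as a group. (Depending on your PHP/PCRE version, the u modifier may also make \w match Chinese characters; in that case "2000中文版應用大全" comes back as a single match. The result shown below is what you get when \w is ASCII-only.)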
<?php
$s = "FrontPage 2000中文版應用大全";
print_r(preg_match_all('/(\w+)|(.)/u', $s, $matches));
echo "\n";
print_r($matches[0]);
?>

result:
10
Array
(
    [0] => FrontPage
    [1] =>  
    [2] => 2000
    [3] => 中
    [4] => 文
    [5] => 版
    [6] => 應
    [7] => 用
    [8] => 大
    [9] => 全
)
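
If you want English words and numbers as separate tokens and don't need the spaces, here is a sketch that spells out the ASCII ranges instead of relying on \w, so the result does not depend on whether \w matches Chinese characters in your PHP build. The pattern is a variation, not taken from the answer above.

```php
<?php
// Sketch: English words, numbers and every other non-space character become
// separate tokens; whitespace is dropped. ASCII ranges are spelled out so the
// result does not depend on whether \w matches Han characters in your PHP build.
$s = "FrontPage 2000中文版應用大全";
$count = preg_match_all('/[A-Za-z]+|[0-9]+|\S/u', $s, $matches);

echo $count, "\n"; // 9
print_r($matches[0]);
```

Note that a string like "Windows10" is split into "Windows" and "10" with this pattern.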

醉话见心 (2021-01-03 14:14)

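<?php
/**
 * Reference: http://www.regular-expressions.info/unicode.html
 * Korean: Hangul
 * CJK: Han
 * Japanese: Hiragana, Katakana
 * The u flag is required.
 * Digits and spaces belong to none of these scripts, so "2000" is not matched.
 */
$str = "FrontPage 2000中文版應用大全";

preg_match_all(
    '/\p{Hangul}|\p{Hiragana}|\p{Han}|\p{Katakana}|(\p{Latin}+)|(\p{Cyrillic}+)/u',
    $str,
    $result
);
print_r($result[0]);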

This one also works on PHP 7.0.
The /(\w+)|(.)/u solution above is just not working. I regret upvoting a non-working solution...
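
A sketch that wraps the script-property pattern from the last answer in a function and adds \p{N}+ so that numbers like "2000" are kept (an assumption about what you want; the original pattern skips digits). splitByScript() is a hypothetical name. Whitespace and punctuation are still not matched, so they are dropped.

```php
<?php
// Sketch: the script-property pattern wrapped in a function, plus \p{N}+ for
// digit runs (an addition, not part of the original answer).
// splitByScript() is a hypothetical helper name. The u flag is required.
function splitByScript(string $str): array
{
    preg_match_all(
        '/\p{Hangul}|\p{Hiragana}|\p{Han}|\p{Katakana}|\p{Latin}+|\p{Cyrillic}+|\p{N}+/u',
        $str,
        $result
    );
    // Han, Hangul, Hiragana and Katakana come back one character at a time;
    // Latin words, Cyrillic words and numbers come back as whole runs.
    return $result[0];
}

print_r(splitByScript("FrontPage 2000中文版應用大全"));
print_r(splitByScript("Привет 123 こんにちは 한국"));
```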