Split utf8 string into array of chars

别那么骄傲 2021-01-14 05:26

I'm trying to split a utf8 encoded string into an array of chars. The function that I now use used to work, but for some reason it doesn't work anymore. W

6 answers
  • 2021-01-14 05:45

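    PHP has a multibyte split function, mb_split, which splits a string on a regular-expression pattern. Since PHP 7.4 there is also mb_str_split, which splits a string into characters directly.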
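
    As a rough sketch of the mb_ route: the snippet below prefers mb_str_split when the mbstring extension provides it and otherwise falls back to a PCRE split. The sample string and variable names are assumptions for illustration only.

    ```php
    <?php
    $str = "\u{00E9}\u{00E9}n"; // "één", written with escapes so the file encoding does not matter

    if (function_exists('mb_str_split')) {
        // Available in PHP 7.4+ when mbstring is loaded.
        $chars = mb_str_split($str, 1, 'UTF-8');
    } else {
        // Fallback: split between code points with a UTF-8 aware regex.
        $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    }
    ```

    Either branch should give one array element per code point.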

  • 2021-01-14 05:46

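    I found out the é was not the character I expected. Apparently there is a difference between né and ńe (a precomposed character versus a letter plus a combining accent can look alike but are different code point sequences). I got it working by normalizing the string first.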
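
    To illustrate the kind of mismatch described above, here is a small sketch. It assumes the difference was precomposed versus decomposed accents, which the answer does not state explicitly. Normalizer comes from the intl extension, so the call is guarded; the variable names are made up for the example.

    ```php
    <?php
    // Two spellings of "né": precomposed é (U+00E9) versus e + combining acute (U+0301).
    $precomposed = "n\u{00E9}";
    $decomposed  = "ne\u{0301}";

    // Splitting by code point gives a different element count for each spelling.
    $a = preg_split('//u', $precomposed, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('//u', $decomposed, -1, PREG_SPLIT_NO_EMPTY);

    // \X matches one extended grapheme cluster, so the accent stays with its base letter.
    preg_match_all('/\X/u', $decomposed, $m);
    $clusters = $m[0];

    // If intl is loaded, NFC normalization composes the pair into a single code point.
    $normalized = class_exists('Normalizer')
        ? Normalizer::normalize($decomposed, Normalizer::FORM_C)
        : $decomposed;
    ```

    Normalizing first keeps the code-point split and the visible characters in step; matching \X is an alternative when normalization is not available.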

  • 2021-01-14 05:48

    This is the best solution:

    I found it in the PHP manual pages.

    preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    

    It is fast:

    In PHP 5.6.18 it split a 6 MB text file in a matter of seconds.

    Best of all, it doesn't need multibyte (mb_) support.

    Similar answer also here.
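
    One thing worth guarding against: with the u modifier, preg_split returns false when the subject is not valid UTF-8. A minimal sketch of a wrapper that surfaces this, where split_utf8_chars is a hypothetical helper name and not something from the manual:

    ```php
    <?php
    // Hypothetical helper: wraps the preg_split('//u') idiom with an error check.
    function split_utf8_chars(string $str): array
    {
        $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            // preg_split fails under /u when the input is malformed UTF-8.
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        return $chars;
    }

    $chars = split_utf8_chars("\u{00E9}\u{00E9}n"); // "één"
    ```

    Throwing is just one option here; returning an empty array or re-encoding the input first would be equally reasonable choices.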

  • 2021-01-14 05:49

    For the mb_... functions you should specify the charset encoding.

    In your example code these are especially the following two lines:

    $strLen = mb_strlen($str, 'UTF-8');
    $arr[] = mb_substr($str, $i, $len, 'UTF-8');
    

    The full picture:

    function utf8Split($str, $len = 1)
    {
      $arr = array();
      $strLen = mb_strlen($str, 'UTF-8');
      // Advance by $len so that chunks do not overlap when $len > 1.
      for ($i = 0; $i < $strLen; $i += $len)
      {
        $arr[] = mb_substr($str, $i, $len, 'UTF-8');
      }
      return $arr;
    }
    

    This is because you're using UTF-8 here. However, if the input is not properly encoded as UTF-8, this will not work, simply because it was not designed for anything else.

    Alternatively, you can process UTF-8 encoded strings with PCRE regular expressions. For example, this returns what you're looking for in less code:

    $str = 'Zelf heb ik maar één vraag: wie ben jij?';
    
    $chars = preg_split('/(?!^)(?=.)/u', $str);
    

    Besides preg_split, there is also mb_split.
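
    A caveat with the lookahead pattern, shown as a short sketch: by default . does not match a newline, so no split happens in front of one. Adding the s modifier is one way around that. The input string is an assumption chosen to show the effect.

    ```php
    <?php
    $str = "a\nb";

    // Without s, the lookahead (?=.) fails before "\n", so "a" and "\n" stay together.
    $withoutS = preg_split('/(?!^)(?=.)/u', $str);

    // With s, . also matches "\n", so every character becomes its own element.
    $withS = preg_split('/(?!^)(?=.)/su', $str);
    ```

    For single-line input such as the example sentence, both patterns behave the same.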

  • 2021-01-14 05:58
    mb_internal_encoding("UTF-8"); 
    

    With it off I got 46 array elements; with it on, 41.

  • 2021-01-14 06:04

    If you are not sure whether the mbstring function library is available, use:

    Version 1:

    function utf8_str_split($str='',$len=1){
        preg_match_all("/./su", $str, $arr); // s: let . match newlines too
        $arr = array_chunk($arr[0], $len);
        $arr = array_map('implode', $arr);
        return $arr;
    }
    

    Version 2:

    function utf8_str_split($str='',$len=1){
        return preg_split('/(?<=\G.{'.$len.'})/su', $str, -1, PREG_SPLIT_NO_EMPTY);
    }
    

    Both functions were tested in PHP 5.
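
    A short usage sketch of the chunking idea from Version 1, written with type declarations (so it needs PHP 7 or later). utf8_chunks is a hypothetical name used to keep the example self-contained, and the sample input is an assumption.

    ```php
    <?php
    // utf8_chunks is a hypothetical name for this illustration.
    function utf8_chunks(string $str, int $len = 1): array
    {
        // s lets . match newlines; u treats the subject as UTF-8.
        preg_match_all('/./su', $str, $matches);
        // Group the characters into runs of $len and glue each run back together.
        return array_map('implode', array_chunk($matches[0], $len));
    }

    $chunks = utf8_chunks("\u{00E9}\u{00E9}n vraag", 3); // "één vraag" in groups of 3
    ```

    The last chunk may be shorter than $len when the character count is not a multiple of it.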
