In PHP, what is the best way to split a string into an array of Unicode characters? And what if the input is not necessarily UTF-8?
I want to know whether the set of Unicode
I was able to write a solution using mb_*, including a trip to UTF-16 and back in a probably silly attempt to speed up string indexing:
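// Convert to UTF-16 and count characters in that encoding.
$japanese2 = mb_convert_encoding($japanese, "UTF-16", "UTF-8");
$length = mb_strlen($japanese2, "UTF-16");
for ($i = 0; $i < $length; $i++) {
    // Take one character at a time and convert it back to UTF-8 for output.
    $char = mb_substr($japanese2, $i, 1, "UTF-16");
    $utf8 = mb_convert_encoding($char, "UTF-8", "UTF-16");
    print $utf8 . "\n";
}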
I had better luck avoiding mb_internal_encoding and just specifying everything at each mb_* call. I'm sure I'll wind up using the preg solution.
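For reference, here is a minimal sketch of what I understand the preg approach to look like. The function name split_unicode_chars and its $sourceEncoding parameter are my own invention for illustration. It assumes the source encoding is known up front, converts to UTF-8 when needed, and splits on the empty pattern with the /u modifier. As far as I can tell this splits by code point, so a base letter followed by a combining mark would come out as two elements.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not a library API):
// split a string into an array of Unicode code points.
function split_unicode_chars(string $text, string $sourceEncoding = "UTF-8"): array
{
    // The /u modifier only understands UTF-8, so convert other encodings first.
    if (strcasecmp($sourceEncoding, "UTF-8") !== 0) {
        $text = mb_convert_encoding($text, "UTF-8", $sourceEncoding);
    }

    // An empty pattern matches between every code point; PREG_SPLIT_NO_EMPTY
    // drops the empty strings at the start and end of the result.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false when the subject is not valid UTF-8.
    if ($chars === false) {
        throw new InvalidArgumentException("Input is not valid UTF-8");
    }
    return $chars;
}

// Usage: print each character of a UTF-8 string on its own line.
foreach (split_unicode_chars("日本語") as $char) {
    print $char . "\n";
}
```

On PHP 7.4 or later, mb_str_split($text, 1, $encoding) looks like it would do the same job without the regex, and it takes the encoding as an argument, which fits the habit of specifying the encoding at every mb_* call.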