I have a bunch of strings with different charsets. The $charset
variable contains the charset of the current string.
$content = iconv($charset, 'UT
Functions like strlen() count bytes, not characters.
See the notes in the PHP Manual for details:
Note:
strlen() returns the number of bytes rather than the number of characters in a string.
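Use the mb_* functions if you're working with UTF-8, unless you have the php.ini setting mbstring.func_overload enabled to overload the standard strpos(), strlen(), substr(), etc. functions; in that case strlen() will count characters. (Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions the mb_* functions are the only option.)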
That entirely depends on what you want to do. The core strlen and similar functions work on bytes. Every number they accept and return is a byte count or byte offset. The mb_* functions are encoding-aware and work on characters. All numbers they accept and return are character counts or character offsets.
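To make the difference concrete, here is a minimal sketch comparing the byte-based and character-based functions on the same string. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$str = '漢字'; // two characters, three bytes each in UTF-8

// Byte-based functions: counts and offsets are in bytes.
$byteLength = strlen($str);        // 6
$byteOffset = strpos($str, '字');  // 3

// Encoding-aware functions: counts and offsets are in characters.
$charLength = mb_strlen($str, 'UTF-8');          // 2
$charOffset = mb_strpos($str, '字', 0, 'UTF-8'); // 1

var_dump($byteLength, $byteOffset, $charLength, $charOffset);
```

The two families should not be mixed: a character offset from mb_strpos() is meaningless to substr(), and a byte offset from strpos() is meaningless to mb_substr().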
If you have a safe way of getting a byte offset in a string ("safe" meaning the offset is not in the middle of a multi-byte character) and then, for example, crop everything before that offset using substr, that'll work just fine. For instance:
$str = '漢字';
$offset = strpos($str, '字');
$cropped = substr($str, $offset);
Works fine.
However, this won't work:
$cropped = substr($str, $offset, 1);
You can't safely cut out a single byte without running the risk of cutting into a multi-byte character.
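As a rough illustration of the failure and one way around it, the sketch below cuts a single byte with substr() and then a single character with mb_substr(). It assumes mbstring is loaded and the file is saved as UTF-8; using mb_check_encoding() to detect the damage is just one possible check.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$str = '漢字';

// Byte-based cut: takes only the first of the three bytes of '字'.
$offset = strpos($str, '字');                   // byte offset 3
$broken = substr($str, $offset, 1);             // "\xE5", an incomplete sequence
$isValid = mb_check_encoding($broken, 'UTF-8'); // false

// Character-based cut: offset and length are counted in characters.
$charOffset = mb_strpos($str, '字', 0, 'UTF-8');       // character offset 1
$char = mb_substr($str, $charOffset, 1, 'UTF-8');     // '字'

var_dump($isValid, $char);
```

When a length is needed, staying entirely within the mb_* family keeps every offset and length in the same unit.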