When I use substr(), I get a strange character at the end:
$articleText = substr($articleText,0,500);
I have an output of 500 c
Use this function; it worked for me:
function substr_unicode($str, $s, $l = null) {
    // Split into UTF-8 characters, slice by character, then rejoin.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}
Credits: http://php.net/manual/en/function.mb-substr.php#107698
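To see why this approach works, here is a small sketch of the intermediate steps. The sample string and variable names are made up for illustration; it assumes PHP 7 or later for the "\u{...}" escape syntax.

```php
<?php
// The /u modifier makes PCRE treat the subject as UTF-8, so the empty
// pattern matches between characters rather than between bytes.
$chars = preg_split('//u', "h\u{00E9}llo", -1, PREG_SPLIT_NO_EMPTY);
// $chars now holds one array element per character: h, é, l, l, o

// array_slice() picks whole characters, and implode() rejoins them.
$firstTwo = implode('', array_slice($chars, 0, 2));
```

Because the slicing happens on an array of whole characters, a multibyte character can never be cut in the middle.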
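Alternative solution for UTF-8 encoded strings: convert the UTF-8 text to single-byte ISO-8859-1 before cutting the substring. Note that utf8_decode() replaces any character that does not exist in ISO-8859-1 with "?", and that utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.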
$articleText = substr(utf8_decode($articleText),0,500);
To get the articleText string back to UTF-8, an extra operation will be needed:
$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );
You are cutting a multibyte Unicode character in half. Instead of substr(), try mb_substr() in PHP.
substr()
substr ( string $string , int $start [, int $length ] )
mb_substr()
mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
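The difference between the two signatures is that substr() counts bytes while mb_substr() counts characters in the given encoding. A minimal sketch, assuming the mbstring extension is loaded and PHP 7 or later; the sample string is made up for illustration:

```php
<?php
// "é" is encoded as two bytes in UTF-8 (0xC3 0xA9).
$text = "caf\u{00E9} au lait";

// substr() counts bytes: a length of 4 ends in the middle of "é".
$byteCut = substr($text, 0, 4);

// mb_substr() counts characters when given the encoding.
$charCut = mb_substr($text, 0, 4, 'UTF-8');

// The byte-based cut is no longer valid UTF-8; the character-based cut is.
$byteCutIsValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutIsValid = mb_check_encoding($charCut, 'UTF-8');
```

The dangling first byte of "é" left in $byteCut is what shows up as the strange character at the end of the output.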
For more information on substr(), see the PHP manual.
mb_substr() also works well for removing strange trailing line breaks, which I was having trouble with after parsing HTML code. The problem was NOT handled by:
trim()
or:
var_dump(preg_match('/^\n|\n$/', $variable));
or:
str_replace (array('\r\n', '\n', '\r'), ' ', $text)
None of these caught it.
Looks like you're slicing a unicode character in half there. Use mb_substr instead for unicode-safe string slicing.
Use mb_substr() instead. It can handle multibyte encodings, whereas substr() only works correctly on single-byte strings:
$articleText = mb_substr($articleText,0,500,'UTF-8');
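If the cut text should also be marked as truncated, the call can be wrapped in a small helper. This is only a sketch: truncate_utf8(), its parameters, and the "..." suffix are hypothetical and not part of the answers above. It assumes the mbstring extension and PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answers): truncate a UTF-8
// string to at most $limit characters without splitting a multibyte character.
function truncate_utf8(string $text, int $limit, string $suffix = '...'): string
{
    // mb_strlen() counts characters, not bytes, for the given encoding.
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // Short enough already: return unchanged.
    }
    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

// Sample input made up for illustration.
$articleText = truncate_utf8("na\u{00EF}ve caf\u{00E9} society", 10);
```

Checking the length first avoids appending the suffix to text that was never shortened.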