Im having a problem with removing non-utf8 characters from string, which are not displaying properly. Characters are like this 0x97 0x61 0x6C 0x6F (hex representation)
UConverter can be used since PHP 5.5. UConverter is better the choice if you use intl extension and don't use mbstring.
function replace_invalid_byte_sequence($str)
{
return UConverter::transcode($str, 'UTF-8', 'UTF-8');
}
function replace_invalid_byte_sequence2($str)
{
return (new UConverter('UTF-8', 'UTF-8'))->convert($str);
}
htmlspecialchars can be used to remove invalid byte sequence since PHP 5.4. Htmlspecialchars is better than preg_match for handling large size of byte and the accuracy. A lot of the wrong implementation by using regular expression can be seen.
function replace_invalid_byte_sequence3($str)
{
return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8'));
}