Remove non-utf8 characters from string

心在旅途 2020-11-22 11:56

I'm having a problem removing non-UTF-8 characters from a string; they are not displaying properly. The bytes look like this: 0x97 0x61 0x6C 0x6F (hex representation).

18 answers
  •  栀梦 (OP)
    2020-11-22 12:56

    UConverter is available since PHP 5.5. It is the better choice if you use the intl extension and don't use mbstring.

    function replace_invalid_byte_sequence($str)
    {
        return UConverter::transcode($str, 'UTF-8', 'UTF-8');
    }
    
    function replace_invalid_byte_sequence2($str)
    {
        return (new UConverter('UTF-8', 'UTF-8'))->convert($str);
    }
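
As a rough usage sketch (not part of the original answer), the snippet below feeds the bytes from the question through `UConverter::transcode` and checks the result. The helper `is_valid_utf8` is a hypothetical name introduced here for illustration; it relies on PCRE's `/u` modifier rejecting malformed UTF-8. The sketch assumes the intl extension may be missing and guards for that.

```php
<?php
// Bytes from the question: 0x97 is a stray continuation byte, so the
// string is not valid UTF-8; 0x61 0x6C 0x6F is plain ASCII "alo".
$raw = "\x97\x61\x6C\x6F";

// Hypothetical helper: with the /u modifier PCRE validates the subject
// as UTF-8, and preg_match() returns false when it is malformed.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($raw)); // the raw input is rejected

if (class_exists('UConverter')) {
    // Same call as replace_invalid_byte_sequence() above.
    $clean = UConverter::transcode($raw, 'UTF-8', 'UTF-8');
    var_dump(is_valid_utf8($clean)); // the converted string should pass
    echo bin2hex($clean), PHP_EOL;   // inspect what took the place of 0x97
} else {
    echo "intl extension not loaded", PHP_EOL;
}
```

Dumping the result with `bin2hex` is a simple way to see exactly which bytes the converter substituted for the invalid one.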
    

    htmlspecialchars can be used to replace invalid byte sequences since PHP 5.4, when the ENT_SUBSTITUTE flag was added; each invalid sequence becomes U+FFFD. It handles large inputs better and more accurately than preg_match. Many incorrect regular-expression implementations are in circulation.

    function replace_invalid_byte_sequence3($str)
    {
        return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8'));
    }
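
A minimal sketch of how this behaves on the bytes from the question, assuming the goal is to drop the bad byte rather than keep a placeholder. The `str_replace` step is an addition for illustration, not part of the original answer, and it would also strip any genuine U+FFFD characters already present in the input.

```php
<?php
// Copied from the answer above so the example is self-contained.
function replace_invalid_byte_sequence3($str)
{
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8'));
}

// Bytes from the question: invalid 0x97 followed by ASCII "alo".
$raw = "\x97\x61\x6C\x6F";

// ENT_SUBSTITUTE turns the invalid byte into U+FFFD (EF BF BD in UTF-8).
$replaced = replace_invalid_byte_sequence3($raw);
echo bin2hex($replaced), PHP_EOL; // efbfbd616c6f

// Assumption: the replacement character is unwanted, so strip it.
$removed = str_replace("\xEF\xBF\xBD", '', $replaced);
echo $removed, PHP_EOL; // alo
```

Because the string is encoded and then decoded with the same function family, markup characters such as `<`, `>` and `&` in valid input come back unchanged.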
    
