Removing invalid/incomplete multibyte characters

左心房为你撑大大i posted on 2019-11-30 09:23:51

How can I remove invalid multibyte characters, efficiently, securely, without notices/warnings/errors?

As you already outlined (or at least linked) in your question, deleting the invalid byte sequence(s) is not an option.

Instead, each invalid sequence should probably be replaced with the replacement character U+FFFD. As of PHP 5.4.0 you can use the ENT_SUBSTITUTE flag of htmlentities for this. That is probably the safest option if you don't want to reject the string.
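A minimal sketch of that idea, assuming you want the plain string back rather than HTML-escaped output. It uses htmlspecialchars, which accepts the same ENT_SUBSTITUTE flag as htmlentities, and then undoes the escaping; the helper name replace_invalid_utf8 is made up for illustration.

```php
<?php
// Hypothetical helper (not part of PHP): swaps invalid UTF-8 sequences for U+FFFD.
function replace_invalid_utf8($string)
{
    // ENT_SUBSTITUTE makes htmlspecialchars emit U+FFFD for each invalid
    // sequence instead of returning an empty string.
    $escaped = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Undo the escaping of & < > " ' so only the substitution remains.
    return htmlspecialchars_decode($escaped, ENT_QUOTES);
}

// "\xC3" is the first byte of a two-byte sequence with its second byte missing.
$clean = replace_invalid_utf8("caf\xC3 <b>");
echo bin2hex($clean), "\n";
```

Without ENT_SUBSTITUTE (or ENT_IGNORE), htmlspecialchars returns an empty string for input containing an invalid sequence, so the flag is what makes this round trip usable.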

In recent PHP versions iconv will always emit a notice on invalid input, and may even discard the whole string by returning false. So it does not look like a good alternative for you.
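If you still want to try iconv, a sketch like the following keeps the notice out of your logs and makes the "whole string lost" case explicit. The helper name iconv_ignore_invalid is hypothetical, try/finally needs PHP 5.5 or later, and whether //IGNORE actually drops the bad bytes or fails depends on the PHP version and the underlying iconv library, so treat the result as something to check rather than rely on.

```php
<?php
// Hypothetical helper: returns the converted string, or null if iconv failed.
function iconv_ignore_invalid($string)
{
    // Swallow any notice raised during the call instead of using "@".
    set_error_handler(function () {
        return true;
    });
    try {
        $result = iconv('UTF-8', 'UTF-8//IGNORE', $string);
    } finally {
        // Always put the previous error handler back.
        restore_error_handler();
    }

    // iconv signals failure with false; make that an explicit null.
    return $result === false ? null : $result;
}

$clean = iconv_ignore_invalid("caf\xC3 au lait");
if ($clean === null) {
    echo "iconv rejected the string\n";
}
```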

iconv('UTF-8', "ISO-8859-1//IGNORE", $string);

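This worked extremely well for me and doesn't seem to generate any notice. Note that the result is encoded as ISO-8859-1 rather than UTF-8, so characters with no ISO-8859-1 equivalent are dropped as well.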
