This is my code
To identify an unknown character:
take the string "a:\xe2\x80\x85b" as an example.
It looks like a: b
but with a narrower space.
$str = "a:\xe2\x80\x85b"; // \xe2\x80\x85 is written out only to build $str and make the example runnable;
// assume the values of these bytes are not known at this point
preg_match('~:(.*?)b~us', $str, $m); // shortest substring between : and b
echo implode(' ', array_map(function ($b) { return dechex(ord($b)); }, str_split($m[1])));
// e2 80 85
I obtain the 3 bytes e2 80 85, then I look them up in the Unicode table to see whether they represent one or several characters, and I find: U+2005 e2 80 85 FOUR-PER-EM SPACE
Conclusion: the unknown character is a FOUR-PER-EM SPACE (Unicode code point U+2005), which takes the 3 bytes e2 80 85 to encode in UTF-8. So I can write it as "\xe2\x80\x85" in a double-quoted string.
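The manual lookup can also be cross-checked in code. The following is only a sketch, not part of the method above: codepoints() is a hypothetical helper name, and it assumes the mbstring extension and PHP 7.2 or later (for mb_ord). It splits the captured substring into UTF-8 characters and prints each code point, which can then be searched in the Unicode table.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// return the code point of every character in a UTF-8 string.
// Requires the mbstring extension and PHP >= 7.2 for mb_ord().
function codepoints(string $s): array
{
    // Split into characters (not bytes) thanks to the /u modifier.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    return array_map(function ($c) {
        return sprintf('U+%04X', mb_ord($c, 'UTF-8'));
    }, $chars);
}

$str = "a:\xe2\x80\x85b";
preg_match('~:(.*?)b~us', $str, $m); // shortest substring between : and b

echo implode(' ', codepoints($m[1])), "\n"; // U+2005
```

If the output lists a single code point, the bytes form one character; several code points would mean several characters. Since PHP 7.0 the same character can also be written as "\u{2005}" in a double-quoted string, which produces the same three bytes.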