I am trying to convert text that users paste from Word, which contains the MS Word ellipsis and long dash, before processing it further.
I found an old proposed solution here to t
Great solution. I copied and pasted it and it worked without a problem. On further study, I added a few characters that were not in the search and replace arrays. To find the character codes, I wrote a PHP function that shows the numeric value of each byte in a string:
function stdump($s) {
    // Print every byte of $s followed by its numeric value in parentheses.
    for ($i = 0; $i < strlen($s); $i++) {
        echo $s[$i] . "(" . ord($s[$i]) . ")";
    }
    echo "<br/>";
}
Each byte is displayed with its numeric value shown next to it in parentheses. A multibyte UTF-8 character such as the ellipsis shows up as several bytes. Like this:
stdump("GPUs…");
produces:
G(71)P(80)U(85)s(115)â(226)€(128)¦(166)
Hope this helps.
--Keith
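If the goal is to paste those byte values straight into a search array, a variation that emits hex escapes may be handier. This is only a sketch: `byte_dump` is a made-up name, and it assumes the input is a raw byte string in which only the non-ASCII bytes need escaping.

```php
<?php
// Hypothetical helper (not from the post above): returns $s with every
// non-ASCII byte rewritten as a "\xNN" escape that can be pasted into a
// double-quoted PHP string. ASCII bytes are passed through unchanged.
function byte_dump(string $s): string
{
    $out = '';
    for ($i = 0; $i < strlen($s); $i++) {
        $code = ord($s[$i]);
        $out .= $code < 0x80 ? $s[$i] : sprintf('\x%02X', $code);
    }
    return $out;
}

echo byte_dump("GPUs\xE2\x80\xA6"), "\n"; // prints: GPUs\xE2\x80\xA6
```

The three escaped bytes are the UTF-8 encoding of the ellipsis (U+2026), the same 226, 128, 166 that show up in the decimal dump.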
For anyone getting the diamond question mark in PHP, this method of replacing UTF-8 characters worked better than using the chr function.
$search = [ // www.fileformat.info/info/unicode/<NUM>/ <NUM> = 2018
"\xC2\xAB", // « (U+00AB) in UTF-8
"\xC2\xBB", // » (U+00BB) in UTF-8
"\xE2\x80\x98", // ‘ (U+2018) in UTF-8
"\xE2\x80\x99", // ’ (U+2019) in UTF-8
"\xE2\x80\x9A", // ‚ (U+201A) in UTF-8
"\xE2\x80\x9B", // ‛ (U+201B) in UTF-8
"\xE2\x80\x9C", // “ (U+201C) in UTF-8
"\xE2\x80\x9D", // ” (U+201D) in UTF-8
"\xE2\x80\x9E", // „ (U+201E) in UTF-8
"\xE2\x80\x9F", // ‟ (U+201F) in UTF-8
"\xE2\x80\xB9", // ‹ (U+2039) in UTF-8
"\xE2\x80\xBA", // › (U+203A) in UTF-8
"\xE2\x80\x93", // – (U+2013) in UTF-8
"\xE2\x80\x94", // — (U+2014) in UTF-8
"\xE2\x80\xA6" // … (U+2026) in UTF-8
];
$replacements = [
"<<",
">>",
"'",
"'",
"'",
"'",
'"',
'"',
'"',
'"',
"<",
">",
"-",
"-",
"..."
];
$string = str_replace($search, $replacements, $string); // str_replace returns the result; it does not modify $string in place
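The same idea can be written with each character sitting next to its replacement, so the two lists cannot drift out of step. A minimal sketch, assuming PHP 7.0 or later (for the `"\u{...}"` escape) and UTF-8 input; the function name `normalize_word_punctuation` and the shortened mapping are illustrative only:

```php
<?php
// Hypothetical helper: maps a few "smart" punctuation characters to plain ASCII.
function normalize_word_punctuation(string $text): string
{
    // "\u{XXXX}" in a double-quoted string expands to the UTF-8 bytes of that
    // code point, e.g. "\u{2026}" is the same string as "\xE2\x80\xA6".
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // ellipsis
    ];
    return strtr($text, $map);
}

echo normalize_word_punctuation("\u{201C}Wait\u{2026}\u{201D} \u{2014} she said"), "\n";
// prints: "Wait..." - she said
```

With an array argument, `strtr` never re-scans text it has already replaced, which is one reason to prefer it over chained replacements here.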
Hmm. I use this function for sanitizing text copied into an RTE. It may or may not work in this case. It converts to HTML entities, but you could tweak it to just convert to regular characters:
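function convertFromCP1252($string)
{
    $search = array('&',
                    '<',
                    '>',
                    '"',
                    chr(212), // Mac Roman left single quote
                    chr(213), // Mac Roman right single quote
                    chr(210), // Mac Roman left double quote
                    chr(211), // Mac Roman right double quote
                    chr(208), // Mac Roman en dash
                    chr(209), // Mac Roman em dash
                    chr(201), // Mac Roman ellipsis
                    chr(145), // CP1252 left single quote
                    chr(146), // CP1252 right single quote
                    chr(147), // CP1252 left double quote
                    chr(148), // CP1252 right double quote
                    chr(150), // CP1252 en dash
                    chr(151), // CP1252 em dash
                    chr(133), // CP1252 ellipsis
                    chr(194)  // stray 0xC2 byte, dropped
                );
    $replace = array('&amp;',
                     '&lt;',
                     '&gt;',
                     '&quot;',
                     '&#8216;',
                     '&#8217;',
                     '&#8220;',
                     '&#8221;',
                     '&#8211;',
                     '&#8212;',
                     '&#8230;',
                     '&#8216;',
                     '&#8217;',
                     '&#8220;',
                     '&#8221;',
                     '&#8211;',
                     '&#8212;',
                     '&#8230;',
                     ''
                );
    return str_replace($search, $replace, $string);
}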
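If the single-byte codes above are showing up because the paste arrives as Windows-1252 rather than UTF-8, another option is to convert the whole string to UTF-8 first and then apply a UTF-8 replacement table. This is a sketch under two assumptions: the mbstring extension is available, and any input that is not valid UTF-8 really is Windows-1252. `to_utf8` is a hypothetical name.

```php
<?php
// Hypothetical helper: returns $text as UTF-8.
// Assumption: input that fails the UTF-8 check is Windows-1252 (CP1252).
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// CP1252 bytes 0x93, 0x94 and 0x85 are the curly double quotes and the ellipsis.
$utf8 = to_utf8("\x93Hi\x94\x85");
var_dump($utf8 === "\u{201C}Hi\u{201D}\u{2026}");
```

The validity check is a heuristic: a short CP1252 string can occasionally also be valid UTF-8, so it is safer to rely on a declared charset when one is known.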
This works for me:
$str=file_get_contents($file);
$array = array("‘" => "'", "’" => "'", "“" => '"', "”" => '"', "–" => "-", "—" => "-", "…" => "...");
$str = strtr($str, $array);
file_put_contents($file,$str);