Determining and removing invisible characters from a string in PHP (%E2%80%8E)

Submitted by 一世执手 on 2019-12-09 11:59:45

Question


I have strings in PHP which I read from a database. The strings are URLs and at first glance they look good, but there seems to be some weird character at the end. In the address bar of the browser, the string '%E2%80%8E' gets appended to the URL, which breaks the URL.

I found this post on stripping the left-to-right-mark from a string in PHP and it seems related to my problem, but the solution does not work for me because my characters seem to be something else.

So how can I determine which character I have so I can remove it from the strings?

(I would post one of the URLs here as an example, but the Stack Overflow form strips the character at the end as soon as I paste it in.)

I know that I could only allow certain chars in the string and discard all others. But I would still like to know what char it is -- and how it gets into the database.

EDIT: The question has been answered and the code given in the accepted answer works for me:

$str = preg_replace('/\p{C}+/u', "", $str);

Answer 1:


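If the input is UTF-8 encoded, you can use a Unicode-aware regex to match and strip invisible characters such as e2808e (U+200E, LEFT-TO-RIGHT MARK). Use the u (PCRE_UTF8) modifier together with \p{C} or \p{Other}. Note that \p{C} also matches ordinary control characters such as tabs and newlines.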

Strip out all invisibles:

$str = preg_replace('/\p{C}+/u', "", $str);
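
If tabs and newlines should survive, a narrower character class can be used instead. The following is a minimal sketch, not part of the original answer, that removes only Unicode format characters (\p{Cf}), the category that U+200E, U+200F and U+200B belong to. The function name strip_format_chars and the example URL are hypothetical.

```php
<?php
// Hypothetical helper: removes only Unicode format characters (category Cf),
// such as U+200E, U+200F and U+200B, and leaves tabs and newlines untouched.
function strip_format_chars(string $str): string
{
    // With the u modifier, preg_replace() returns null if $str is not valid UTF-8.
    $clean = preg_replace('/\p{Cf}+/u', '', $str);

    // Assumption for this sketch: fall back to the unchanged input on failure.
    return $clean ?? $str;
}

$url = "http://example.com/page\xE2\x80\x8E"; // trailing left-to-right mark
var_dump(strip_format_chars($url) === "http://example.com/page"); // bool(true)
var_dump(strip_format_chars("a\tb\n") === "a\tb\n");              // bool(true)
```

Be aware that \p{Cf} also contains characters that can be intentional, for example U+200D (zero width joiner) and U+00AD (soft hyphen), so for URLs this is usually safe, but for free text it may be too broad.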

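\p{Other} (short form \p{C}) covers the general categories Cc (control), Cf (format), Cs (surrogate), Co (private use) and Cn (unassigned).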


Detect/identify invisibles:

$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";

// get invisibles + offset
if(preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE))
{
  echo "<pre>\n";
  foreach($out[0] AS $k => $v) {
    echo "detected ".bin2hex($v[0])." @ offset ".$v[1]."\n";
  }
  echo "</pre>";
}

outputs:

detected e2808e @ offset 1
detected e2808b @ offset 5
detected e2808f @ offset 9


To identify a character, search for its hex bytes, for example on fileformat.info:

@google: site:fileformat.info e2808e
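
The lookup is easier when the bytes are converted to a Unicode code point first. The following is a rough sketch, assuming PHP 7.2 or later with the mbstring extension (needed for mb_ord). The helper name describe_code_points is hypothetical. It takes the percent-encoded form seen in the address bar and returns each character in U+XXXX notation.

```php
<?php
// Hypothetical helper: converts a percent-encoded sequence, as it appears in
// the browser's address bar, into "U+XXXX" notation for each character.
function describe_code_points(string $encoded): array
{
    // "%E2%80%8E" becomes the raw bytes "\xE2\x80\x8E".
    $bytes = rawurldecode($encoded);

    // Split into UTF-8 characters; returns false if the bytes are not valid UTF-8.
    $chars = preg_split('//u', $bytes, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return [];
    }

    return array_map(function (string $ch): string {
        return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }, $chars);
}

print_r(describe_code_points('%E2%80%8E')); // one element: U+200E
```

U+200E is the LEFT-TO-RIGHT MARK, which matches the %E2%80%8E suffix described in the question.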



Source: https://stackoverflow.com/questions/23130740/determining-and-removing-invisible-characters-from-a-string-in-php-e2808e
