How to remove multiple UTF-8 BOM sequences before “<!DOCTYPE>”?

Anonymous (unverified), submitted on 2019-12-03 01:27:01

Question:

I'm using PHP 5 (CGI) to output template files from the filesystem and am having issues emitting raw HTML.

    private function fetch($name) {
        $path = $this->j->config['template_path'] . $name . '.html';
        if (!file_exists($path)) {
            dbgerror('Could not find the template "' . $name . '" in ' . $path);
        }
        $f = fopen($path, 'r');
        $t = fread($f, filesize($path));
        fclose($f);
        if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
            $t = substr($t, 3);
        }
        return $t;
    }

Even though I've added the BOM fix I'm still having problems with Firefox accepting it. You can see a live copy here: http://ircb.in/jisti/ (and the template file I threw at http://ircb.in/jisti/home.html if you want to check it out)

Any idea how to fix this? o_o

Answer 1:

You can use the following code to remove a UTF-8 BOM:

    // Remove UTF-8 BOM
    function remove_utf8_bom($text) {
        $bom = pack('H*', 'EFBBBF');
        $text = preg_replace("/^$bom/", '', $text);
        return $text;
    }


Answer 2:

Try:

    // -------- read the file content ----
    $str = file_get_contents($source_file);

    // -------- remove the UTF-8 BOM ----
    $str = str_replace("\xEF\xBB\xBF", '', $str);

    // -------- get the object from JSON ----
    $obj = json_decode($str);

:)



Answer 3:

Another way to remove the BOM, which is Unicode code point U+FEFF:

$str = preg_replace('/\x{FEFF}/u', '', $file); 
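The pattern above removes U+FEFF anywhere in the string. If only leading BOMs should go, an anchored variant may be closer to what the question needs. The following is a minimal sketch: the helper name `strip_leading_bom_unicode` is hypothetical, and the fallback on invalid input is a design assumption, not part of the original answer.

```php
<?php
// Hypothetical helper: removes one or more U+FEFF characters at the start only.
function strip_leading_bom_unicode($text)
{
    // With the /u modifier, preg_replace() returns NULL when $text is not valid UTF-8.
    $result = preg_replace('/^\x{FEFF}+/u', '', $text);

    // Assumption: on invalid UTF-8, hand back the input untouched instead of NULL.
    return $result === null ? $text : $result;
}

// Two leading BOMs are removed; the BOM in the middle of the text is kept.
$file = "\xEF\xBB\xBF\xEF\xBB\xBFhello \xEF\xBB\xBFworld";
echo bin2hex(strip_leading_bom_unicode($file)), "\n";
```

Checking for NULL matters here because a template with a stray invalid byte would otherwise come back empty.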


Answer 4:

b'\xef\xbb\xbf' is a single-quoted string, so PHP does not interpret the \x escapes: it is the literal 12-character text \xef\xbb\xbf. To check for a BOM, use double quotes so that the \x sequences are interpreted as bytes:

"\xef\xbb\xbf" 

Your files also seem to contain a lot more garbage than just a single leading BOM:

    $ curl http://ircb.in/jisti/ | xxd
    0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef  ................
    0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068  .....<!DOCTYPE h
    0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561  tml>.<html>.<hea
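The dump shows seven consecutive BOMs (21 bytes) before the doctype, so removing only the first three bytes is not enough. The sketch below first shows the quoting difference and then strips every leading BOM in a loop. The helper name `strip_leading_boms` is hypothetical and not from the question's code base.

```php
<?php
// Single quotes: \x escapes are NOT interpreted, so this is 12 literal characters.
var_dump(strlen('\xef\xbb\xbf')); // int(12)
// Double quotes: each \x escape becomes one byte, giving the 3-byte UTF-8 BOM.
var_dump(strlen("\xef\xbb\xbf")); // int(3)

// Hypothetical helper: removes every BOM at the start of $text, not just the first.
function strip_leading_boms($text)
{
    $bom = "\xEF\xBB\xBF";
    // Keep dropping three bytes while the string still begins with a BOM.
    while (strncmp($text, $bom, 3) === 0) {
        // The cast guards against substr() returning false on older PHP versions.
        $text = (string) substr($text, 3);
    }
    return $text;
}

$page = "\xEF\xBB\xBF\xEF\xBB\xBF\xEF\xBB\xBF<!DOCTYPE html>";
echo strip_leading_boms($page), "\n"; // <!DOCTYPE html>
```

In the question's `fetch()` method, the same loop could replace the single `substr` check. That assumes the repeated BOMs all end up at the start of the returned string.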


Answer 5:

This global function handles it for systems whose base charset is UTF-8. Thanks!

    function prepareCharset($str) {
        // set default encoding
        mb_internal_encoding('UTF-8');

        // pre-filter
        if (empty($str)) {
            return $str;
        }

        // get charset
        $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));

        if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
            $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
        } else {
            $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
        }

        // remove BOM
        $str = urldecode(str_replace("%C2%81", '', urlencode($str)));

        // prepare string
        return $str;
    }


Answer 6:

An extra method to do the same job:

    function remove_utf8_bom_head($text) {
        if (substr(bin2hex($text), 0, 6) === 'efbbbf') {
            $text = substr($text, 3);
        }
        return $text;
    }

The other methods I found did not work in my case.

I hope it helps in some special cases.



Answer 7:

If you are reading from an API with file_get_contents and get an inexplicable NULL from json_decode, check the value of json_last_error(). Sometimes the value returned from file_get_contents has an extraneous BOM that is almost invisible when you inspect the string but makes json_last_error() return JSON_ERROR_SYNTAX (4).

In this case, check the first 3 bytes - echoing them is not very useful because the BOM is invisible on most settings:

If the line above returns TRUE for you, then a simple test may fix the problem:
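The check and the fix described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code the answer had in mind: the variable names and the sample JSON payload are assumptions.

```php
<?php
// Simulated response body: a UTF-8 BOM followed by valid JSON (illustrative data).
$string = "\xEF\xBB\xBF" . '{"status":"ok"}';

// The BOM as three raw bytes: EF BB BF.
$bom = pack('CCC', 0xEF, 0xBB, 0xBF);

// Check: compare the first 3 bytes against the BOM instead of echoing them.
$hasBom = (substr($string, 0, 3) === $bom);
var_dump($hasBom); // bool(true)

// Fix: drop the first 3 bytes before decoding.
if ($hasBom) {
    $string = substr($string, 3);
}

$data = json_decode($string, true);
var_dump($data['status']); // string(2) "ok"
```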



Answer 8:

This might help. Let me know if you would like me to expand on my thought process.

Result:

    terminal$ php TESTINGSTRIPZ.php
    YOUR RESULT IS: "quoted text" //

