I'm using PHP5 (CGI) to output template files from the filesystem and am having issues spitting out raw HTML.
private function fetch($name) {
$path = $this->
A solution without the pack function:
$a = "\xEF\xBB\xBF1"; // "1" preceded by a UTF-8 BOM
var_dump($a); // string(4) "1"
function deleteBom($text)
{
return preg_replace("/^\xEF\xBB\xBF/", '', $text);
}
var_dump(deleteBom($a)); // string(1) "1"
If you are reading some API using file_get_contents and get an inexplicable NULL from json_decode, check the value of json_last_error(): sometimes the value returned from file_get_contents will have an extraneous BOM that is almost invisible when you inspect the string, but will make json_last_error() return JSON_ERROR_SYNTAX (4).
>>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all");
=> "\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}"
>>> json_decode($json);
=> null
>>>
In this case, check the first 3 bytes. Echoing them is not very useful because the BOM is invisible in most environments:
>>> substr($json, 0, 3)
=> " "
>>> substr($json, 0, 3) == pack('H*','EFBBBF');
=> true
>>>
If the line above returns TRUE for you, then a simple test may fix the problem:
>>> json_decode($json[0] == "{" ? $json : substr($json, 3))
=> {#204
+"orgao": [
{#203
+"Nome": "Tribunal de Justiça",
+"ID_Orgao": "59",
+"Condicao": "1",
},
],
...
}
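The check-and-strip steps above can be folded into one helper. This is only a minimal sketch, not part of the original answer: the name decodeJsonIgnoringBom is hypothetical, the payload is assumed to be UTF-8, and json_last_error_msg() needs PHP 5.5 or later.

```php
<?php
// Hypothetical helper: strip one leading UTF-8 BOM, then decode the JSON.
function decodeJsonIgnoringBom($json)
{
    // Compare the first three bytes against EF BB BF.
    if (strncmp($json, "\xEF\xBB\xBF", 3) === 0) {
        $json = substr($json, 3);
    }

    $data = json_decode($json);

    // json_decode() also returns null for the valid literal "null",
    // so consult json_last_error() before treating null as a failure.
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }

    return $data;
}

// Sample payload made up for illustration, with a BOM prepended.
$withBom = "\xEF\xBB\xBF" . '{"orgao":[{"ID_Orgao":"59"}]}';
$decoded = decodeJsonIgnoringBom($withBom);
```

Comparing bytes with strncmp avoids the `$json[0] == "{"` test, which would also strip three bytes from a payload that legitimately starts with `[` or whitespace.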
b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf". If you want to check for a BOM, you need to use double quotes, so the \x sequences are actually interpreted into bytes:
"\xef\xbb\xbf"
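To make the quoting difference concrete, here is a small sketch. The variable names and the sample $text are made up for illustration.

```php
<?php
$single = '\xef\xbb\xbf'; // 12 literal characters: backslash, x, e, f, ...
$double = "\xef\xbb\xbf"; // 3 bytes: EF BB BF

// Sample input: a BOM followed by the start of an HTML document.
$text = "\xef\xbb\xbf<!DOCTYPE html>";

var_dump(strlen($single));                                 // int(12)
var_dump(strlen($double));                                 // int(3)
var_dump(strncmp($text, $single, strlen($single)) === 0);  // bool(false)
var_dump(substr($text, 0, 3) === $double);                 // bool(true)
```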
Your files also seem to contain a lot more garbage than just a single leading BOM:
$ curl http://ircb.in/jisti/ | xxd
0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef ................
0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068 .....<!DOCTYPE h
0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561 tml>.<html>.<hea
...
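Since the dump shows the BOM repeated several times before `<!DOCTYPE`, stripping a single BOM would not be enough. A minimal sketch of one way to drop every leading copy follows; the function name stripLeadingBoms is hypothetical and the input is assumed to be a UTF-8 byte string.

```php
<?php
// Hypothetical helper: remove every UTF-8 BOM at the start of the text,
// not just the first one.
function stripLeadingBoms($text)
{
    // Double quotes turn the \x escapes into real bytes before PCRE sees them;
    // the + quantifier consumes any number of consecutive BOMs.
    return preg_replace("/^(?:\xEF\xBB\xBF)+/", '', $text);
}

// Sample input: several BOMs in a row, as in the hex dump above.
$dirty = str_repeat("\xEF\xBB\xBF", 7) . "<!DOCTYPE html>\n<html>";
$clean = stripLeadingBoms($dirty);
```

This only treats the symptom. Repeated BOMs usually mean that several included files were each saved with a BOM, so re-saving those files without one is the more durable fix.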
This might help. Let me know if you'd like me to expand on my thought process.
<?php
//
// labeled TESTINGSTRIPZ.php
//
define('CHARSET', 'UTF-8');
$stringy = "\xef\xbb\xbf\"quoted text\" ";
$str_find_array = array( "\xef\xbb\xbf");
$str_replace_array = array( '');
$RESULT =
trim(
mb_convert_encoding(
str_replace(
$str_find_array,
$str_replace_array,
strip_tags( $stringy )
),
'UTF-8',
mb_detect_encoding(
strip_tags($stringy)
)
)
);
print("YOUR RESULT IS: " . $RESULT.PHP_EOL);
?>
Result:
terminal$ php TESTINGSTRIPZ.php
YOUR RESULT IS: "quoted text" // < with no hidden char.
This global function solves it for systems whose base charset is UTF-8. Thanks!
function prepareCharset($str) {
// set default encode
mb_internal_encoding('UTF-8');
// pre filter
if (empty($str)) {
return $str;
}
// get charset
$charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));
if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
$str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
} else {
$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
}
// remove the encoded byte pair %C2%81 (note: a UTF-8 BOM would be %EF%BB%BF)
$str = urldecode(str_replace("%C2%81", '', urlencode($str)));
// prepare string
return $str;
}