PHP fread() Function Returning Extra Characters at the Front on UTF-8 Text Files

Submitted by 浪子不回头ぞ on 2019-12-24 13:12:21

Question


When I use fread() on a normal text file (for example, an ANSI file saved with Notepad), the returned content string is correct, as expected.

But when I read a UTF-8 text file, the returned content string contains invisible characters at the front. I call them invisible because the extra characters cannot be seen in normal output (e.g. when echoing what was just read). But when the content string is used for further processing (for example, to build a link with an href value), a problem arises.

$filename = "blabla.txt";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
echo '<a href="'.$contents.'">'.$contents.'</a>';

I put only http://www.google.com in the UTF-8 encoded text file. When you run the PHP file, you will see an output link http://www.google.com, but following it will never reach Google.

That is because the href in the page source looks like this:

%EF%BB%BFhttp://www.google.com

This means fread() added the weird characters %EF%BB%BF at the front.

This is extra annoying. Why is it happening?

Added:
Some people point out that this is the BOM. But BOM or not, it is changing my original values, and that now causes problems in other steps, function calls, etc. Now I have to call substr($string, 3) on all outputs. Changing the original values like this makes no sense.


Answer 1:


This is called the UTF-8 BOM. Please refer to http://en.wikipedia.org/wiki/Byte_order_mark

It is something that is optionally added to the beginning of UTF-8 files, meaning it is in the file and is not something fread() adds. Most text editors won't display the BOM, but some will, mostly those that don't understand it. Not all editors add it to UTF-8 files, but again, some do.

For UTF-8, using a BOM is not recommended: it carries no byte-order meaning there, and many programs do not understand it.
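
To see that the three bytes really live in the file and are not produced by fread(), a small check like the following can help. It is only a sketch: the temporary file and the variable names are made up for the demonstration, and the file is written with a BOM on purpose to imitate what some editors do when saving as UTF-8.

```php
<?php
// Hypothetical temporary file, used only for this demonstration.
$path = tempnam(sys_get_temp_dir(), 'bom');

// Write the URL preceded by the three UTF-8 BOM bytes, as some editors do.
file_put_contents($path, "\xEF\xBB\xBF" . "http://www.google.com");

// Read it back the same way as in the question.
clearstatcache();                       // make sure filesize() is not stale
$handle = fopen($path, "r");
$contents = fread($handle, filesize($path));
fclose($handle);
unlink($path);

// Hex dump of the first three bytes that came out of the file.
$firstThree = bin2hex(substr($contents, 0, 3));
// URL-encoded form of the same bytes, as seen in the broken href.
$encodedPrefix = rawurlencode(substr($contents, 0, 3));

echo $firstThree, "\n";     // efbbbf
echo $encodedPrefix, "\n";  // %EF%BB%BF
```

fread() returns exactly the bytes that were written, so the fix belongs either in the editor's save settings (UTF-8 without BOM) or in the code that reads the file.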




Answer 2:


It is the UTF-8 BOM. If you look at the docs for fread(), someone has discussed a solution for it there.

The solution given there is the following:

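// Opens a file and reads past the UTF-8 BOM if it is there.
function fopen_utf8($filename, $mode) {
    $file = @fopen($filename, $mode);
    if ($file === false) {
        return false; // The file could not be opened.
    }
    $bom = fread($file, 3);
    if ($bom !== "\xEF\xBB\xBF") {
        // No BOM: go back to the start so that no data is skipped.
        // rewind() takes only the file handle as its argument.
        rewind($file);
    } else {
        echo "bom found!\n";
    }
    return $file;
}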
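
If the content has already been read into a string, the BOM can also be removed from the string itself, but only when it is actually present, rather than calling substr($string, 3) unconditionally. The following is a minimal sketch of that idea. The helper name strip_utf8_bom is hypothetical and not a built-in PHP function.

```php
<?php
// Hypothetical helper: returns $text without a leading UTF-8 BOM.
// Strings that do not start with the BOM are returned unchanged.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp() compares only the first 3 bytes of both strings.
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Usage: build the link from the cleaned string.
$contents = strip_utf8_bom("\xEF\xBB\xBF" . "http://www.google.com");
$link = '<a href="' . $contents . '">' . $contents . '</a>';
echo $link, "\n";
```

Because the check is conditional, the same call is safe for files saved with or without a BOM.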


Source: https://stackoverflow.com/questions/9126423/php-fread-function-returning-extra-characters-at-the-front-on-utf-8-text-files
