How to remove %EF%BB%BF in a PHP string

依然范特西╮ 提交于 2019-12-01 06:37:46

问题


I am trying to use the Microsoft Bing API.

$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));

The data returned has a ' ' character in the first character of the returned string. It is not a space, because I trimed it before returning the data.

The ' ' character turned out to be %EF%BB%BF.

I wonder why this happened, maybe a bug from Microsoft?

How can I remove this %EF%BB%BF in PHP?


回答1:


You could use substr to only get the rest without the UTF-8 BOM:

// if it’s binary UTF-8
$data = substr($data, 3);
// if it’s percent-encoded UTF-8
$data = substr($data, 9);



回答2:


You should not simply discard the BOM unless you're 100% sure that the stream will: (a) always be UTF-8, and (b) always have a UTF-8 BOM.

The reasons:

  1. In UTF-8, a BOM is optional - so if the service quits sending it at some future point you'll be throwing away the first three characters of your response instead.
  2. The whole purpose of the BOM is to identify unambiguously the type of UTF stream being interpreted UTF-8? -16? or -32?, and also to indicate the 'endian-ness' (byte order) of the encoded information. If you just throw it away you're assuming that you're always getting UTF-8; this may not be a very good assumption.
  3. Not all BOMs are 3-bytes long, only the UTF-8 one is three bytes. UTF-16 is two bytes, and UTF-32 is four bytes. So if the service switches to a wider UTF encoding in the future, your code will break.

I think a more appropriate way to handle this would be something like:

/* Detect the encoding, then convert from detected encoding to ASCII */
$enc = mb_detect_encoding($data);
$data = mb_convert_encoding($data, "ASCII", $enc);



回答3:


$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));

if (substr($data, 0, 3) == "\xef\xbb\xbf") {
$data = substr($data, 3);
}




回答4:


It's a byte order mark (BOM), indicating the response is encoded as UTF-8. You can safely remove it, but you should parse the remainder as UTF-8.




回答5:


I had the same problem today, and fixed by ensuring the string was set to UTF-8:

http://php.net/manual/en/function.utf8-encode.php

$content = utf8_encode ( $content );




回答6:


To remove it from the beginning of the string (only):

$data = preg_replace('/^%EF%BB%BF/', '', $data);



回答7:


$data = str_replace('%EF%BB%BF', '', $data);

You probably shouldn't be using stripslashes -- unless the API returns blackslashed data (and 99.99% chance it doesn't), take that call out.



来源:https://stackoverflow.com/questions/4057742/how-to-remove-efbbbf-in-a-php-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!