cURL gets response with UTF-8 BOM

会有一股神秘感。 submitted on 2019-12-06 06:14:00

I'm afraid you have already found the answer yourself, and the bad news is that I know of no better one.

The BOM should not be there, and it's the sender's responsibility to not send it along.

But I can reassure you: the BOM is either there or it isn't, and if it is, it is always those same three bytes (EF BB BF).
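To make "those three bytes" concrete, the snippet below (plain PHP 7+, standard library only; the variable names are illustrative) checks that a pack() call with the decimal byte values, a hex-escape string and the code point U+FEFF all produce the same three bytes.

```php
<?php
// Three spellings of the UTF-8 byte order mark (code point U+FEFF).
$bomFromPack      = pack('CCC', 239, 187, 191); // decimal byte values
$bomFromEscapes   = "\xEF\xBB\xBF";             // hex byte escapes
$bomFromCodepoint = "\u{FEFF}";                 // PHP 7+ encodes \u{...} as UTF-8

var_dump($bomFromPack === $bomFromEscapes);      // bool(true)
var_dump($bomFromEscapes === $bomFromCodepoint); // bool(true)
var_dump(strlen($bomFromPack));                  // int(3)
var_dump(bin2hex($bomFromPack));                 // string(6) "efbbbf"
```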

With a small alteration you get slightly faster code that also handles any number of repeated BOMs:

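$__BOM = pack('CCC', 239, 187, 191); // the bytes EF BB BF
// Careful about the three ='s, they are all needed: strpos() returns false
// when the BOM is absent, and with == that false would equal 0, so the
// loop would start chopping valid data off the front of the response.
while (0 === strpos($data, $__BOM)) {
    $data = substr($data, 3);
}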

A third-party BOM detector wouldn't do anything different. This way you're also covered if cURL later begins stripping unneeded BOMs: the loop then simply does nothing.
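Wrapped in a reusable function, the loop might look like the sketch below. strip_utf8_boms() is a hypothetical name, and one deliberate change is made: it uses strncmp() on the first three bytes instead of strpos(), because strpos() searches the whole string when no BOM is present, which is wasted work on a large BOM-less response. The $body string stands in for what curl_exec() would return.

```php
<?php
// Hypothetical helper: removes any number of leading UTF-8 BOMs.
function strip_utf8_boms(string $data): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp() compares only the first 3 bytes, so a payload without a
    // BOM is not scanned end to end the way strpos() would scan it.
    while (strncmp($data, $bom, 3) === 0) {
        $data = substr($data, 3);
    }
    return $data;
}

// Simulated response body with two stray BOMs in front of the JSON.
$body = "\xEF\xBB\xBF\xEF\xBB\xBF" . '{"status":"ok"}';

$decoded = json_decode(strip_utf8_boms($body), true);
var_dump($decoded['status']); // string(2) "ok"
```

json_decode() still returns null on any other kind of malformed input, so checking json_last_error() after decoding remains worthwhile.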

Possible causes

Some JSON optimizers and filters may decide the output requires a BOM. Alternatively, and perhaps more simply, whoever wrote the script that generates the JSON inadvertently saved it with a BOM before the opening PHP tag. PHP treats everything before the opening tag as literal output, so the BOM is sent to the client (through Apache or whichever server is in front) ahead of the JSON, and the script's own code never sees it. This can occasionally also cause the "Cannot modify header information - headers already sent" warning.

Content detection

You can verify that the JSON is valid UTF-8, with or without a BOM, but you need the mbstring extension, and you must use strict mode to catch some edge cases:

if (false === mb_detect_encoding($data, 'UTF-8', true)) {
    // $data contains invalid UTF-8 sequences (the sender did not encode it properly)
}
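A sketch of the same check as a function. is_valid_utf8() is a hypothetical helper: when mbstring is loaded it calls mb_check_encoding(), which returns a plain bool, and otherwise it falls back to the common idiom of matching an empty pattern with PCRE's /u modifier, which fails on malformed UTF-8. Note that a BOM on its own is valid UTF-8, so this check passes whether or not the BOM is present.

```php
<?php
// Hypothetical helper: true if $data is well-formed UTF-8.
function is_valid_utf8(string $data): bool
{
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($data, 'UTF-8');
    }
    // Fallback without mbstring: preg_match() returns false (not 0 or 1)
    // when the subject is not valid UTF-8 under the /u modifier.
    return preg_match('//u', $data) === 1;
}

$withBom = "\xEF\xBB\xBF" . '{"name":"caf\u00e9"}';
$broken  = "{\"name\":\"\xC3\x28\"}"; // 0xC3 must be followed by a continuation byte

var_dump(is_valid_utf8($withBom)); // bool(true)
var_dump(is_valid_utf8($broken));  // bool(false)
```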

I would advise against trying to correct a possible encoding error; you risk breaking your own code, and also having to maintain someone else's work.

sapht

This page details a similar issue: BOM in a PHP page auto generated by Wordpress

Basically, it can occur when the JSON generator is written in PHP and an editor has somehow snuck the BOM in before the opening <?php tag. Since your client language is PHP, I'm assuming this is relevant.

You could strip it out using a substr comparison: a BOM only ever occurs at the start of a document. But if you have control over the JSON source, you should remove the BOM from the source document instead.
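For the source-side fix, here is a minimal sketch that removes a leading BOM from a file on disk with the same kind of substr comparison. remove_bom_from_file() is a hypothetical helper that assumes the file fits comfortably in memory; re-saving the file as "UTF-8 without BOM" in the editor achieves the same result.

```php
<?php
// Hypothetical helper: strips one leading UTF-8 BOM from a file in place.
// Returns true if a BOM was found and removed, false if there was none.
function remove_bom_from_file(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (substr($contents, 0, 3) !== "\xEF\xBB\xBF") {
        return false; // no BOM, leave the file untouched
    }
    if (file_put_contents($path, substr($contents, 3)) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return true;
}

// Demonstration on a temporary file that was saved "with BOM".
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBF<?php echo json_encode(['ok' => true]);");
var_dump(remove_bom_from_file($tmp)); // bool(true)
var_dump(remove_bom_from_file($tmp)); // bool(false), already clean
unlink($tmp);
```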

Paul Moldovan

There will never be more than 3 bytes before the "{". Those 3 bytes are a single character in UTF-8 (U+FEFF, the BOM). So as long as the BOM is always present, just doing $data = substr($data, 3); will be fine.
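A small demonstration of that advice, using json_decode() on hard-coded strings in place of a real cURL response: the leading BOM makes json_decode() return null, and dropping three bytes fixes it. It also shows the assumption hidden in the unconditional substr(): if the sender ever stops adding the BOM, the same call cuts into the JSON itself, which is why the conditional checks in the other answers are the safer choice.

```php
<?php
$json    = '{"id":42}';
$withBom = "\xEF\xBB\xBF" . $json;

// The BOM is not valid JSON, so decoding fails and returns null.
var_dump(json_decode($withBom));                 // NULL
echo json_last_error_msg(), PHP_EOL;             // reason reported by PHP

// Unconditionally dropping 3 bytes works while the BOM is present...
$fixed = json_decode(substr($withBom, 3), true); // decodes to ['id' => 42]
var_dump($fixed['id']);                          // int(42)

// ...but mangles a response that arrives without one: 'd":42}' is left.
var_dump(json_decode(substr($json, 3)));         // NULL
```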

Take a look here for more information: json_decode returns NULL after webservice call
