Remove BOM from string with Perl

Submitted by 时间秒杀一切 on 2019-11-30 17:31:23

EF BB BF is the UTF-8 encoding of the BOM, but you decoded it, so you must look for its decoded form. The BOM is a ZERO WIDTH NO-BREAK SPACE (U+FEFF) used at the start of a file, so any of the following will do:

s/^\x{FEFF}//;
s/^\N{U+FEFF}//;
s/^\N{ZERO WIDTH NO-BREAK SPACE}//;
s/^\N{BOM}//;   # Convenient alias
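
As a minimal sketch of the point above (the sample bytes and variable names are made up for illustration), the following shows that three BOM bytes become one character after decoding, and that the substitution then removes it:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw bytes, as read from a UTF-8 file that starts with a BOM.
my $bytes    = "\xEF\xBB\xBFhello";
my $byte_len = length $bytes;

# Decoding turns the three BOM bytes into the single character U+FEFF.
my $text = decode('UTF-8', $bytes);

# Strip it only when it is the very first character.
$text =~ s/^\x{FEFF}//;

printf "%d bytes before, %d characters after\n", $byte_len, length $text;
```

Matching against the raw bytes EF BB BF would only work on the string before it was decoded.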

I understand the "wide character" which I am being warned about is the BOM. I want to get rid of it

You're getting the "Wide character" warning because you forgot to add an :encoding layer to your output file handle. The following adds :encoding(UTF-8) to STDIN, STDOUT, and STDERR, and makes it the default for open().

use open ':std', ':encoding(UTF-8)';
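
To illustrate what an :encoding layer does on output, here is a small sketch that writes to an in-memory handle rather than STDOUT. The in-memory buffer and the sample string are assumptions chosen only to keep the example self-contained:

```perl
use strict;
use warnings;

# Decoded text: U+FEFF followed by "caf" and U+00E9 (5 characters).
my $text = "\x{FEFF}caf\x{E9}";

# Hypothetical in-memory "file" so the sketch needs no real path.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
print {$out} $text;
close($out) or die "close failed: $!";

# The layer encoded the characters to UTF-8 bytes on the way out.
printf "%d characters written as %d bytes\n", length($text), length($buffer);
```

Printing the same string to a handle with no encoding layer is what triggers the "Wide character in print" warning, whether or not the BOM is still there.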

To remove the BOM, you have to know that, once decoded, it is not three characters but a single one (U+FEFF):

s/^\x{FEFF}//;
Pierre

If you open the file using File::BOM, it will remove the BOM for you.

use File::BOM qw(open_bom);

# Opens $path for reading, skips any BOM, and uses :utf8 if no BOM is found
open_bom(my $fh, $path, ':utf8');

Ideally, your filehandle should be doing this for you automatically. But if you're not in an ideal situation, this worked for me:

use Encode;

my $value = decode('UTF-8', $originalvalue);
$value =~ s/^\N{U+FEFF}//;   # anchored, so only a leading BOM is removed
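
If the data comes from a file, one possible way to let the filehandle do the decoding using only core modules is sketched below. The temporary file and its contents are assumptions for the sake of a runnable example; in real code $path would be your own file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file that starts with the UTF-8 BOM bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xEF\xBB\xBFfirst line\nsecond line\n";
close($tmp) or die "close failed: $!";

# Let the handle decode, then drop a leading U+FEFF from the first line only.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
my @lines = <$in>;
close($in);
$lines[0] =~ s/^\N{U+FEFF}// if @lines;

print @lines;
```

Unlike File::BOM, this sketch assumes the file is UTF-8 and does not detect other encodings from the BOM.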