Question
I have a plain text file (.yml) that contains escaped UTF-8 byte sequences like this:
foo: "Dette er en \xC3\xB8 "
The problem is \xC3\xB8: these are not "real" UTF-8 bytes, because they are saved in the text file as 8 literal characters: \ x C 3 \ x B 8
Is there a way to convert them into the real 2-byte UTF-8 sequence (here the bytes C3 B8, which encode the letter ø)?
Any OS, language, or shell tool may be used :-)
/ Carsten
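To make the distinction in the question concrete, here is a small Python 3 illustration (not part of the original thread). The escaped form stored in the file is 8 ASCII characters, while the bytes it stands for are only 2, and those 2 bytes decode as UTF-8 to the single character ø (U+00F8).

```python
# The escaped text exactly as it appears in the file: 8 ASCII characters.
escaped = r"\xC3\xB8"
print(len(escaped))  # 8

# The two bytes that the escapes stand for.
raw = bytes([0xC3, 0xB8])
print(len(raw))  # 2

# Decoded as UTF-8, those two bytes form one character, U+00F8.
print(raw.decode("utf-8"))  # ø
```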
Answer 1:
Use this Perl script to convert your file:
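#!/usr/bin/perl
use strict;
use warnings;

# Replace each literal \xHH escape (two hex digits, either case) with the byte it encodes.
while (<STDIN>) {
    s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    print;
}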
Assuming you saved the script in a file named bogusutf, run the conversion with this command:
$ perl bogusutf <inputfile >outputfile
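Since the question allows any language, here is a rough Python 3 equivalent. It is a sketch, not part of the original answer, and it assumes the whole file fits in memory and that every \xHH escape in it is meant to become a raw byte (an escaped backslash such as \\xC3 would be converted too). The function names unescape_hex_bytes and convert_stream are illustrative. Like the Perl script, it works on bytes, so any real UTF-8 already in the file passes through unchanged; unlike the Perl script, it also checks that the result is valid UTF-8 before writing it.

```python
import re
from typing import BinaryIO

# A literal backslash, an "x", and exactly two hex digits (either case).
ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex_bytes(data: bytes) -> bytes:
    """Replace every literal \\xHH sequence in data with the single byte 0xHH."""
    return ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def convert_stream(src: BinaryIO, dst: BinaryIO) -> None:
    """Read all of src, unescape it, verify it is valid UTF-8, and write it to dst."""
    converted = unescape_hex_bytes(src.read())
    # Raises UnicodeDecodeError if the escapes did not form valid UTF-8.
    converted.decode("utf-8")
    dst.write(converted)
```

Calling convert_stream(sys.stdin.buffer, sys.stdout.buffer) gives the same <inputfile >outputfile filter behavior as the Perl script.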
Source: https://stackoverflow.com/questions/12668670/convert-utf-8-character-sequence-to-real-utf-8-bytes