XML::Simple encoding problem

风格不统一 posted on 2019-12-11 06:45:51

Question


I have an xml-file I want to parse:

<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>

Firefox parses it perfectly, but XML::Simple seems to corrupt some of the data. I have a Perl program like this:

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# "\x{c3}\x{bb}" is the UTF-8 byte sequence for "û"
my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
$content .= "<tag>\x{c3}\x{bb}</tag>\n";

print "input:\n$content\n";

my $xml = XML::Simple->new;
my $data = $xml->XMLin($content, KeepRoot => 1);

print "data:\n";
print Dumper $data;

and get:

input:
<?xml version="1.0" encoding="UTF-8" ?>
<tag>û</tag>

data:
$VAR1 = {
          'tag' => "\x{fb}"
        };

This is not what I expected. I think there are some encoding issues. Am I doing something wrong?

UPD: I had assumed that XMLin returned text as UTF-8 bytes (like the input). I just added

encode_utf8($data->{'tag'});

and it worked.
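To make the update above concrete, here is a minimal sketch of what encode_utf8 does to such a value. It uses only the core Encode module; the variable $tag_chars is a stand-in for $data->{'tag'} and is an assumption for illustration, not output taken from XML::Simple itself.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Assumption: this stands in for $data->{'tag'} as returned by XMLin,
# i.e. a one-character string holding U+00FB.
my $tag_chars = "\x{fb}";

# encode_utf8 returns the encoded bytes, so the result has to be captured.
my $tag_bytes = encode_utf8($tag_chars);

printf "characters: %d\n", length $tag_chars;          # 1
printf "bytes:      %d\n", length $tag_bytes;          # 2
printf "hex:        %s\n", unpack('H*', $tag_bytes);   # c3bb
```

The encoded bytes are the same two bytes that went into the XML input, which is why the output looks right again once the value is encoded before printing.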


Answer 1:


XML::Simple is fickle.

It is effectively calling Encode::decode('UTF-8', $content), which turns your UTF-8 bytes into Perl's native character string.

Do this:

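use Encode;
use XML::Simple;
my $content_utf8 = "whatevér";   # placeholder for your UTF-8 encoded XML
my $xml = XMLin($content_utf8);
my $item_utf8 = Encode::encode('UTF-8', $xml->{'item'});   # re-encode the parsed value as UTF-8 bytes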

This sort of works too, but it is risky because of the double encoding:

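my $content_utf8 = "whatevér";   # placeholder for your UTF-8 encoded XML
my $double_encoded_utf8 = Encode::encode('UTF-8', $content_utf8);   # encode the bytes a second time
my $xml = XMLin($double_encoded_utf8);
my $item_utf8 = $xml->{'item'};   # one decode during parsing leaves UTF-8 bytes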
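The double-encoding variant can be illustrated without XML::Simple. In this sketch the explicit decode call is an assumption standing in for the decoding that happens during parsing; the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $utf8_bytes = "\xc3\xbb";                       # "û" as UTF-8 bytes

# Encoding again treats each byte as a Latin-1 character and encodes it.
my $double = encode('UTF-8', $utf8_bytes);

# Assumption: stand-in for the single decode done while parsing.
my $once = decode('UTF-8', $double);

printf "double-encoded: %s\n", unpack('H*', $double);   # c383c2bb
printf "one decode restores the original bytes: %s\n",
    ($once eq $utf8_bytes ? 'yes' : 'no');
```

One decode only undoes one layer, so the value comes back as the original UTF-8 bytes. The risk is that any later step which encodes the data again, or decodes it one time too few, leaves four garbled bytes where there should be two.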



Answer 2:


Hexadecimal FB (decimal 251) is the Unicode code point (and the Latin-1 code) of the "û" character; it lies outside ASCII, which ends at 127. Could you please elaborate on what you expected to get in the data structure, and what leads you to conclude that what you got was "corrupt"?
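A quick way to check this is to decode the two input bytes and look at the resulting code point. This is a minimal sketch using only the core Encode module; the variable name $char is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two UTF-8 bytes from the question, decoded into one character.
my $char = decode('UTF-8', "\xc3\xbb");

printf "length: %d\n", length($char);                        # 1
printf "code point: U+%04X (decimal %d)\n", ord($char), ord($char);
printf "within ASCII: %s\n", (ord($char) < 128 ? 'yes' : 'no');
```

So "\x{fb}" in the Dumper output is the decoded character, not corrupted data. To print the character itself as UTF-8, one option is to set an output layer with binmode(STDOUT, ':encoding(UTF-8)') before printing.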



Source: https://stackoverflow.com/questions/4004214/xmlsimple-encoding-problem
