Question
I have a string $data, encoded in UTF-8. Assume that I don't know whether this string is UTF-8 or ISO-8859-1. I want to use the Perl Encode::Guess module to find out which of the two it is, but I'm having trouble figuring out how this module works.
I have tried the following four methods (from http://perldoc.perl.org/Encode/Guess.html):
use Encode::Guess qw/utf8 latin1/;
my $decoder = guess_encoding($data);
print "$decoder\n";
Result: iso-8859-1 or utf8
use Encode::Guess qw/utf8 latin1/;
my $enc = guess_encoding($data, qw/utf8 latin1/);
ref($enc) or die "Can't guess: $enc";
my $utf8 = $enc->decode($data);
print "$utf8\n";
Result: Can't guess: iso-8859-1 or utf8 at encodage-windows.pl line 25, line 18110.
use Encode::Guess qw/utf8 latin1/;
my $decoder = Encode::Guess->guess($data);
die $decoder unless ref($decoder);
my $utf8 = $decoder->decode($data);
print "$utf8\n";
Result: iso-8859-1 or utf8 at encodage-windows.pl line 30, line 18110.
use Encode::Guess qw/utf8 latin1/;
my $utf8 = Encode::decode("Guess", $data);
print "$utf8\n";
Result: iso-8859-1 or utf8 at /usr/local/lib/perl5/Encode.pm line 175.
My first question is: which one of these methods am I supposed to use (if any)? And my second question: what changes should I make to make this work?
Answer 1:
I normally check the possible encodings one at a time, like this:
use Encode::Guess;  # no import list, so latin1 is not added as a default suspect
my $decoder = guess_encoding($data, 'utf8');
$decoder = guess_encoding($data, 'iso-8859-1') unless ref $decoder;
die $decoder unless ref $decoder;
printf "Decoding as %s\n\n", $decoder->name;
$data = $decoder->decode($data);
This picks UTF-8 if the data decodes cleanly as UTF-8; otherwise it tries ISO-8859-1 and either accepts that or fails. Each call is a simple yes/no test for one encoding, so the guess can never come back with two candidates, which is the error you are getting.
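The two-candidate result comes about because every byte sequence that is valid UTF-8 is also valid ISO-8859-1, so with both encodings as suspects at once the guess stays ambiguous. Below is a minimal, self-contained sketch of the one-at-a-time approach. The helper name guess_utf8_or_latin1 and the sample byte strings are made up for illustration. Note that it loads Encode::Guess without an import list: as far as I know, an import list such as qw/utf8 latin1/ adds those encodings to the default suspects for every later call, which would bring the ambiguity back.

```perl
use strict;
use warnings;
use Encode::Guess;    # no import list: default suspects stay ascii + utf8

# Hypothetical helper: test UTF-8 first, then fall back to ISO-8859-1.
# Returns an encoding object, or dies with the message from Encode::Guess.
sub guess_utf8_or_latin1 {
    my ($bytes) = @_;
    my $decoder = guess_encoding($bytes, 'utf8');
    $decoder = guess_encoding($bytes, 'iso-8859-1') unless ref $decoder;
    die "Can't guess: $decoder" unless ref $decoder;
    return $decoder;
}

# Sample inputs (assumed for illustration): "café" as raw bytes.
my $utf8_bytes   = "caf\xC3\xA9";    # é as two UTF-8 bytes
my $latin1_bytes = "caf\xE9";        # é as one ISO-8859-1 byte

for my $bytes ($utf8_bytes, $latin1_bytes) {
    my $decoder = guess_utf8_or_latin1($bytes);
    my $text    = $decoder->decode($bytes);
    printf "%s, %d characters\n", $decoder->name, length $text;
}
```

With these inputs the first string should be reported as utf8 and the second as iso-8859-1, each decoding to four characters. Keep in mind that ISO-8859-1 accepts any byte sequence, so the fallback step will practically never fail; the result only tells you that the data is not valid UTF-8, not that it really is ISO-8859-1.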
Source: https://stackoverflow.com/questions/23015155/can-encodeguess-tell-utf-8-from-iso-8859-1