How can I treat command-line arguments as UTF-8 in Perl?

醉酒成梦 2020-12-03 08:12

How do I treat the elements of @ARGV as UTF-8 in Perl?

Currently I'm using the following workaround:

use Encode qw(decode encode);

         


        
5 Answers
  • 2020-12-03 08:16

    The way you've done it seems correct. That's what I would do.

    However, the perlrun documentation says that the command-line flag -CA tells Perl to treat @ARGV as UTF-8 (I have not tested this).

  • 2020-12-03 08:18

    Outside data sources are tricky in Perl. Command-line arguments probably arrive in the encoding specified by your locale. Don't rely on your locale being the same as that of someone else who might run your program.

    You have to find out what that encoding is, then decode the arguments into Perl's internal format. Fortunately, it's not that hard.

    The I18N::Langinfo module has the stuff you need to get the encoding:

        use I18N::Langinfo qw(langinfo CODESET);
        my $codeset = langinfo(CODESET);
    

    Once you know the encoding, you can decode them to Perl strings:

        use Encode qw(decode);
        @ARGV = map { decode $codeset, $_ } @ARGV;
    

    Although Perl encodes internal strings as UTF-8, you shouldn't ever think or know about that. You just decode whatever you get, which turns it into Perl's internal representation for you. Trust that Perl will handle everything else. When you need to store the data, ensure that you use the encoding you like.
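The decode-at-the-boundary idea above can be sketched as a small round trip. This is only an illustration under stated assumptions: the raw argument is simulated as a byte string, and $codeset is fixed to UTF-8 so the sketch behaves the same everywhere (in a real program it would come from langinfo(CODESET) as shown earlier).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: fixed for the sketch; normally this comes from langinfo(CODESET).
my $codeset = 'UTF-8';

# Simulated raw argument: the UTF-8 bytes of a four-letter Cyrillic word.
my $raw = "\xD0\xBC\xD0\xB0\xD0\xBC\xD0\xB0";

# Decode at the boundary: bytes in, Perl character string out.
my $text = decode($codeset, $raw);

# Work on characters, not bytes.
my $upper = uc $text;

# Encode explicitly when the data leaves the program.
my $out = encode('UTF-8', $upper);

printf "%d bytes in, %d characters, %d bytes out\n",
    length($raw), length($text), length($out);
```

The character count differs from the byte count, which is the reason to decode before doing any string work such as uc, length, or regex matching.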

    If you know that your setup is UTF-8 and the terminal will give you the command-line arguments as UTF-8, you can use the A option with Perl's -C switch. This tells your program to assume the arguments are encoded as UTF-8:

    % perl -CA program
    

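    Note that a bare -C does not do this: according to perlrun it is equivalent to -CSDL, which covers the standard handles and the default I/O layers (and only under a UTF-8 locale) but not @ARGV. To get those options together with A, spell them out:

    % perl -CSDA program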
    

    I find "if you know" to be a big red flag that really means "we're not sure", however.
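One practical consequence of relying on -CA is that a script which also decodes @ARGV by hand would decode twice. Here is a minimal sketch of a guard, assuming the arguments really are UTF-8. It reads the read-only ${^UNICODE} variable described in perlvar. The name argv_flag_set is a hypothetical helper, and the value 32 for the A option is taken from the table in perlrun.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true when a -C flag value includes the A option,
# which perlrun documents as the value 32.
sub argv_flag_set {
    my ($flags) = @_;
    return ($flags & 32) ? 1 : 0;
}

# ${^UNICODE} reflects the -C switch or the PERL_UNICODE environment variable.
# Assumption: the L (locale-conditional) option is not in use; this sketch
# does not account for it.
unless (argv_flag_set(${^UNICODE})) {
    # perl was not asked to decode @ARGV, so do it by hand.
    @ARGV = map { decode('UTF-8', $_) } @ARGV;
}

printf "%d argument(s) available as character strings\n", scalar @ARGV;
```

With this guard the same script can be run as `perl program` or `perl -CA program` without decoding its arguments twice.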

  • 2020-12-03 08:26

    For example, on Windows, set the console code page:

    chcp 1251
    

    In Perl:

    use utf8;
    use Modern::Perl;
    use Encode::Locale qw(decode_argv);
    
    if (-t)
    {
        binmode(STDIN, ":encoding(console_in)");
        binmode(STDOUT, ":encoding(console_out)");
        binmode(STDERR, ":encoding(console_out)");
    }
    
    Encode::Locale::decode_argv();
    

    On the command line:

    perl -C ppixregexplain.pl qr/\bмама\b/i > ex1.html 2>&1  
    

    where ppixregexplain.pl

  • 2020-12-03 08:27

    Use Encode::Locale:

    use Encode::Locale qw(decode_argv);
    
    decode_argv(Encode::FB_CROAK);
    

    This works well for me, including on Win32.

  • 2020-12-03 08:30

    You shouldn't have to do anything special to the string. Perl strings are in UTF-8 by default starting with Perl 5.8.

    perl -CO -le 'print "\x{2603}"' | xargs perl -le 'print "I saw @ARGV"'
    

    The code above works just fine on Ubuntu 9.04, OS X 10.6, and FreeBSD 7.

    FalseVinylShrub brings up a good point: we can see a definite difference between

    perl -Mutf8 -wle ';print utf8::is_utf8($ARGV[0]) ? "t" : "f"' a
    

    and

    perl -Mutf8 -CA -wle ';print utf8::is_utf8($ARGV[0]) ? "t" : "f"' a
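The difference between the two invocations can also be reproduced inside a single script, without depending on the shell or terminal. This is a sketch under one assumption: the argument is simulated as the UTF-8 bytes of U+2603 (the snowman used earlier), and Encode's decode stands in for what -CA does to @ARGV.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of U+2603 SNOWMAN, as an argument arrives without -CA.
my $bytes = "\xE2\x98\x83";

# The same data after decoding; assumed here to mirror what -CA delivers.
my $chars = decode('UTF-8', $bytes);

for my $item ([bytes => $bytes], [chars => $chars]) {
    my ($label, $value) = @$item;
    printf "%s: length %d, is_utf8 %s\n",
        $label, length($value), utf8::is_utf8($value) ? "t" : "f";
}
```

The undecoded value is three bytes long and unflagged, while the decoded value is a single character, so code that counts or matches characters sees different data in the two cases.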
    