How can I properly align UTF-8 strings with Perl's printf?

囚心锁ツ 2021-01-18 15:38

What is the right way to get nicely aligned output here (all lines with the same indentation)?

#!/usr/bin/env perl
use warnings;
use strict;
use DBI;

my $phone_book =          


        
相关标签:
3 Answers
  • 2021-01-18 15:55

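    You can't rely on printf for alignment if your strings contain code points that take 0 or 2 print columns instead of 1, which it appears yours do. printf pads by the number of characters (code points), not by the number of columns they occupy on screen.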
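A minimal sketch of that printf behaviour, using only core Perl. The sample strings (three CJK ideographs, and "e" followed by a combining accent) are illustrative assumptions, written as \x{...} escapes so the file's own encoding does not matter:

```perl
use strict;
use warnings;

# printf/sprintf pad by character (code point) count, not terminal columns.
my $cjk      = "\x{65E5}\x{672C}\x{8A9E}";  # 3 code points, each 2 columns wide
my $combined = "e\x{301}";                  # 'e' + COMBINING ACUTE ACCENT: 2 code points, 1 column

my $padded_cjk      = sprintf('%-10s', $cjk);
my $padded_combined = sprintf('%-10s', $combined);

# Both results are 10 characters long, so printf considers them equally wide...
print length($padded_cjk), ' ', length($padded_combined), "\n";    # 10 10

# ...but on a terminal the first occupies 3*2 + 7 = 13 columns and the
# second 1 + 8 = 9 columns, so whatever follows them will not line up.
```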

    You need to use Unicode::GCString instead.

    Wrong way:

    printf "%-10.10s", our $string;
    

    Right way:

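    use utf8;                               # if $string comes from UTF-8 literals in this file
    use Unicode::GCString;
    
    binmode STDOUT, ':encoding(UTF-8)';     # encode the output for a UTF-8 terminal
    
    my $gcstring = Unicode::GCString->new(our $string);   # expects a decoded (character) string
    my $colwidth = $gcstring->columns();    # width in terminal columns
    if ($colwidth > 10) {
        # substr() counts grapheme clusters, not columns, so wide
        # characters can still make this longer than 10 columns
        print $gcstring->substr(0, 10);
    } else {
        # left-justify like "%-10s": the string first, then the padding
        print $gcstring, " " x (10 - $colwidth);
    }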
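Unicode::GCString is a CPAN module, not part of core Perl. If it is not available, a rough approximation can be built from Perl's built-in Unicode properties. The sketch below is a simplification under stated assumptions, not a full UAX #11 implementation: combining marks count as 0 columns, East Asian Wide/Fullwidth characters as 2, everything else as 1 (ambiguous-width characters and emoji sequences are not handled). The helper names display_width and pad_right are hypothetical, and unlike the code above pad_right does not truncate over-long strings:

```perl
use strict;
use warnings;

# Hypothetical helper: approximate display width using core Unicode properties.
sub display_width {
    my ($s) = @_;
    my $width = 0;
    for my $ch (split //, $s) {                          # one code point at a time
        if    ($ch =~ /[\p{Mn}\p{Me}]/)     { $width += 0 }   # combining marks
        elsif ($ch =~ /[\p{Ea=W}\p{Ea=F}]/) { $width += 2 }   # wide / fullwidth
        else                                { $width += 1 }
    }
    return $width;
}

# Hypothetical helper: left-justify $s in $cols columns, like "%-${cols}s"
# but counting columns; strings already wider than $cols are returned as-is.
sub pad_right {
    my ($s, $cols) = @_;
    my $pad = $cols - display_width($s);
    return $pad > 0 ? $s . (' ' x $pad) : $s;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ("M\x{fc}h\x{df}ig", "Holler", "\x{65E5}\x{672C}\x{8A9E}") {
    print pad_right($name, 10), "|\n";    # the '|' marks should line up
}
```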
    
  • 2021-01-18 15:56
        #!/usr/bin/env perl

        use warnings;
        use strict;

        use utf8; # Allows UTF-8 literals in this source file (it does not affect file handles)

        binmode( STDOUT, ':encoding(UTF-8)' ); # Encode everything printed to STDOUT as UTF-8

        my @strings = ( 'Mühßig', 'Holler' ); # UTF-8 in this file; decoded to characters because of 'use utf8'

        foreach my $s (@strings) { printf( "%-15s %10s\n", $s, 'lined up' ); } # should line up nicely

        # Same as above, but reading the strings from a file; the input layer decodes each line
        open( my $fh, '<:encoding(UTF-8)', 'utf8file' ) or die("Failed to open file: $!");

        while ( my $line = <$fh> ) { chomp $line; printf( "%-15s %10s\n", $line, 'lined up' ); }

        close( $fh );

        # This works too: read raw bytes and decode each line explicitly
        use Encode qw(decode_utf8);

        open( $fh, '<', 'utf8file' ) or die("Failed to open file: $!");

        while ( my $line = <$fh> ) {
                chomp $line;
                $line = decode_utf8( $line );
                printf( "%-15s %10s\n", $line, 'lined up' );
        }

        close( $fh );
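Both reading styles in this answer should produce the same character strings. A minimal sketch checking that, with an in-memory filehandle standing in for utf8file; the sample file contents are an assumption for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode_utf8);

# Hypothetical file contents: UTF-8 bytes, as they would be stored on disk.
my $bytes = encode('UTF-8', "M\x{fc}h\x{df}ig\nHoller\n");

# Style 1: let an :encoding(UTF-8) layer decode while reading.
open(my $in1, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
chomp(my @via_layer = <$in1>);
close($in1);

# Style 2: read raw bytes and decode each line explicitly.
open(my $in2, '<', \$bytes) or die "open: $!";
chomp(my @raw = <$in2>);
close($in2);
my @via_decode = map { decode_utf8($_) } @raw;

# Either way "Mühßig" comes back as 6 characters instead of 8 bytes,
# so "%-15s" pads it with 9 spaces, the same as "Holler".
printf "%d %d %d\n", length($via_layer[0]), length($via_decode[0]), length($raw[0]);   # 6 6 8
```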
    
  • 2021-01-18 16:01

    I haven't been able to reproduce it, but loosely speaking, this looks like a character encoding mismatch. Most likely your Perl source file has been saved as UTF-8, but you have not enabled use utf8; in the script. Perl therefore treats each non-ASCII German character (such as ü or ß) as two characters, because each is stored as two bytes in UTF-8, and sets the padding accordingly. The terminal you're running on is also in UTF-8 mode, so the characters still print correctly; each line just ends up one column short per such character.

    Keep use warnings; enabled, and I would not be surprised if adding use utf8; actually fixes the problem. If you add it, also set an output layer such as binmode STDOUT, ':encoding(UTF-8)'; so that the decoded characters are encoded back to UTF-8 for the terminal.
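The mismatch this answer describes can be simulated without editing any source file. A minimal sketch: encoding "Mühßig" to UTF-8 bytes stands in for a UTF-8 literal read without use utf8, and the two strings are padded with the same sprintf format:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "M\x{fc}h\x{df}ig";        # "Mühßig" as 6 characters (what use utf8 gives you)
my $bytes = encode('UTF-8', $chars);   # the same text as 8 bytes (what you get without use utf8)

my $padded_chars = sprintf('%-15s|', $chars);
my $padded_bytes = sprintf('%-15s|', $bytes);

# sprintf counts string elements: the 6 characters get 9 spaces, but the
# 8 bytes get only 7, so on a UTF-8 terminal (where the bytes render as
# 6 visible characters) the byte-string line comes out 2 columns short.
my $spaces_chars = () = $padded_chars =~ / /g;
my $spaces_bytes = () = $padded_bytes =~ / /g;
print "$spaces_chars $spaces_bytes\n";   # 9 7
```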
