How can I open a Unicode file with Perl?

广开言路 2021-01-04 17:58

I'm using osql to run several SQL scripts against a database, and then I need to look at the results file to check whether any errors occurred. The problem is that Perl doesn't

4 Answers
  • 2021-01-04 18:33

    The file is presumably encoded as UCS-2LE (or UTF-16LE).

    C:\Temp> notepad test.txt
    
    C:\Temp> xxd test.txt
    0000000: fffe 5400 6800 6900 7300 2000 6900 7300  ..T.h.i.s. .i.s.
    0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00   .a. .f.i.l.e...

    When opening such a file for reading, you need to specify the encoding:

    #!/usr/bin/perl
    
    use strict; use warnings;
    
    my ($infile) = @ARGV;
    
    open my $in, '<:encoding(UCS-2le)', $infile
        or die "Cannot open '$infile': $!";
    

    Note that the bytes ff fe at the beginning are the byte order mark (BOM); in that order they indicate little-endian data.
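
    To see what the BOM means in practice, here is a minimal sketch. The temporary file and the helper name read_first_line are made up for illustration. It relies on the behavior documented for Encode: the UTF-16LE layer leaves the BOM in the data as U+FEFF, while the UTF-16 layer uses the BOM to pick the byte order and then drops it.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Write a small UTF-16LE file that starts with a BOM (ff fe),
    # similar to the xxd dump above. File and content are illustrative.
    my ($tmp, $path) = tempfile(UNLINK => 1);
    binmode $tmp, ':raw';
    print {$tmp} "\xFF\xFE", map { "$_\0" } split //, "This is a file.\n";
    close $tmp or die "Cannot close '$path': $!";

    # Hypothetical helper: read the first line through the given encoding layer.
    sub read_first_line {
        my ($encoding) = @_;
        open my $in, "<:raw:encoding($encoding)", $path
            or die "Cannot open '$path': $!";
        my $line = <$in>;
        close $in;
        return $line;
    }

    my $le   = read_first_line('UTF-16LE');   # BOM stays in the string
    my $auto = read_first_line('UTF-16');     # BOM is detected and dropped

    printf "UTF-16LE: first character is U+%04X\n", ord $le;
    printf "UTF-16:   first character is U+%04X\n", ord $auto;

    # With an explicit LE/BE layer, strip the BOM yourself.
    (my $clean = $le) =~ s/\A\x{FEFF}//;
    ```

    If the file always carries a BOM, the plain UTF-16 layer is the simpler choice; the explicit UCS-2LE or UTF-16LE layers also work on files that have no BOM.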

  • 2021-01-04 18:34

    The answer is in the documentation for open, which also points you to perluniintro. :)

    open my $fh, '<:encoding(UTF-16LE)', $file or die ...;
    

    You can get a list of the names of the encodings that your perl supports:

    % perl -MEncode -le "print for Encode->encodings(':all')"
    

    After that, it's up to you to find out what the file encoding is. This is the same way you'd open any file with an encoding different than the default, whether it's one defined by Unicode or not.
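
    One common way to find out the encoding is to look for a BOM in the first bytes of the file. The following is only a sketch: the function name guess_encoding_from_bom is hypothetical, and a file without a BOM still has to be identified some other way.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: guess an encoding name from the file's BOM, if any.
    # Returns undef (or an empty list) when no BOM is found.
    sub guess_encoding_from_bom {
        my ($path) = @_;
        open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
        my $n = read $fh, my $head, 4;
        die "Cannot read '$path': $!" unless defined $n;
        close $fh;

        # Check the 4-byte UTF-32 marks first: ff fe 00 00 also starts with ff fe.
        return 'UTF-32LE' if $head =~ /\A\xFF\xFE\x00\x00/;
        return 'UTF-32BE' if $head =~ /\A\x00\x00\xFE\xFF/;
        return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
        return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;
        return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;
        return;
    }
    ```

    The returned name can then be interpolated into the layer, for example "<:raw:encoding($name)". Note that a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE, so treat the result as a guess.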

    We have a chapter in Effective Perl Programming that goes through the details.

  • 2021-01-04 18:36
        use Carp qw(cluck);

        #
        # -----------------------------------------------------------------------------
        # Reads a file and returns its content as a string; if the second param
        # is 'utf8', the content is read as a utf8 string.
        # $ret is 0 on success and 1 on failure.
        # usage:
        # ( $ret , $msg , $str_file )
        #         = $objFileHandler->doReadFileReturnString ( $file , 'utf8' ) ;
        # or
        # ( $ret , $msg , $str_file )
        #         = $objFileHandler->doReadFileReturnString ( $file ) ;
        # -----------------------------------------------------------------------------
        sub doReadFileReturnString {

            my $self      = shift;
            my $file      = shift;
            my $mode      = shift ;

            my $msg        = q{} ;
            my $ret        = 1 ;
            my $s          = q{} ;

            # report the first check that fails instead of overwriting $msg
            if    ( ! -e $file ) { $msg = " the file : $file does not exist !!!" ; }
            elsif ( ! -f $file ) { $msg = " the file : $file is not actually a file !!!" ; }
            elsif ( ! -r $file ) { $msg = " the file : $file is not readable !!!" ; }

            if ( $msg ) {
                cluck ( $msg ) ;
                return ( $ret , "$msg can not read the file $file ::: $! !!!" , undef ) ;
            }

            $s = eval {
                local $/ = undef;    # slurp the whole file in one read

                # no trailing space after the file name: in a 3-arg open
                # it would become part of the name and the open would fail
                my $layer = ( defined ( $mode ) && $mode eq 'utf8' ) ? '<:utf8' : '<' ;
                open my $fh, $layer, $file
                    or die "failed to open \$file $file : $!\n" ;
                my $string = <$fh> ;
                close $fh ;

                die "did not find utf8 string in file: $file\n"
                    if $layer eq '<:utf8' && ! utf8::valid ( $string ) ;

                $string ;
            };

            if ( $@ ) {
                $msg = $@ ;
                $ret = 1 ;
                $s = undef ;
            } else {
                $ret = 0 ; $msg = "ok for read file: $file" ;
            }
            return ( $ret , $msg , $s ) ;
        }
        #eof sub doReadFileReturnString
    
  • 2021-01-04 18:52

    Try opening the file with an IO layer specified, e.g.:

    open OUTPUT,  "<:encoding(UTF-8)", $file or die "Can't open $file: $!\n";
    

    See perldoc open for more on this.
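
    Once the layer is in place, the results file can be scanned line by line like any other text file. Below is a rough sketch under stated assumptions: the helper name matching_lines is hypothetical, and both the encoding and the pattern are passed in by the caller because the thread does not say what the error lines look like.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the lines of $file that match $pattern,
    # decoding the file through the given encoding layer.
    sub matching_lines {
        my ($file, $encoding, $pattern) = @_;
        open my $in, "<:raw:encoding($encoding)", $file
            or die "Can't open $file: $!\n";
        my @hits;
        while (my $line = <$in>) {
            $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
            push @hits, $line if $line =~ $pattern;
        }
        close $in;
        return @hits;
    }
    ```

    A call might look like matching_lines($file, 'UTF-16', qr/error/i), where qr/error/i is only a placeholder for whatever marks an error in your output.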
