What's the most defensive way to loop through lines in a file with Perl?

粉色の甜心 2021-01-01 22:14

I usually loop through lines in a file using the following code:

    open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while          


        
3 Answers
  • 2021-01-01 22:31

    Because

     while (my $line = <$fh>) { ... }
    

    actually compiles down to

     while (defined( my $line = <$fh> ) ) { ... }
    

    An explicit defined() test may have been necessary in a very old version of Perl, but not any more! You can see this by running B::Deparse on your script:

    >perl -MO=Deparse
    open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
    while ( my $line = <$fh> ) {
      ...
    }
    
    ^D
    die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
    while (defined(my $line = <$fh>)) {
        do {
            die 'Unimplemented'
        };
    }
    - syntax OK
    

    So you're already good to go!
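
    Putting this together, a defensive loop might look like the minimal sketch below. The helper name read_lines and the check on close are illustrative assumptions, not part of the question; the explicit defined() only spells out what Perl already adds.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the lines of
# $file, chomped, as a list. $file may also be a reference to a string,
# which Perl opens as an in-memory file.
sub read_lines {
    my ($file) = @_;

    open my $fh, '<', $file
        or die "Could not open file $file for reading: $!\n";

    my @lines;
    # Perl inserts this defined() test on its own; writing it out only
    # documents the intent.
    while (defined(my $line = <$fh>)) {
        chomp $line;
        push @lines, $line;
    }

    close $fh
        or die "Could not close file $file: $!\n";

    return @lines;
}
```

    Called as my @lines = read_lines($file); a final line of "0" with no trailing newline is returned like any other line.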

  • 2021-01-01 22:38

    BTW, this is covered in the I/O Operators section of perldoc perlop:

    In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.

    Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.

    The following lines are equivalent:

    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;
    

    This also behaves similarly, but avoids $_ :

    while (my $line = <STDIN>) { print $line }
    

    In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where the line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:

    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }
    

    In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicits a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
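
    As a small sketch of the two points quoted above (the implicit defined test and the need to localize $_ yourself), the following counts lines with the bare <$fh> form. The helper name count_lines is an assumption made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines readable from $source, which may be
# a filename or a reference to a string (an in-memory file).
sub count_lines {
    my ($source) = @_;

    open my $fh, '<', $source
        or die "Could not open input: $!\n";

    local $_;                 # the bare <$fh> below assigns to the global $_
    my $count = 0;
    $count++ while <$fh>;     # treated as: while defined($_ = <$fh>)

    close $fh;
    return $count;
}
```

    Because of the implicit defined test, a last line of "0" with no newline is still counted, and the local $_ keeps the caller's $_ intact.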

  • 2021-01-01 22:50

    While it is correct that while (my $line=<$fh>) { ... } gets compiled to while (defined( my $line = <$fh> ) ) { ... }, there are a variety of situations in which a legitimate read of the value "0" is misinterpreted if you do not use an explicit defined, either in the loop condition or when testing the return value of <>.

    Here are several examples:

    #!/usr/bin/perl
    use strict; use warnings;
    
    my $str = join "", map { "$_\n" } -10..10;
    $str.="0";
    my $sep='=' x 10;
    my ($fh, $line);
    
    open $fh, '<', \$str or 
         die "could not open in-memory file: $!";
    
    print "$sep Should print:\n$str\n$sep\n";     
    
    #Failure 1:
    print 'while ($line=chomp_ln()) { print "$line\n"; }:',
          "\n";
    while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
    rewind();
    print "$sep\n";
    
    #Failure 2:
    print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
    while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
    print "$sep\n";
    last_char();
    
    #Failure 3:
    # fails on last line of "0" 
    print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
    if(my $l=<$fh>) { print "$l\n" } 
    print "$sep\n";
    last_char();
    
    #Failure 4 and no Perl warning:
    print 'print "$_\n" if <$fh>;',"\n";
    print "$_\n" if <$fh>; #fails to print;
    print "$sep\n";
    last_char();
    
    #Failure 5
    # fails on last line of "0" with no Perl warning
    print 'if($line=<$fh>) { print $line; }', "\n";
    if($line=<$fh>) { 
        print $line; 
    } else {
        print "READ ERROR: That was supposed to be the last line!\n";
    }    
    print "BUT, line read really was: \"$line\"", "\n\n";
    
    sub chomp_ln {
    # if I have "warnings", Perl says:
    #    Value of <HANDLE> construct can be "0"; test with defined() 
        if($line=<$fh>) {
            chomp $line ;
            return $line;
        }
        return undef;
    }
    
    sub trim_ln {
    # if I have "warnings", Perl says:
    #    Value of <HANDLE> construct can be "0"; test with defined() 
        if (my $line=<$fh>) {
            $line =~ s/^\s+//;
            $line =~ s/\s+$//;
            return $line;
        }
        return undef;
    
    }
    
    sub rewind {
        seek ($fh, 0, 0) or 
            die "Cannot seek on in-memory file: $!";
    }
    
    sub last_char {
        seek($fh, -1, 2) or
           die "Cannot seek on in-memory file: $!";
    }
    

    I am not saying these are good forms of Perl! I am saying that they are possible, especially Failures 3, 4, and 5. Note that Failures 4 and 5 produce no Perl warning. The first two have their own issues...
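
    For comparison, here is a hedged sketch of how the wrapped read in Failure 1 could be made safe. The name next_chomped is a hypothetical replacement for chomp_ln, and the two-line input is made up for the example. The idea is to test definedness rather than truth at both levels, since Perl adds the defined() test for a direct <$fh> read in a while condition but not for a call to a user-defined sub.

```perl
use strict;
use warnings;

my $str = "1\n0";    # the last line is "0" with no trailing newline
open my $fh, '<', \$str
    or die "could not open in-memory file: $!";

# Hypothetical replacement for chomp_ln(): returns the next line without
# its newline, or undef at end-of-file.
sub next_chomped {
    my $line = <$fh>;
    return unless defined $line;    # test definedness, not truth
    chomp $line;
    return $line;
}

my @seen;
# The sub call hides the read from Perl, so defined() is written out here.
while (defined(my $line = next_chomped())) {
    push @seen, $line;
}

print join(',', @seen), "\n";    # prints 1,0
```

    The same explicit defined() applies to the if forms in Failures 3 and 5, for example if (defined(my $l = <$fh>)) { ... }.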
