I usually loop through lines in a file using the following code:
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
...
}
However, in answering another question, Evan Carroll edited my answer, changing my while statement to:
while ( defined( my $line = <$fh> ) ) {
...
}
His rationale was that if you have a line that's 0 (it would have to be the last line, otherwise it would end with a newline), then your while would exit prematurely if you used my statement ($line would be set to "0", and the return value from the assignment would thus also be "0", which evaluates to false). If you check for definedness, then you don't run into this problem. It makes perfect sense.
So I tried it. I created a text file whose last line is 0 with no trailing newline. I ran it through my loop and the loop did not exit prematurely.
I then thought, "Aha, maybe the value isn't actually 0, maybe there's something else there that's screwing things up!" So I used Dump() from Devel::Peek and this is what it gave me:
SV = PV(0x635088) at 0x92f0e8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x962600 "0"\0
  CUR = 1
  LEN = 80
That seems to tell me that the value is actually the string "0", as I get a similar result if I call Dump() on a scalar I've explicitly set to "0" (the only difference is in the LEN field -- from the file LEN is 80, whereas from the scalar LEN is 8).
So what's the deal? Why doesn't my while() loop exit prematurely if I pass it a line that's only "0" with no trailing newline? Is Evan's loop actually more defensive, or does Perl do something crazy internally that means you don't need to worry about these things and while() actually only exits when you hit eof?
Because
while (my $line = <$fh>) { ... }
actually compiles down to
while (defined( my $line = <$fh> ) ) { ... }
The explicit defined() may have been necessary in a very old version of Perl, but not any more! You can see this by running B::Deparse on your script:
>perl -MO=Deparse
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
...
}
^D
die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
while (defined(my $line = <$fh>)) {
    do {
        die 'Unimplemented'
    };
}
- syntax OK
So you're already good to go!
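If you would rather confirm this by running code than by reading Deparse output, here is a minimal sketch. It reads from an in-memory filehandle whose final line is "0" with no trailing newline. The helper name read_lines is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every line of a string by reading it
# through an in-memory filehandle with the plain while-assignment form.
sub read_lines {
    my ($content) = @_;
    open my $fh, '<', \$content
        or die "Could not open in-memory file: $!";
    my @lines;
    # No explicit defined() here; Perl adds the test for us.
    while ( my $line = <$fh> ) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# The last line is a bare "0" with no newline after it.
my @lines = read_lines("first\nsecond\n0");
print scalar(@lines), " lines read; last is '$lines[-1]'\n";
# prints: 3 lines read; last is '0'
```

If the loop tested plain truth instead of definedness, the final "0" would be dropped and only two lines would be reported.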
BTW, this is covered in the I/O Operators section of perldoc perlop:
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.
The following lines are equivalent:
while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;
This also behaves similarly, but avoids $_ :
while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where a line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicits a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
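The documentation's distinction between a false value and an undefined one can be sketched as follows. The helper read_one is a hypothetical name used only for this illustration. It reads a single line and reports both tests side by side.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from a string and report
# (is it defined?, is it true?) as a pair of 0/1 flags.
sub read_one {
    my ($content) = @_;
    open my $fh, '<', \$content
        or die "Could not open in-memory file: $!";
    my $line = <$fh>;
    close $fh;
    return ( defined $line ? 1 : 0, $line ? 1 : 0 );
}

# A lone "0" with no newline: a real line, but false as a boolean.
my ( $defined, $truthy ) = read_one("0");
print "defined=$defined truthy=$truthy\n";   # prints: defined=1 truthy=0

# The same text with a newline is the string "0\n", which is true.
( $defined, $truthy ) = read_one("0\n");
print "defined=$defined truthy=$truthy\n";   # prints: defined=1 truthy=1
```

Only the first case is ambiguous under a plain truth test, which is why the while forms quoted above test definedness instead.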
While it is correct that the form while (my $line=<$fh>) { ... } gets compiled to while (defined( my $line = <$fh> ) ) { ... }, consider that there are a variety of cases where a legitimate read of the value "0" is misinterpreted if you do not have an explicit defined in the loop or when testing the return of <>.
Here are several examples:
#!/usr/bin/perl
use strict; use warnings;
my $str = join "", map { "$_\n" } -10..10;
$str.="0";
my $sep='=' x 10;
my ($fh, $line);
open $fh, '<', \$str or
die "could not open in-memory file: $!";
print "$sep Should print:\n$str\n$sep\n";
#Failure 1:
print 'while ($line=chomp_ln()) { print "$line\n"; }:',
"\n";
while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
rewind();
print "$sep\n";
#Failure 2:
print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
print "$sep\n";
last_char();
#Failure 3:
# fails on last line of "0"
print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
if(my $l=<$fh>) { print "$l\n" }
print "$sep\n";
last_char();
#Failure 4 and no Perl warning:
print 'print "$_\n" if <$fh>;',"\n";
print "$_\n" if <$fh>; #fails to print;
print "$sep\n";
last_char();
#Failure 5
# fails on last line of "0" with no Perl warning
print 'if($line=<$fh>) { print $line; }', "\n";
if($line=<$fh>) {
    print $line;
} else {
    print "READ ERROR: That was supposed to be the last line!\n";
}
print "BUT, line read really was: \"$line\"", "\n\n";
sub chomp_ln {
    # if I have "warnings", Perl says:
    # Value of <HANDLE> construct can be "0"; test with defined()
    if($line=<$fh>) {
        chomp $line ;
        return $line;
    }
    return undef;
}
sub trim_ln {
    # if I have "warnings", Perl says:
    # Value of <HANDLE> construct can be "0"; test with defined()
    if (my $line=<$fh>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        return $line;
    }
    return undef;
}
sub rewind {
    seek ($fh, 0, 0) or
        die "Cannot seek on in-memory file: $!";
}
sub last_char {
    seek($fh, -1, 2) or
        die "Cannot seek on in-memory file: $!";
}
I am not saying these are good forms of Perl! I am saying that they are possible, especially Failures 3, 4 and 5. Note the failure with no Perl warning in numbers 4 and 5. The first two have their own issues...
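For contrast, here is a minimal sketch of how a helper like chomp_ln could be written so that a "0" line survives. It tests definedness both inside the helper and in the calling loop. The name chomp_ln_safe and passing the handle as an argument are choices made for this illustration, not part of the script above.

```perl
use strict;
use warnings;

my $str = "1\n0\n2\n0";    # includes "0" mid-file and as the last line
open my $fh, '<', \$str
    or die "could not open in-memory file: $!";

# Hypothetical safer variant of chomp_ln: returns undef only at
# end-of-file (or on error), never for a line that happens to be "0".
sub chomp_ln_safe {
    my ($in) = @_;
    my $line = <$in>;
    return unless defined $line;
    chomp $line;
    return $line;
}

my @seen;
# The caller must also test definedness, because the while magic only
# applies to a bare <$fh> read, not to an arbitrary function call.
while ( defined( my $line = chomp_ln_safe($fh) ) ) {
    push @seen, $line;
}
print join( ',', @seen ), "\n";    # prints: 1,0,2,0
```

Dropping either defined test brings back the early exit on the first chomped "0".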
Source: https://stackoverflow.com/questions/3773917/whats-the-most-defensive-way-to-loop-through-lines-in-a-file-with-perl