I am working on a Perl script to read a CSV file and do some calculations. The CSV file has only two columns, something like below.
One Two
1.00 44.000
3.00 55.000
The File::ReadBackwards module lets you read a file in reverse order, which makes it easy to get the last N lines as long as the order in which you process them doesn't matter. If the order does matter and the needed data is small enough (which it should be in your case), you could read the last 1000 lines into an array and then reverse it.
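A minimal sketch of that approach, assuming File::ReadBackwards is installed from CPAN (it is not a core module). The helper name last_lines_backwards is made up for illustration:

```perl
use strict;
use warnings;
use File::ReadBackwards;    # CPAN module, not part of core Perl

# Return the last $count lines of $path in normal file order.
# last_lines_backwards is a hypothetical helper name.
sub last_lines_backwards {
    my ($path, $count) = @_;

    my $bw = File::ReadBackwards->new($path)
        or die "Can't read $path: $!\n";

    my @lines;
    while (@lines < $count) {
        my $line = $bw->readline;
        last unless defined $line;    # reached the start of the file
        push @lines, $line;
    }
    $bw->close;

    # The lines arrived newest first, so put them back in file order.
    @lines = reverse @lines;
    return @lines;
}
```

Calling last_lines_backwards('yourfile', 1000) should return up to 1000 lines, or fewer if the file is shorter than that.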
On *nix, you can use the tail command.
tail -n 1000 yourfile | perl ...
That pipes only the last 1000 lines to the Perl program.
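For the Perl side of that pipe, here is a rough sketch. The question doesn't say which calculation is needed or what the separator is, so both are assumptions: this sums the second column and accepts commas or whitespace between fields. sum_second_column is a made-up name.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Sum the second column of the rows read from $fh.
# Assumption: two columns per row, separated by a comma or whitespace.
# Rows without a numeric second field (e.g. a header) are skipped.
sub sum_second_column {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim both ends, including the newline
        my @fields = split /\s*,\s*|\s+/, $line;
        next unless @fields == 2 && looks_like_number($fields[1]);
        $total += $fields[1];
    }
    return $total;
}
```

In the piped script you would call it on standard input, for example print sum_second_column(\*STDIN), "\n";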
On Windows, the GnuWin32 and UnxUtils packages both include a tail utility.
Without tail, a Perl-only solution isn't that unreasonable.
One way is to seek from the end of the file, then read lines from it. If you don't have enough lines, seek even further from the end and try again.
sub last_x_lines {
    my ($filename, $lineswanted) = @_;
    my ($line, $numread, @lines);

    open my $fh, '<', $filename or die "Can't read $filename: $!\n";
    my $filesize = -s $filename;

    # Guess 50 bytes per line, but never seek back past the start of the file.
    my $seekpos = 50 * $lineswanted;
    $seekpos = $filesize if $seekpos > $filesize;

    $numread = 0;
    while ($numread < $lineswanted) {
        @lines   = ();
        $numread = 0;
        seek($fh, $filesize - $seekpos, 0) or die "Can't seek in $filename: $!\n";
        <$fh> if $seekpos < $filesize;    # Discard probably fragmentary line
        while (defined($line = <$fh>)) {
            push @lines, $line;
            shift @lines if ++$numread > $lineswanted;    # Keep only the newest lines
        }
        if ($numread < $lineswanted) {
            # We didn't get enough lines. Double the amount of space to read from next time.
            if ($seekpos >= $filesize) {
                die "There aren't even $lineswanted lines in $filename - I got $numread\n";
            }
            $seekpos *= 2;
            $seekpos = $filesize if $seekpos >= $filesize;
        }
    }
    close $fh;
    return @lines;
}
P.S. A better title would be something like "Reading lines from the end of a large file in Perl".
If you know the number of lines in the file, you can do
perl -ne "print if ($. > N);" filename.csv
where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with
perl -e "while (<>) {} print $.;" filename.csv
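If you'd rather do both steps inside one script, something like the following should work. It is only a sketch of the same two-pass idea (count first, then print everything past line N), and last_lines_two_pass is a made-up name:

```perl
use strict;
use warnings;

# Return the last $wanted lines of $path by reading the file twice:
# once to count the lines, once to collect the ones past the cutoff.
# last_lines_two_pass is a hypothetical helper name.
sub last_lines_two_pass {
    my ($path, $wanted) = @_;

    # Pass 1: count the lines.
    open my $count_fh, '<', $path or die "Can't read $path: $!\n";
    my $total = 0;
    while (defined(my $line = <$count_fh>)) {
        $total++;
    }
    close $count_fh;

    # This is N from the one-liner: the number of leading lines to skip.
    my $skip = $total - $wanted;

    # Pass 2: keep every line whose line number ($.) is greater than $skip.
    open my $fh, '<', $path or die "Can't read $path: $!\n";
    my @lines;
    while (defined(my $line = <$fh>)) {
        push @lines, $line if $. > $skip;
    }
    close $fh;

    return @lines;
}
```

This reads the whole file twice, so it is simple rather than fast. If the file has fewer lines than requested, $skip goes negative and every line is returned.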
The modules are the way to go. However, sometimes you are writing code that has to run on a variety of machines, some of which may be missing the more obscure CPAN modules. In that case, why not just run tail and dump the output to a temp file from within Perl?
#!/usr/bin/perl
system("tail -n 1000 /path/myfile.txt > tempfile.txt") == 0
    or die "tail failed: $?\n";
You then have something that isn't dependent on a CPAN module if installing one may present an issue.
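A variation on the same idea, in case you'd rather skip the temp file: read tail's output straight through a pipe. This is a sketch that assumes a POSIX-style tail accepting -n is on the PATH; tail_lines is a made-up name.

```perl
use strict;
use warnings;

# Read the last $count lines of $path by running the external tail
# command and reading its output through a pipe (no temp file).
# Assumes a tail that accepts -n; tail_lines is a hypothetical helper name.
sub tail_lines {
    my ($path, $count) = @_;

    # The list form of open bypasses the shell, so $path needs no quoting.
    open my $pipe, '-|', 'tail', '-n', $count, $path
        or die "Can't run tail: $!\n";
    my @lines = <$pipe>;

    # close on a pipe returns false if the command exited with an error.
    close $pipe or die "tail failed on $path (exit status $?)\n";

    return @lines;
}
```

It still only uses core Perl, so there is no CPAN dependency, but it does depend on tail being installed.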
This is only tangentially related to your main question, but when you want to check whether a module such as File::Tail works on your platform, check the results from CPAN Testers. The links at the top of the module page in CPAN Search lead you to the test results matrix.
Looking at the matrix, you see that this module does indeed have a problem on Windows on all versions of Perl tested.