Perl, disable buffering input

徘徊边缘 submitted on 2019-12-18 08:17:10

Question


There is a file:

:~$ cat fff
qwerty
asdf
qwerty
zxcvb

There is a script:

:~$ cat 1.pl
#!/usr/bin/perl
print <STDIN>

The command works as expected:

:~$ cat fff | perl -e 'system("./1.pl")'
qwerty
asdf
qwerty
zxcvb

But this command does not work as expected: the first <STDIN> reads (and buffers) all of the data, not just a single line. How can I disable buffering for <STDIN>?

:~$ cat fff | perl -e '$_ = <STDIN>; system("./1.pl")'
:~$

Answer 1:


There are two Perl processes here: the first assigns $_ = <STDIN> and calls system, and the second does print <STDIN>.

Although only the first line of the stream is read into $_ by the first process, behind the scenes Perl has filled its input buffer with the rest of the data and left the stream empty.

What is the purpose of this? The only way that comes to mind to do what you ask is to read all of the file into an array in the first process, remove the first line, and send the rest through a pipe to the second script, as sketched below.
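A minimal sketch of that approach (untested; it reuses the 1.pl script from the question, and opens the pipe with Perl's "|-" mode so that whatever is printed to the handle becomes the child's STDIN):

:~$ cat fff | perl -e '@lines = <STDIN>; shift @lines; open(my $out, "|-", "./1.pl") or die $!; print $out @lines'

With the sample file fff, this should print the remaining three lines: asdf, qwerty and zxcvb.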

All of this seems unnecessary, and I am sure there is a better method if you will describe the underlying problem.

Update

Since you say you are aware of the buffering problem, the way to do this is to use sysread, which reads from the pipe at a lower level and avoids the buffering.

Something like this will work:

cat fff | perl -e 'while (sysread(STDIN, $c, 1)) {$_ .= $c; last if $c eq "\n"} system("./1.pl")'
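Because sysread bypasses Perl's input buffering and pulls a single byte at a time, only the first line is consumed before system is called, so the child script inherits a stream positioned at the second line. With the sample file fff, the command above should print asdf, qwerty and zxcvb.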

But I don't like recommending it, as what you are doing seems very wrong, and I wish you would explain your real goal.




Answer 2:


I recently had to parse several log files which were around 6 gigabytes each. The buffering was a problem, since Perl would happily attempt to read all 6 gigabytes into memory when I assigned <STDIN> to an array, and I simply didn't have the system resources to do that. I came up with the following workaround, which reads the file line by line and thus avoids the massive memory black hole that would otherwise swallow all my system resources.

Note: all this script does is split that 6-gigabyte file into several smaller ones (the size of each being dictated by the number of lines it should contain). The interesting bit is the while loop and the assignment of a single line from the log file to a variable: the loop iterates through the entire file, reading a single line, doing something with it, and then repeating. The result is no massive buffering. I have kept the entire script intact just to show a working example.

#!/usr/bin/perl -w
BEGIN{ $ENV{'POSIXLY_CORRECT'} = 1; }
use v5.14;
use Getopt::Long qw(:config no_ignore_case);

my $input  = '';
my $output = '';
my $lines  = 0;
GetOptions('i=s' => \$input, 'o=s' => \$output, 'l=i' => \$lines);

open FI, '<', $input or die "Cannot open $input: $!";

my $count      = 0;
my $count_file = 1;
while($count < $lines){
    my $line = <FI>;            # read a single line of input into a variable
    last unless defined($line); # stop at end of file
    # reopening the output file in append mode each time is simple, if not fast;
    # Perl closes the previous FO handle when it is reopened
    open FO, '>>', "${output}_${count_file}.log" or die "Cannot open output file: $!";
    print FO $line;
    $count++;
    if($count == $lines){       # current chunk is full; move on to the next file
        $count = 0;
        $count_file++;
    }
}
close FI;
print " done\n";

The script is invoked on the command line like this:

(name of script) -i (input file) -o (output file) -l (number of lines per output file)
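For example, a hypothetical run that saves the script as splitter.pl and splits big.log into chunks of 1,000,000 lines each (written to out_1.log, out_2.log, and so on) would look like:

:~$ ./splitter.pl -i big.log -o out -l 1000000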

Even if it's not exactly what you are looking for, I hope it will give you some ideas. :)



Source: https://stackoverflow.com/questions/12351500/perl-disable-buffering-input
