Is Perl faster than bash?

刺人心 2020-12-15 09:26

I have a bash script that cuts out a section of a logfile between 2 timestamps, but because of the size of the files, it takes quite a while to run.

If I were to rewrite the script in Perl, could I expect a significant speed increase?

10 Answers
  • 2020-12-15 09:58

    Updated script based on Brent's comment: This one is untested.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my %months = (
        jan => 1, feb => 2,  mar => 3,  apr => 4,
        may => 5, jun => 6,  jul => 7,  aug => 8,
        sep => 9, oct => 10, nov => 11, dec => 12,
    );
    
    while ( my $line = <> ) {
        # Syslog-style stamps ("Jul 15 12:34:56") occupy the first 15 columns.
        my $ts = parse_date( substr $line, 0, 15 );
        next if $ts lt '0201100543';    # before the start of the range
        last if $ts gt '0715123456';    # past the end; stop reading
        print $line;
    }
    
    sub parse_date {
        my ($month, $day, $time) = split ' ', $_[0];
        my ($hour, $min, $sec) = split /:/, $time;
        return sprintf(
            '%2.2d%2.2d%2.2d%2.2d%2.2d',
            $months{lc $month}, $day,
            $hour, $min, $sec,
        );
    }
    
    
    __END__
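
    Assuming the script above is saved as, say, extract_range.pl (a name invented here purely for illustration), you would run it over one or more logs like this:

    chmod +x extract_range.pl
    ./extract_range.pl /var/log/messages > section.log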
    

    Previous answer for reference: What is the format of the file? Here is a short script which assumes the first column is a timestamp and prints only lines that have timestamps in a certain range. It also assumes that the timestamps are sorted. On my system, it took about a second to filter 900,000 lines out of a million:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    while ( <> ) {
        my ($ts) = split;               # first whitespace-separated field
        next if $ts < 1247672719;       # before the range
        last if $ts > 1252172093;       # past the range (stamps are sorted)
        print;                          # print the whole line
    }
    
    __END__
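
    To reproduce that kind of measurement, you could generate a sorted, epoch-stamped test file and time the script against it (the test data and the script name filter.pl are made up here):

    perl -e 'print 1247600000 + $_, " line $_\n" for 1 .. 1_000_000' > test.log
    time ./filter.pl test.log > /dev/null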
    
  • 2020-12-15 09:58

    Perl is absurdly faster than Bash. And for text manipulation you can often achieve better performance with Perl than with C, unless you take the time to write complex algorithms. Of course, for simple stuff C can be unbeatable.

    That said, if your "bash" script is not looping, just calling other programs, then there isn't any gain to be had. For example, if your script looks like "cat X | grep Y | cut -f 3-5 | sort | uniq", then most of the time is spent in cat, grep, cut, sort and uniq, NOT in Bash.

    You'll gain performance if there is any loop in the script, or if you save multiple reads of the same file.

    You say you cut stuff between two timestamps on a file. Let's say your Bash script looks like this:

    LINE1=`grep -n TIMESTAMP1 filename | head -1 | cut -d ':' -f 1`
    LINE2=`grep -n TIMESTAMP2 filename | head -1 | cut -d ':' -f 1`
    tail -n +$LINE1 filename | head -n $(($LINE2-$LINE1))
    

    Then you'll gain performance, because that approach reads the whole file three times: once for each command in which "filename" appears. In Perl, you would do something like this:

    my $state = 0;
    while(<>) {
      exit if /TIMESTAMP2/;          # stop at the end marker
      print $_ if $state == 1;       # inside the range: pass the line through
      $state = 1 if /TIMESTAMP1/;    # start printing after the start marker
    }
    

    This will read the file only once and will stop as soon as it sees TIMESTAMP2. Since you are processing multiple files, you'd use "last" (Perl's equivalent of C's "break") instead of "exit", so that the script can continue with the remaining files; a per-file variant is sketched below.
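
    A sketch of that multi-file case (the marker patterns and file names are placeholders): closing Perl's magic ARGV filehandle makes the <> operator skip straight to the next file, so each file's own TIMESTAMP1..TIMESTAMP2 section is extracted:

    perl -ne 'if (/TIMESTAMP2/) { $s = 0; close ARGV; next }
              print if $s;
              $s = 1 if /TIMESTAMP1/' log.1 log.2 log.3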

    Anyway, seeing your script I'm positive you'll gain a lot by rewriting it in Perl. Leaving aside the loops over file names (whose speed WILL improve, but is probably insignificant), for each file that is not entirely inside or outside the range you:

    1. Read the WHOLE file to count lines!
    2. Do multiple tails on the file
    3. Finish by running "head" or "tail" over the file once again

    Furthermore, you pipe your tails into heads. Every stage of such a pipeline reads the data again: in "tail -n +1000 file | head -n 500", both processes copy the same bytes. Some of those lines end up being read 10 times or more!

  • 2020-12-15 10:04

    It depends on how your bash script is written. If you are parsing the log file with bash's "while read" loop instead of with awk, switching to awk will improve the speed; the two styles are contrasted below.
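
    For illustration (the file name and the numeric first-column timestamps are assumptions, not necessarily the OP's format), here is the same filter in both styles:

    # Pure bash: the read builtin handles one line per loop iteration (slow)
    while read -r line; do
        ts=${line%% *}                  # first whitespace-delimited field
        (( ts >= 1247672719 && ts <= 1252172093 )) && printf '%s\n' "$line"
    done < app.log

    # awk: a single process scans the whole file (fast)
    awk '$1 >= 1247672719 && $1 <= 1252172093' app.log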

  • 2020-12-15 10:06

    I agree that moving from a bash-only script to Perl (or even to awk, if a Perl environment is not readily available) could yield a speed benefit, assuming both versions are equally well written.

    However, if the extract lends itself to a bash script that merely builds the parameters for, and then calls, grep with a suitable regex, that could be faster than a 'pure' scripting solution; one such sketch follows.
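
    A minimal sketch of that idea, assuming syslog-style stamps and a range that happens to fall within one day's hours (the day, pattern, and file name are invented for illustration):

    day='Jul 15'
    hours='1[0-4]'                      # covers 10:00 through 14:59
    grep -E "^$day $hours:" app.log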
