Using `awk` to print number of lines in file in the BEGIN section

Question

I am trying to write an awk script that, before doing anything else, tells the user how many lines are in the file. I know how to do this in the END section but am unable to do so in the BEGIN section. I have searched SE and Google but have only found half a dozen ways to do this in the END section or as part of a bash script, not how to do it before any processing has taken place. I was hoping for something like the following:

#!/usr/bin/awk -f

BEGIN{
        print "There are a total of " **TOTAL LINES** " lines in this file.\n"
     }
{

        if($0==4587){print "Found record on line number "NR; exit 0;}
}

But have been unable to determine how to do this, if it is even possible. Thanks.

Answer 1:

You can read the file twice.

awk 'NR!=1 && FNR==1 {print NR-1} <some more code here>' file{,}

In your example:

awk 'NR!=1 && FNR==1 {print "There are a total of "NR-1" lines in this file.\n"} $0==4587 {print "Found record on line number "NR; exit 0;}' file{,}

You can use file file instead of file{,} (the brace expansion just makes the filename appear twice).
The condition NR!=1 && FNR==1 is true only on the first line of the second file.

To use an awk script

#!/usr/bin/awk -f
NR!=1 && FNR==1 {
    print "There are a total of "NR-1" lines in this file.\n"
    } 
$0==4587 {
    print "Found record on line number "NR; exit 0
    }

awk -f myscript file{,}
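
To see why the FNR==1 test marks the start of the second pass, it can help to trace NR and FNR when the same file is given twice. This is only a minimal sketch: the helper name `trace_counters` and the two-line temporary file are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: print NR and FNR for every record when the
# same file is passed to awk twice.
trace_counters() {
    awk '{ print "NR=" NR, "FNR=" FNR, $0 }' "$1" "$1"
}

tmp=$(mktemp)              # throwaway sample file (assumption for the demo)
printf 'a\nb\n' > "$tmp"
trace_counters "$tmp"
rm -f "$tmp"
```

For the two-line sample this prints NR=1 FNR=1 a, NR=2 FNR=2 b, NR=3 FNR=1 a, NR=4 FNR=2 b. On the third record FNR resets to 1 while NR keeps counting, so NR-1 is the number of lines in the file. Keep in mind that every other rule in the program also runs during the first pass.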

Answer 2:

To do this robustly and for multiple files you need something like:

$ cat tst.awk
BEGINFILE {
    numLines = 0
    while ( (getline line < FILENAME) > 0 ) {
        numLines++
    }
    print "----\nThere are a total of", numLines, "lines in", FILENAME
}
$0==4587 { print "Found record on line number", FNR, "of", FILENAME; nextfile }
$
$ cat file1
a
4587
c
$
$ cat file2
$
$ cat file3
d
e
f
4587
$
$ awk -f tst.awk file1 file2 file3
----
There are a total of 3 lines in file1
Found record on line number 2 of file1
----
There are a total of 0 lines in file2
----
There are a total of 4 lines in file3
Found record on line number 4 of file3

The above uses GNU awk for BEGINFILE. Any other solution is difficult to implement such that it will handle empty files (you need an array to track the files being parsed, and you have to print the info in the FNR==1 and END sections after the empty file has been skipped).

Using getline has caveats and should not be used lightly (see http://awk.info/?tip/getline), but this is one of the appropriate and robust uses of it. You can also test for non-readable files in BEGINFILE by testing ERRNO and skipping the file (see the gawk manual); that situation causes other scripts to abort.
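
The ERRNO test mentioned above might look roughly like this. It is a sketch that assumes GNU awk is installed as `gawk`; the function name `count_and_search` and the wording of the warning are illustrative, not part of the answer. It also closes the extra read handle after counting, which the script above does not do.

```shell
#!/bin/sh
# Sketch only: requires GNU awk (gawk) for BEGINFILE and ERRNO.
# count_and_search is a hypothetical name, not from the answer.
count_and_search() {
    gawk '
    BEGINFILE {
        # In BEGINFILE, ERRNO is non-empty if the file could not be opened.
        if (ERRNO != "") {
            print "skipping " FILENAME ": " ERRNO > "/dev/stderr"
            nextfile
        }
        numLines = 0
        while ((getline line < FILENAME) > 0) numLines++
        close(FILENAME)   # close the extra read used for counting
        print "There are a total of", numLines, "lines in", FILENAME
    }
    $0 == 4587 { print "Found record on line number", FNR, "of", FILENAME; nextfile }
    ' "$@"
}

# Run on any files given on the command line.
if [ "$#" -gt 0 ]; then
    count_and_search "$@"
fi
```

Saved under a name of your choice, it could be called as `sh script.sh file1 missing_file file3`; the unreadable file produces a warning on stderr and the remaining files are still processed.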

Answer 3:

BEGIN {
s="cat your_file.txt|wc -l"; 
s | getline file_size;
close(s);
print file_size 
}

This puts the number of lines in the file named your_file.txt into the awk variable file_size and prints it.

If your file name is dynamic, you can pass the filename on the command line and change the script to use the variable.

E.g. my.awk

BEGIN {
s="cat "VAR"|wc -l"; 
s | getline file_size;
close(s);
print file_size 
}

Then you can call it like this: awk -v VAR="your_file.txt" -f my.awk
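
One detail worth handling: some `wc` implementations (BSD-derived ones, for example) pad the count with leading blanks when reading standard input. A hedged sketch of the same idea that forces a plain number and stops if the command produced no output; the wrapper name `line_count` and the temporary file are assumptions for the demo, and it uses `wc -l < file` rather than `cat file | wc -l`.

```shell
#!/bin/sh
# Hypothetical wrapper around the BEGIN-block approach above.
# Assumes the filename contains no double quotes or backslashes.
line_count() {
    awk -v VAR="$1" '
    BEGIN {
        cmd = "wc -l < \"" VAR "\""
        # getline returns less than 1 if the command printed nothing.
        if ((cmd | getline file_size) < 1) exit 1
        close(cmd)
        print file_size + 0   # adding 0 yields a number without padding
    }'
}

tmp=$(mktemp)                 # throwaway sample file (assumption for the demo)
printf '1\n2\n3\n' > "$tmp"
line_count "$tmp"
rm -f "$tmp"
```

With the three-line sample this prints 3.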

Answer 4:

If you use GNU awk and need a robust, generic solution that accommodates multiple, possibly empty input files, use Ed Morton's solution.

This answer uses portable (POSIX-compliant) code. Within the constraints noted, it is robust, but Ed's GNU awk solution is both simpler and more robust.
Tip of the hat to Ed Morton for his help.

With a single input file, it is simpler to handle line counting with a shell command in the BEGIN block, which has the following advantages:

- On invocation, the filename doesn't have to be specified twice, unlike in the accepted answer.
  - Also note that the accepted answer doesn't work as intended (as of this writing); the correct form is (see the comments on the answer for an explanation):
    - awk 'NR==FNR {next} FNR==1 {print NR-1} $0==4587 {print "Found record on line number "FNR; exit 0}' file{,}
- The solution also works with an empty input file.

In terms of performance, this approach is either only slightly slower than reading the file twice in awk, or even a little faster, depending on the awk implementation used:

awk '
  BEGIN {
     # Execute a shell command to count the lines and read
     # result into an awk variable via <cmd> | getline <varname>.
     # If the file cannot be read, abort. (The shell has already printed an error msg.)
    cmd="wc -l < \"" ARGV[1] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
    printf "There are a total of %s lines in this file.\n\n", count
  }
  $0==4587 { print "Found record on line number " NR; exit 0 }
' file

Assumptions:

- The filename is passed as the 1st operand (non-option argument) on the command line, accessed as ARGV[1].
- The filename doesn't contain embedded " chars.
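
If the restriction on embedded " characters is a concern, one option is to skip the shell and count the lines with getline inside the BEGIN block. This is a sketch of that variation, not the method used above; the function name `count_then_search` and the sample file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical variation: count lines in BEGIN without running a shell command.
count_then_search() {
    awk '
    BEGIN {
        file = ARGV[1]
        count = 0
        # Read the file through a separate redirection just to count records.
        while ((rc = (getline line < file)) > 0) count++
        if (rc < 0) { print "cannot read " file > "/dev/stderr"; exit 1 }
        close(file)
        printf "There are a total of %d lines in this file.\n\n", count
    }
    $0 == 4587 { print "Found record on line number " NR; exit 0 }
    ' "$1"
}

tmp=$(mktemp)                 # throwaway sample file (assumption for the demo)
printf 'a\n4587\nc\n' > "$tmp"
count_then_search "$tmp"
rm -f "$tmp"
```

Unlike `wc -l`, which counts newline characters, this counts records, so a final line without a trailing newline is included in the total.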

The following solutions deal with multiple files and make analogous assumptions:

- All operands passed are filenames. That is, all arguments after the program must be filenames, and not variable assignments such as var=value.
- No filename contains embedded " chars.
- No processing is to take place if any of the input files do not exist or cannot be read.

It's not hard to generalize this to handling multiple files, but the following solution doesn't print the line count for empty files:

awk '
  BEGIN {
     # Loop over all input files and store their line counts in an array.
    for (i=1; i<ARGC; ++i) {
      cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
      counts[ARGV[i]] = count
    }
  }
   # At the beginning of every (non-empty) file, print the line count.
  FNR==1 { printf "There are a total of %s lines in file %s.\n\n", counts[FILENAME], FILENAME }
  # $0==4587 { printf "%s: Found record on line number %d\n", FILENAME, FNR; exit 0 }
' file1 file2 # ...

Things get a little trickier if you want the line count to be printed for empty files also:

awk '
  BEGIN {
     # Loop over all input files and store their line counts in an array.
    for (i=1; i<ARGC; ++i) {
      cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
      counts[ARGV[i]] = count
    }
    fileCount = ARGC - 1
    fmtStringCount = "There are a total of %s lines in file %s.\n\n"
  }
   # At the beginning of every (non-empty) file, print the line count.
  FNR==1 {
   ++fileIndex
    # If there were intervening empty files, print their counts too.
   while (ARGV[fileIndex] != FILENAME) {
       printf fmtStringCount, 0, ARGV[fileIndex++]
   }
   printf fmtStringCount, counts[FILENAME], FILENAME
  }
   # Process input lines
  $0==4587 { printf "%s: Found record on line number %d\n", FILENAME, FNR; exit 0 }
   # If there are any remaining empty files at the end, print their counts too.
  END {
    while (fileIndex < fileCount) { printf fmtStringCount, 0, ARGV[++fileIndex] }
  }
' file1 file2 # ...

Source: https://stackoverflow.com/questions/29314555/using-awk-to-print-number-of-lines-in-file-in-the-begin-section

Tags

text

awk

text-processing