Is it possible to have different behavior for first and second input files to awk?

Submitted by 风格不统一 on 2020-01-15 07:40:10

Question


For example, suppose I run the following command:

gawk -f AppendMapping.awk Reference.tsv TrueInput.tsv

Assume the names of files WILL change. While iterating through the first file, I want to create a mapping.

map[$16]=$18

While iterating through the second file, I want to use the mapping.

print $1, map[$2]

What's the best way to achieve this behavior (i.e., different behavior for each input file)?


Answer 1:


As you probably know, NR stores the current line (record) number; as you may or may not know, it's cumulative: it doesn't get reset between files. FNR, on the other hand, restarts at 1 for each file, so NR == FNR holds only while awk is reading the first file (assuming that file is not empty). To tell the second file apart from later ones you'll need to keep your own counter.

# In case you want to keep track of the file number
FNR == 1 { fileno++ }

NR == FNR {
    # First file
}
NR != FNR {
    # Second or later file
}
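
Applied to the mapping in the question, the NR == FNR test could look roughly like the sketch below. The sample files, the two-column layout and the "NA" placeholder are assumptions made for illustration; the question itself uses map[$16]=$18 on a wider file.

```shell
#!/bin/sh
# Hypothetical sample data: the reference file maps a key (column 1)
# to a value (column 2); the input file carries the key in column 2.
tmp=$(mktemp -d)
printf 'k1\tapple\nk2\tbanana\n' > "$tmp/Reference.tsv"
printf 'row1\tk2\nrow2\tk1\nrow3\tk9\n' > "$tmp/TrueInput.tsv"

out=$(awk -F'\t' -v OFS='\t' '
    # First file: NR and FNR are still equal, so build the mapping
    # and skip the remaining rules for this line.
    NR == FNR { map[$1] = $2; next }

    # Later files: look the key up; "NA" marks keys with no mapping.
    { print $1, (($2 in map) ? map[$2] : "NA") }
' "$tmp/Reference.tsv" "$tmp/TrueInput.tsv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The next in the first rule is what keeps the second rule from also firing on lines of the reference file, and the ($2 in map) test avoids creating empty entries for unknown keys.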

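You could also read the first file manually with getline in the BEGIN block, then delete it from ARGV so that the main rules only see the remaining files.

BEGIN {
    file = ARGV[1]
    # Compare with > 0: getline returns -1 on error, which would loop forever
    while ((getline < file) > 0) {
        # Process line ($0 and NF are set; NR and FNR are not)
    }
    close(file)
    delete ARGV[1]  # keep awk from reading the file again as normal input
}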



Answer 2:


Gawk versions 4.0 and higher offer the special BEGINFILE and ENDFILE blocks in addition to the usual BEGIN and END blocks. Use them to set flags that control how the rest of your code behaves.

Recall that patterns can include comparisons with variables, so you can select rules directly on the value of your flags.

The man page says:

For each input file, if a BEGINFILE rule exists, gawk executes the associated code before processing the contents of the file. Similarly, gawk executes the code associated with ENDFILE after processing the file.
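
A minimal sketch of that idea, assuming gawk is installed; the sample files, the two-column layout and the fileno counter are made up for illustration:

```shell
#!/bin/sh
# BEGINFILE is a gawk extension, so stop early if gawk is not available.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found" >&2; exit 0; }

# Hypothetical sample data: key in column 1, value in column 2.
tmp=$(mktemp -d)
printf 'k1\tapple\nk2\tbanana\n' > "$tmp/Reference.tsv"
printf 'row1\tk2\nrow2\tk1\n' > "$tmp/TrueInput.tsv"

out=$(gawk -F'\t' -v OFS='\t' '
    BEGINFILE   { fileno++ }            # runs once before each input file
    fileno == 1 { map[$1] = $2 }        # rule selected while in the first file
    fileno == 2 { print $1, map[$2] }   # rule selected while in the second file
' "$tmp/Reference.tsv" "$tmp/TrueInput.tsv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because BEGINFILE fires for every file, including empty ones, the counter stays correct even when the first file has no lines, a case in which a test based on NR == FNR would mistake the second file for the first.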




Answer 3:


This might work for you:

seq 5 >/tmp/a
seq 100 105 >/tmp/b
awk 'FILENAME==ARGV[1]{print FILENAME,$0};FILENAME==ARGV[2]{print $0,FILENAME}' /tmp/{a,b}
/tmp/a 1
/tmp/a 2
/tmp/a 3
/tmp/a 4
/tmp/a 5
100 /tmp/b
101 /tmp/b
102 /tmp/b
103 /tmp/b
104 /tmp/b
105 /tmp/b

So by comparing FILENAME with ARGV[n], where n is the position of the file on the command line, awk can apply different actions to individual files.

N.B. ARGV[0] holds the name of the awk command itself, so the first file is ARGV[1].



Source: https://stackoverflow.com/questions/10691080/is-it-possible-to-have-different-behavior-for-first-and-second-input-files-to-aw
