Help with awk, using a file to filter another one
I have a main file:
...
17,466971 0,095185 17,562156 id 676
17,466971 0,096694 17,563665 id 677
17,466971 0,098
Here's one way using awk:
awk 'FNR==NR { a[$NF]; next } !($NF in a)' other main
Results:
17,466971 0,095185 17,562156 id 676
17,466971 0,096694 17,563665 id 677
17,466971 0,09816 17,565131 id 678
17,466971 0,099625 17,566596 id 679
17,466971 0,101091 17,568062 id 680
17,466971 0,101793 17,568764 id 682
17,466971 0,10253 17,569501 id 683
38,166772 0,08125 38,248022 id 1572
38,166772 0,082545 38,249317 id 1573
38,233772 0,082113 38,315885 id 1575
38,299771 0,081412 38,381183 id 1576
38,299771 0,083627 38,383398 id 1578
38,299771 0,085093 38,384864 id 1579
38,299771 0,085094 38,384865 id 1581
Drop the exclamation mark to show the 'deleted' lines:
awk 'FNR==NR { a[$NF]; next } $NF in a' other main
Results:
17,466971 0,016175 17,483146 id 681
38,233772 0,005457 38,239229 id 1574
38,299771 0,006282 38,306053 id 1577
38,299771 0,008682 38,308453 id 1580
Alternatively, if you'd like two files, one containing values 'present' and the other containing values 'deleted', try:
awk 'FNR==NR { a[$NF]; next } { print > ($NF in a ? "deleted" : "present") }' other main
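To see the two-file split in action, here is a small sketch. The sample contents of 'other' and 'main' are made up for illustration (they are not the data from the question) and are written to a temporary directory so nothing in the current directory is touched. Note the parentheses around the ternary: they keep the whole expression as the redirection target.

```shell
# Hypothetical sample data, written to a throwaway directory.
tmp=$(mktemp -d)
printf 'x id 2\n' > "$tmp/other"
printf 'a id 1\nb id 2\nc id 3\n' > "$tmp/main"

# Run the same command as above; the subshell keeps the cd local.
( cd "$tmp" && awk 'FNR==NR { a[$NF]; next } { print > ($NF in a ? "deleted" : "present") }' other main )

# Read back the two files that awk created.
deleted=$(cat "$tmp/deleted")
present=$(cat "$tmp/present")
echo "deleted: $deleted"
echo "present:"
echo "$present"
```

With this sample data, 'deleted' should hold the single line whose last field also appears in 'other', and 'present' should hold the other two lines.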
Explanation1:
FNR==NR { ... } is a commonly used construct that is true only while the first file in the argument list is being read (provided that file is not empty). In this case, awk reads the file 'other' first. While this file is being processed, the value in the last column ($NF) is added as a key to an array (which we have called a). next then skips the rest of the code for that line. Once the first file has been read, FNR is no longer equal to NR, so awk skips the FNR==NR { ... } block and processes the remainder of the code, which is applied to the second file in the argument list, 'main'. For example, !($NF in a) prints the line only if $NF is not in the array.
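A quick way to convince yourself of this is to print both counters for every line. The sketch below uses made-up sample files (hypothetical contents, created in a temporary directory) and then runs the filter from the top of this answer on them.

```shell
# Hypothetical sample data, written to a throwaway directory.
tmp=$(mktemp -d)
printf 'x id 2\n' > "$tmp/other"
printf 'a id 1\nb id 2\nc id 3\n' > "$tmp/main"

# NR counts lines across all files; FNR restarts at 1 for each file.
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' other main)
echo "$counters"

# The filter itself: keep lines of 'main' whose last field is not in 'other'.
kept=$(cd "$tmp" && awk 'FNR==NR { a[$NF]; next } !($NF in a)' other main)
echo "$kept"
```

The first command shows NR and FNR matching only on the line from 'other'; on the lines from 'main', NR keeps counting while FNR starts again at 1.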
Explanation2:
With regard to which column to use, you may find this helpful:
$1 # the first column
$2 # the second column
$3 # the third column
$NF # the last column
$(NF-1) # the second last column
$(NF-2) # the third last column
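As a small illustration of these field references, the sketch below feeds one line from the question's main file to awk and prints the first field, the last field, the second-to-last field and the field count. The variable names line and cols are just placeholders.

```shell
# One line taken from the main file shown in the question.
line='17,466971 0,095185 17,562156 id 676'

# $1 is the first field, $NF the last, $(NF-1) the second-to-last, NF the field count.
cols=$(printf '%s\n' "$line" | awk '{ print $1, $NF, $(NF-1), NF }')
echo "$cols"   # prints: 17,466971 676 id 5
```

Since the id is always the last field here, $NF keeps working even if the number of leading columns changes.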