Question
I'm trying to solve a problem in awk as an exercise but I'm having trouble.
I want awk (or gawk) to be able to print all unique destination ports for a particular source IP address.
The source IP address is field 1 ($1) and the destination port is field 4 ($4).
Cut for brevity:
SourceIP SrcPort DstIP DstPort
192.168.1.195 59508 98.129.121.199 80
192.168.1.87 64802 192.168.1.2 53
10.1.1.1 41170 199.253.249.63 53
10.1.1.1 62281 204.14.233.9 443
I imagine you would store each source IP as an index into an array, but I'm not quite sure how you would store the destination ports as values. Maybe you could keep appending to a string stored at that index, e.g. "80," then "80,443," for each match. But maybe that's not the best solution.
I'm not too concerned about output, I really just want to see how one can approach this in awk. Though, for output I was thinking something like,
Source IP:dstport, dstport, dstport
192.168.1.195:80,443,8088,5900
I'm tinkering with something like this,
awk '{ if ( NR == 1) next; arr[$1,$4] = $4 } END { for (i in arr) print arr[i] }' infile
but cannot figure out how to print out the elements and their values for a two-dimensional array. It seems something along this line would take care of the unique destination port task because each port is overwriting the value of the element.
Note: awk/gawk solution will get the answer!
Solution (edit): I slightly modified Kent's solution to print only unique destination ports, as mentioned in my question, and to skip the column header line.
awk '{ if ( NR == 1 ) next ; if ( a[$1] && a[$1] !~ $4 ) a[$1] = a[$1]","$4; else a[$1] = $4 } END {for(x in a)print x":"a[x]}'
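One caveat with the regex test above: `a[$1] !~ $4` treats the port as a pattern, so port 80 also matches a stored 8080, and when a port repeats, the `else` branch resets the list to that single port. Below is a minimal sketch that sidesteps both by tracking each IP/port pair in a second array. The names `seen`, `ports` and the wrapper function `unique_ports` are my own choices, not part of the original solution.

```shell
# unique_ports is a hypothetical helper name; it takes the log file as $1.
unique_ports() {
  awk '
    NR == 1 { next }          # skip the column header line
    !seen[$1, $4]++ {         # true only the first time this IP/port pair appears
      if ($1 in ports)
        ports[$1] = ports[$1] "," $4   # append to the existing list
      else
        ports[$1] = $4                 # first port seen for this IP
    }
    END { for (ip in ports) print ip ":" ports[ip] }
  ' "$1"
}
```

Called as `unique_ports infile`, it prints one `ip:port,port,...` line per source IP. Ports appear in the order they were first seen, but the order of the lines themselves is unspecified because `for (ip in ports)` does not guarantee any ordering.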
Answer 1:
Here is one way with awk:
awk '{k=$1;a[k]=a[k]?a[k]","$4:$4}END{for(x in a)print x":"a[x]}' file
With your example, the output is:
kent$ awk '{k=$1;a[k]=a[k]?a[k]","$4:$4}END{for(x in a)print x":"a[x]}' file
192.168.1.195:80
192.168.1.87:53
10.1.1.1:53,443
(I omitted the title line)
EDIT
k=$1;a[k]=a[k]?a[k]","$4:$4
is exactly the same as:
if (a[$1])                  # if a[$1] is not empty
    a[$1] = a[$1]","$4      # append $4 to it, separated by ","
else                        # otherwise a[$1] is empty
    a[$1] = $4              # so set a[$1] to $4
I used k=$1 just to save some typing, together with the x=boolean?a:b conditional expression.
I hope this explanation helps you understand the code.
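As for the `arr[$1,$4]` attempt in the question: awk stores such a subscript as a single string, with the two parts joined by the built-in `SUBSEP` character, so the parts can be recovered with `split()` in the `END` block. A rough sketch of that idea follows. `print_pairs` and `part` are hypothetical names chosen for illustration.

```shell
# print_pairs is a hypothetical helper name; it takes the log file as $1.
print_pairs() {
  awk '
    NR == 1 { next }          # skip the column header line
    { arr[$1, $4] = 1 }       # key is $1 SUBSEP $4, so duplicate pairs collapse onto one key
    END {
      for (key in arr) {
        split(key, part, SUBSEP)    # part[1] = source IP, part[2] = destination port
        print part[1] ":" part[2]
      }
    }
  ' "$1"
}
```

This prints one `ip:port` line per unique pair rather than a comma-joined list, so it is mostly useful for seeing how the compound key works. The line order is unspecified.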
Answer 2:
I prefer a solution using Perl, because I like its ability to build data structures such as a hash of arrays:
perl -ane '
    ## Same role as a BEGIN block in awk. It prints the header before any
    ## input is processed.
    BEGIN { printf qq|%s:%s\n|, q|Source IP|, q|dstport| }
    ## Skip the first input line (header).
    next if $. == 1;
    ## This is what you were trying to achieve. Store the source IP as the key
    ## of a hash, and instead of saving a string, save an array with all of
    ## its ports.
    push @{ $ip{ $F[0] } }, $F[ 3 ];
    ## Same role as an END block in awk. For each IP, get all ports saved in
    ## the array and join them with a comma.
    END { printf qq|%s:%s\n|, $_, join q|,|, @{ $ip{ $_ } } for keys %ip }
' infile
It yields:
Source IP:dstport
192.168.1.195:80
10.1.1.1:53,443
192.168.1.87:53
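For comparison, gawk 4.0 and later supports true arrays of arrays, which gives a structure close to the Perl hash of arrays while staying in awk. The sketch below assumes gawk is installed, since `ports[$1][$4]` and `PROCINFO["sorted_in"]` are gawk extensions and will not work in mawk or a strictly POSIX awk. `ports_by_ip` is a hypothetical name.

```shell
# ports_by_ip is a hypothetical helper name; it takes the log file as $1.
ports_by_ip() {
  gawk '
    BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }  # gawk only: visit indices in numeric order
    NR == 1 { next }                                  # skip the column header line
    { ports[$1][$4] = 1 }                             # inner array acts as a set of ports
    END {
      for (ip in ports) {
        line = ""
        for (p in ports[ip])
          line = (line == "" ? p : line "," p)        # join the ports with commas
        print ip ":" line
      }
    }
  ' "$1"
}
```

Because the inner array is indexed by port, duplicates disappear without any extra check, and each IP's ports come out in ascending numeric order.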
Source: https://stackoverflow.com/questions/16742955/awk-create-list-of-destination-ports-seen-for-each-source-ip-from-a-bro-log-co