Using awk to count the number of occurrences of a word in a column

难免孤独 2020-12-03 10:28
03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66

03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF

03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA
6 Answers
  • 2020-12-03 11:10

    Here is a solution that needs almost no code. You can string the steps together with pipes ("|").

    awk '{print $3}' file | sort | uniq -c
    
    • awk '{print $3}'

      Print the 3rd column. The default field separator in awk is whitespace.

    • sort

      Sort the results so that identical values end up next to each other, which uniq requires.

    • uniq -c

      Count the number of occurrences of each distinct value.
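The three stages can be tried one at a time. The following is a minimal sketch that assumes the sample lines from the question are saved as firewall.log in the current directory (the file name is taken from the other answers, and the script overwrites any existing file of that name).

```shell
# Sample data copied from the question; written to firewall.log for the demo.
cat > firewall.log <<'EOF'
03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66
03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF
03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA
EOF

# Stage 1: extract the third column (BLOCK, ALLOW, BLOCK).
awk '{print $3}' firewall.log

# Stage 2: sort so that identical values become adjacent;
# uniq only collapses neighbouring lines.
awk '{print $3}' firewall.log | sort

# Stage 3: collapse each run of identical lines and prefix it with its count.
# The width of the leading padding in uniq -c output varies between systems.
awk '{print $3}' firewall.log | sort | uniq -c
```

For this input the last command should report ALLOW once and BLOCK twice.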

  • 2020-12-03 11:16

    The error in your awk invocation is that, in your "END" block, you have print $count. That takes the content of the count variable, assumes it is an integer, and attempts to find the corresponding field in the last line of input. What you really want is just print count, as that just prints the value in the count variable. It's sometimes easy to mix up different variable referencing schemes between bash, awk, python, etc., so it's an easy mistake to make.
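The difference between a variable and a field reference is easy to see in isolation. This is a small sketch with made-up input; the variable name n and the three words are only for illustration.

```shell
# n holds the number 2; $n is field number 2 of the current record.
echo 'alpha beta gamma' | awk '{ n = 2; print n; print $n }'
# prints:
# 2
# beta
```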

  • 2020-12-03 11:19

    The reason is that you need to print count rather than $count. Inside awk, you do not use $ to reference a variable; $ selects a field by number. In your case, count ends up as 2, so print $count asks for field $2 of the last input line rather than the count itself. The code below should work:

    awk ' BEGIN {count=0;} { if ($3 == "BLOCK") count+=1} END {print count}' firewall.log

  • 2020-12-03 11:23

    Your code may not be working because END is case sensitive. Written in lowercase, end is treated as an ordinary variable; it is never set, so the pattern is false and the last block is never executed. If you change it to END, it should work.

    Also, you do not need the BEGIN block, because an uninitialized variable behaves as 0 in a numeric context.
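Both points can be checked with a short experiment. This is only a sketch: the two input lines and the variable name total are made up for the demonstration.

```shell
# The lowercase "end" pattern is an unset variable, so it is false for every
# record and its action never runs. END runs once after all input is read.
# total is never assigned; adding 0 shows that it evaluates to 0 numerically.
printf 'x\ny\n' | awk '
    end { print "lowercase end block ran" }
    END { print "END block ran, total =", total + 0 }
'
# prints: END block ran, total = 0
```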

    Below I have added an alternative way of doing this that you may want to use instead.

    This is similar to glenn's, but it captures only the words you want, so it should use little memory.


    Using gawk (needed for the third argument of match):

    awk 'match($3,/BLOCK|ALLOW/,b){a[b[0]]++}END{for(i in a)print i ,a[i]}' file
    

    This block only executes if BLOCK or ALLOW is contained in the third field.
    The match call stores the matched text in the array b.
    Then the element of array a keyed by the matched text is incremented.

    In the END block, each captured word is printed with its count of occurrences.


    The output is

    ALLOW 1
    BLOCK 2
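If gawk is not available, the same idea can probably be expressed with the two-argument match, which sets the RSTART and RLENGTH variables in POSIX awk. The sketch below is an alternative to the gawk command above, not the original answer's method, and it assumes the sample data is saved as firewall.log (overwriting any existing file of that name). The output is piped through sort because the order of a for-in loop over an awk array is unspecified.

```shell
# Sample data copied from the question.
cat > firewall.log <<'EOF'
03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66
03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF
03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA
EOF

# match() returns the position of the match (0 if none) and sets RSTART and
# RLENGTH; substr() then extracts the matched text to use as the array key.
awk 'match($3, /BLOCK|ALLOW/) { a[substr($3, RSTART, RLENGTH)]++ }
     END { for (i in a) print i, a[i] }' firewall.log | sort
# prints:
# ALLOW 1
# BLOCK 2
```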
    
  • 2020-12-03 11:24

    I tested your statement

    awk ' BEGIN {count=0;}  { if ($3 == "BLOCK") count+=1} end {print $count}' firewall.log
    

    and was able to successfully count BLOCK by making two changes:

    1. end must be written in capitals (END)
    2. remove the $ from print $count

    So, it should be:

    awk ' BEGIN {count=0;}  { if ($3 == "BLOCK") count+=1} END {print count}' firewall.log 
    

    A simpler statement that works too is:

    awk '($3 == "BLOCK") {count++ } END { print count }' firewall.log
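One caveat with the shorter form: if no line matches, count is never assigned and print count emits an empty line. A common workaround is to print count + 0, as in the sketch below. It assumes the sample data is saved as firewall.log (overwriting any existing file of that name); DENY is a made-up value that does not occur in the log.

```shell
# Sample data copied from the question.
cat > firewall.log <<'EOF'
03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66
03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF
03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA
EOF

# No line has DENY in column 3, so count stays unset; "+ 0" forces a numeric 0.
awk '$3 == "DENY"  { count++ } END { print count + 0 }' firewall.log   # prints 0

# The same form still gives the normal result when there are matches.
awk '$3 == "BLOCK" { count++ } END { print count + 0 }' firewall.log   # prints 2
```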
    
  • 2020-12-03 11:27

    Use an array

    awk '{count[$3]++} END {for (word in count) print word, count[word]}' file
    

    If you want the count for "BLOCK" specifically, use this END block instead: END {print count["BLOCK"]}
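The array approach generalizes to any column by passing the column number in with -v. The following is a sketch; the variable names col and want are illustrative, the sample data is assumed to be saved as firewall.log (overwriting any existing file of that name), and the output is sorted because the iteration order of for-in is unspecified.

```shell
# Sample data copied from the question.
cat > firewall.log <<'EOF'
03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66
03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF
03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA
EOF

# Count every distinct value in column 3 (the action).
awk -v col=3 '{ count[$col]++ }
     END { for (word in count) print word, count[word] }' firewall.log | sort
# prints:
# ALLOW 1
# BLOCK 2

# The same command counts IP addresses when pointed at column 4.
awk -v col=4 '{ count[$col]++ }
     END { for (word in count) print word, count[word] }' firewall.log | sort
# prints:
# 10.1.34.1 2
# 10.1.34.2 1

# Count a single word in a chosen column without building an array.
awk -v col=3 -v want=BLOCK '$col == want { n++ } END { print n + 0 }' firewall.log
# prints: 2
```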
