Best way to simulate “group by” from bash?

半阙折子戏 2020-11-29 15:03

Suppose you have a file that contains IP addresses, one address per line:

10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1

You need a shell script that counts how many times each IP address appears in the file, i.e. something like an SQL "group by" with a count.

14 Answers
  • 2020-11-29 15:56

    I understand you are looking for something in Bash, but in case someone else might be looking for something in Python, you might want to consider this:

    mySet = set()
    for line in open("ip_address_file.txt"):
         line = line.rstrip()
         mySet.add(line)
    

    As values in a set are unique by default and Python is pretty good at this kind of thing, this might save you some work. I haven't tested the code, so it might be buggy, but it should get you there. And if you want to count occurrences, using a dict instead of a set is easy to implement.

    Edit: I'm a lousy reader, so I answered wrong. Here's a snippet with a dict that would count occurrences.

    mydict = {}
    for line in open("ip_address_file.txt"):
        line = line.rstrip()
        if line in mydict:
            mydict[line] += 1
        else:
            mydict[line] = 1
    

    The dictionary mydict now holds the unique IPs as keys and the number of times each one occurred as the corresponding value.

  • 2020-11-29 15:57

    For summing up multiple fields grouped by a set of existing fields, use the example below (replace $1, $2, $3, $4 according to your requirements):

    cat file
    
    US|A|1000|2000
    US|B|1000|2000
    US|C|1000|2000
    UK|1|1000|2000
    UK|1|1000|2000
    UK|1|1000|2000
    
    awk 'BEGIN { FS=OFS=SUBSEP="|"}{arr[$1,$2]+=$3+$4 }END {for (i in arr) print i,arr[i]}' file
    
    US|A|3000
    US|B|3000
    US|C|3000
    UK|1|9000
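
    Setting SUBSEP to "|" makes the composite key $1,$2 print with the same delimiter as the input, which is why the keys in the output look like the original columns. As a minimal sketch of a variant (assuming the same file layout), the pattern also lends itself to counting the rows in each group alongside the sum:

    awk 'BEGIN { FS=OFS=SUBSEP="|" } { sum[$1,$2]+=$3+$4; cnt[$1,$2]++ } END { for (k in sum) print k, sum[k], cnt[k] }' file
    
    US|A|3000|1
    US|B|3000|1
    US|C|3000|1
    UK|1|9000|3
    

    The order of the groups is unspecified, because for (k in arr) does not guarantee any particular order.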
    
  • 2020-11-29 16:00

    Note that uniq only counts adjacent duplicate lines, so sort may be omitted only if identical lines are already grouped together in the input:

    uniq -c <source_file>
    

    or

    echo "$list" | uniq -c
    

    if the source list is a variable
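
    To illustrate, a minimal sketch using the sample file from the question (assumed here to be named ip_addresses):

    uniq -c ip_addresses
          2 10.0.10.1
          1 10.0.10.3
          1 10.0.10.2
          1 10.0.10.1
    
    sort ip_addresses | uniq -c
          3 10.0.10.1
          1 10.0.10.2
          1 10.0.10.3
    

    Without the sort, the three occurrences of 10.0.10.1 are reported as two separate groups because they are not adjacent in the file.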

  • 2020-11-29 16:02
    cat ip_addresses | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
    

    This command gives you the desired output: each unique IP address followed by its count, most frequent first.
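
    For the sample file from the question, that looks something like this (the relative order of addresses with equal counts may vary):

    10.0.10.1 3
    10.0.10.3 1
    10.0.10.2 1
    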

  • 2020-11-29 16:02

    I feel an awk associative array is also handy in this case:

    $ awk '{count[$1]++}END{for(j in count) print j,count[j]}' ips.txt
    

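    The order in which for (j in count) visits the keys is unspecified, so if you want the groups ordered by count you can print the count first and pipe the result through sort. A minimal sketch, assuming the same ips.txt file:

    awk '{count[$1]++} END {for (j in count) print count[j], j}' ips.txt | sort -rn
    

    Printing the count as the first field makes the numeric sort straightforward; swap the fields in the print statement if you prefer the address first.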

  • 2020-11-29 16:03

    Solution (group by, like MySQL's GROUP BY): grep -o prints each match on its own line (-i makes it case-insensitive, -h suppresses file names), so the matches can be counted with the usual sort | uniq -c | sort -n pipeline.

    grep -ioh "facebook\|xing\|linkedin\|googleplus" access-log.txt | sort | uniq -c | sort -n
    

    Result

    3249 googleplus
    4211 linkedin
    5212 xing
    7928 facebook
    