Best way to simulate “group by” from bash?

半阙折子戏 2020-11-29 15:03

Suppose you have a file that contains IP addresses, one address per line:

10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1

You need a shell script that counts how many times each IP address appears in the file, i.e. something like an SQL "group by" with a count.

14 Answers
  • 2020-11-29 15:56

    I understand you are looking for something in Bash, but in case someone else might be looking for something in Python, you might want to consider this:

    mySet = set()
    for line in open("ip_address_file.txt"):
         line = line.rstrip()
         mySet.add(line)
    

    As values in a set are unique by default and Python is pretty good at this kind of thing, this might save you some work. I haven't tested the code, so it might be buggy, but it should get you there. And if you want to count occurrences, using a dict instead of a set is easy to implement.

    Edit: I'm a lousy reader, so I answered wrong. Here's a snippet with a dict that would count occurrences.

    mydict = {}
    for line in open("ip_address_file.txt"):
        line = line.rstrip()
        if line in mydict:
            mydict[line] += 1
        else:
            mydict[line] = 1
    

    The dictionary mydict now holds the unique IPs as keys and the number of times each one occurred as the corresponding value.

  • 2020-11-29 15:57

    For summing up multiple fields grouped by a set of existing fields, use the example below (replace $1, $2, $3, $4 according to your requirements):

    cat file
    
    US|A|1000|2000
    US|B|1000|2000
    US|C|1000|2000
    UK|1|1000|2000
    UK|1|1000|2000
    UK|1|1000|2000
    
    awk 'BEGIN { FS=OFS=SUBSEP="|"}{arr[$1,$2]+=$3+$4 }END {for (i in arr) print i,arr[i]}' file
    
    US|A|3000
    US|B|3000
    US|C|3000
    UK|1|9000
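
    Setting SUBSEP to "|" makes the composite key $1,$2 print with the same delimiter as the input, which is why the keys in the output look like the original columns. As a minimal sketch of a variant (assuming the same file layout), the pattern also lends itself to counting the rows in each group alongside the sum:

    awk 'BEGIN { FS=OFS=SUBSEP="|" } { sum[$1,$2]+=$3+$4; cnt[$1,$2]++ } END { for (k in sum) print k, sum[k], cnt[k] }' file
    
    US|A|3000|1
    US|B|3000|1
    US|C|3000|1
    UK|1|9000|3
    

    The order of the groups is unspecified, because for (k in arr) does not guarantee any particular order.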
    
  • 2020-11-29 16:00

    Note that uniq only counts adjacent duplicate lines, so sort may be omitted only if identical lines are already grouped together in the input:

    uniq -c <source_file>
    

    or

    echo "$list" | uniq -c
    

    if the source list is a variable
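
    To illustrate, a minimal sketch using the sample file from the question (assumed here to be named ip_addresses):

    uniq -c ip_addresses
          2 10.0.10.1
          1 10.0.10.3
          1 10.0.10.2
          1 10.0.10.1
    
    sort ip_addresses | uniq -c
          3 10.0.10.1
          1 10.0.10.2
          1 10.0.10.3
    

    Without the sort, the three occurrences of 10.0.10.1 are reported as two separate groups because they are not adjacent in the file.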

  • 2020-11-29 16:02
    cat ip_addresses | sort | uniq -c | sort -nr | awk '{print $2 " " $1}'
    

    This command gives you the desired output: each unique IP address followed by its count, most frequent first.
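
    For the sample file from the question, that looks something like this (the relative order of addresses with equal counts may vary):

    10.0.10.1 3
    10.0.10.3 1
    10.0.10.2 1
    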

  • 2020-11-29 16:02

    I feel an awk associative array is also handy in this case:

    $ awk '{count[$1]++}END{for(j in count) print j,count[j]}' ips.txt
    

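    The order in which for (j in count) visits the keys is unspecified, so if you want the groups ordered by count you can print the count first and pipe the result through sort. A minimal sketch, assuming the same ips.txt file:

    awk '{count[$1]++} END {for (j in count) print count[j], j}' ips.txt | sort -rn
    

    Printing the count as the first field makes the numeric sort straightforward; swap the fields in the print statement if you prefer the address first.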

  • 2020-11-29 16:03

    Solution (group by, like MySQL's GROUP BY): grep -o prints each match on its own line (-i makes it case-insensitive, -h suppresses file names), so the matches can be counted with the usual sort | uniq -c | sort -n pipeline.

    grep -ioh "facebook\|xing\|linkedin\|googleplus" access-log.txt | sort | uniq -c | sort -n
    

    Result

    3249 googleplus
    4211 linkedin
    5212 xing
    7928 facebook
    