Suppose you have a file that contains IP addresses, one address per line:
10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1
You need a way to count how many times each unique address appears.
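The classic pipeline (timed against a pure-bash approach later in this section) is:

sort ip_addresses | uniq -c

sort brings identical addresses together so that uniq -c can prefix each distinct address with its count; append | sort -nr if you want the most frequent addresses listed first.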
Most of the other solutions only count duplicates. If you really need to group key-value pairs, try this:
Here is my example data, with the checksums saved to table.txt:
find . -type f | xargs md5sum > table.txt
cat table.txt
fe4ab8e15432161f452e345ff30c68b0 ./a.txt
30c68b02161e15435ff52e34f4fe4ab8 ./b.txt
30c68b02161e15435ff52e34f4fe4ab8 ./c.txt
fe4ab8e15432161f452e345ff30c68b0 ./d.txt
fe4ab8e15432161f452e345ff30c68b0 ./e.txt
This will print the key-value pairs grouped by MD5 checksum:
awk '{print $1}' table.txt | sort -u | xargs -I{} grep {} table.txt
30c68b02161e15435ff52e34f4fe4ab8 ./b.txt
30c68b02161e15435ff52e34f4fe4ab8 ./c.txt
fe4ab8e15432161f452e345ff30c68b0 ./a.txt
fe4ab8e15432161f452e345ff30c68b0 ./d.txt
fe4ab8e15432161f452e345ff30c68b0 ./e.txt
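Note that this rescans table.txt once per unique checksum. Since the checksum is the first field, plain sort table.txt already groups the lines; if you want grouping in a single pass without sorting, a minimal awk sketch (not from the original answer) buffers each line under its checksum:

awk '{ groups[$1] = groups[$1] $0 "\n" } END { for (k in groups) printf "%s", groups[k] }' table.txt

The order of the groups in the output is arbitrary, so pipe the result through sort if you need deterministic output.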
There is a way to do this with a bash function. It is very quick because there is no fork!...
... as long as the set of IP addresses stays small!
countIp () {
    local -a _ips=() _a
    local _k
    # Pack each address's four octets into a 32-bit integer
    # and use it as the index of a plain indexed array.
    while IFS=. read -r -a _a; do
        ((_ips[_a<<24 | _a[1]<<16 | _a[2]<<8 | _a[3]]++))
    done
    # Convert each index back to dotted-quad form and print its count.
    for _k in "${!_ips[@]}"; do
        printf "%.16s %4d\n" \
            "$((_k>>24)).$((_k>>16&255)).$((_k>>8&255)).$((_k&255))" "${_ips[_k]}"
    done
}
Note: each IP address is converted to a 32-bit unsigned integer and used as the index into the array. This uses a plain indexed bash array, not an associative array (which is more expensive)!
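To see the index arithmetic in isolation, here is a small standalone snippet (hypothetical, not part of the original answer):

# Split a dotted quad on "." and pack the octets into one 32-bit value.
IFS=. read -r -a o <<< "10.0.10.1"
echo $(( o[0]<<24 | o[1]<<16 | o[2]<<8 | o[3] ))    # prints 167774721

Since each octet occupies its own byte of the result, distinct addresses always map to distinct array indices.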
time countIp < ip_addresses
10.0.10.1 3
10.0.10.2 1
10.0.10.3 1
real 0m0.001s
user 0m0.004s
sys 0m0.000s
time sort ip_addresses | uniq -c
3 10.0.10.1
1 10.0.10.2
1 10.0.10.3
real 0m0.010s
user 0m0.000s
sys 0m0.000s
On my host, this is a lot quicker than using forks for up to approximately 1,000 addresses, but it takes approximately one entire second to sort and count 10,000 addresses.
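If the keys are not IPv4 addresses (so the octet-packing trick does not apply), the same fork-free counting can be done with an associative array instead, at the extra cost the note above mentions. A minimal sketch, assuming bash 4+ (countKeys is a hypothetical name):

countKeys () {
    local -A _counts=()
    local _line
    # Count every distinct input line in an associative array.
    while IFS= read -r _line; do
        ((_counts[$_line]++))
    done
    for _line in "${!_counts[@]}"; do
        printf '%s %4d\n' "$_line" "${_counts[$_line]}"
    done
}
countKeys < ip_addresses

Unlike countIp, the output order here is arbitrary, so pipe the result through sort if you need sorted output.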