Need to remove the count from the output when using “uniq -c” command


I am trying to read a file and sort it by the number of occurrences of a particular field. Suppose I want to find the most repeated date in a log file; then I use uniq -c o


5 answers

春和景丽

2020-12-06 13:48
Instead of cut -d' ' -f2, try
```
awk '{$1="";print}'
```
You may also need to remove the blank that is left at the beginning of the line:
```
awk '{$1="";print}' | sed 's/^.//'
```
or do it completely with sed, preserving the original whitespace:
```
sed -r 's/^[^0-9]*[0-9]+//'
```
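The difference between the two approaches can be seen on a sample line. This is only a sketch: the function names and the sample line are made up for illustration, and `-E` is used in place of `-r` on the assumption that your sed accepts it (recent GNU and BSD versions do).

```shell
# Hypothetical helper: awk blanks the count, then sed drops the leftover blank.
# Assigning to $1 makes awk rebuild the line with single spaces between fields.
drop_count_awk() {
    awk '{ $1 = ""; print }' | sed 's/^.//'
}

# Hypothetical helper: sed deletes only the padding and the digits,
# so everything after the count is left untouched.
drop_count_sed() {
    sed -E 's/^[^0-9]*[0-9]+//'
}

# Made-up sample line with a run of three spaces inside the data.
sample='      2 GET   /index.html'

printf '%s\n' "$sample" | drop_count_awk   # prints "GET /index.html"
printf '%s\n' "$sample" | drop_count_sed   # prints " GET   /index.html"
```

The awk version collapses the three spaces into one, while the sed version keeps them (along with the single space that followed the count).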
鱼传尺愫

2020-12-06 14:01
An alternative solution is this:
```
uniq -c | sort -nr | awk '{print $1, $2}'
```
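This prints the count and the first field separated by a single space. You can just as easily print a single field: printing only $2 drops the count.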
野性不改

2020-12-06 14:02
Add tr -s to the pipe chain to "squeeze" multiple spaces into one space delimiter:
```
uniq -c | tr -s ' ' | cut -d ' ' -f3
```
tr is very useful in some obscure places. Unfortunately, it only squeezes the leading spaces down to one instead of removing them, so the first field is empty and the count is the second field, hence the -f3.
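If the lines contain more than one word, -f3 keeps only the first of them. A possible variant, sketched here with a made-up function name and sample line, uses -f3- to keep everything from the third field onward:

```shell
# Hypothetical helper: squeeze runs of spaces, then keep field 3 and everything after it.
# Assumes the padded "   N text" layout, i.e. at least one space before the count.
squeeze_and_cut() {
    tr -s ' ' | cut -d ' ' -f3-
}

printf '      2 GET   /index.html\n' | squeeze_and_cut   # prints "GET /index.html"
```

Note that tr also squeezes runs of spaces inside the data itself, and that the field numbers shift by one if the count is not preceded by a space.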
孤城傲影

2020-12-06 14:07
GNU uniq right-aligns the count in a 7-character column followed by a single space, so the count is preceded by spaces unless it has 7 or more digits. You therefore need to do something like:
```
uniq -c | sort -nr | cut -c 9-
```
to get columns (character positions) 9 upwards. Or you can use sed:
```
uniq -c | sort -nr | sed 's/^.\{8\}//'
```
or:
```
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
```
This second sed command is robust in the face of a repeat count of 10,000,000 or more, which would overflow the 8 characters removed by the fixed-width variants; if you think that might be a problem, it is probably better than the cut alternative. There are undoubtedly other options available too.

Caveat: the counts were determined by experimentation on Mac OS X 10.7.3 but using GNU uniq from coreutils 8.3. The BSD uniq -c produced 3 leading spaces before a single digit count. The POSIX spec says the output from uniq -c shall be formatted as if with:
```
printf("%d %s", repeat_count, line);
```
which would not have any leading blanks. Given this possible variance in output formats, the sed script with the [0-9] regex is the most reliable way of dealing with the variability in observed and theoretical output from uniq -c:
```
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
```
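As a quick check, the same sed expression can be fed lines in each of the layouts discussed above. The function name and the sample lines are made up for illustration:

```shell
# Hypothetical helper wrapping the sed expression from this answer.
strip_count() {
    sed 's/^ *[0-9]* //'
}

printf '      3 2012-03-01\n' | strip_count   # padded to 7 columns (GNU layout)
printf '   3 2012-03-01\n'    | strip_count   # three leading spaces (BSD layout)
printf '3 2012-03-01\n'       | strip_count   # no padding (POSIX layout)
printf '12345678 2012-03-01\n' | strip_count  # 8-digit count, nothing to pad
```

Each of the four commands prints 2012-03-01, and any spaces inside the rest of the line are left alone because the pattern is anchored at the start.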
予麋鹿

2020-12-06 14:09
If you want to work with the count field downstream, the following command will reformat it into a 'pipe friendly' tab-delimited format without the left padding:
```
 .. | sort | uniq -c | sed -r 's/^ +([0-9]+) /\1\t/'
```
For the original task it is a bit of overkill, but after reformatting, cut can be used to remove the field, as the OP intended:
```
 .. | sort | uniq -c | sed -r 's/^ +([0-9]+) /\1\t/' | cut -d $'\t' -f2-
```
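The `\t` escape in a sed replacement is, as far as I know, a GNU extension, so here is a sketch of the same reformatting done with awk instead. The function name `count_tsv` is made up for illustration:

```shell
# Hypothetical helper: count duplicate lines and emit "count<TAB>line".
count_tsv() {
    sort | uniq -c |
        awk '{
            n = $1                    # save the count before editing the line
            sub(/^ *[0-9]+ /, "")     # remove padding, count and one space from $0
            print n "\t" $0
        }'
}

# The count is now an ordinary first field for downstream tools.
printf 'b\na\nb\n' | count_tsv                  # prints "1<TAB>a" then "2<TAB>b"
printf 'b\na\nb\n' | count_tsv | cut -f2-       # prints "a" then "b"
```

Since cut uses a tab as its default delimiter, no -d option is needed once the output is in this form.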