For a few years I have often needed to combine lines of (sorted) text whose first fields match, and I have never found an elegant (i.e. a one-liner unix command line) way to do it.
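The question doesn't include a sample input; one consistent with the outputs shown in the answers below would be:
apple:A fruit
apple:Type of: pie
banana:tropical fruit
cherry:small burgundy fruit
cherry:1 for me to eat
cherry:bright red
The goal is then to join, with semicolons, the remainders of all lines that share a first field.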
I think this one does the job:
awk -F':' '$1!=a{if(b)print b;b=""}a=$1{$1="";if(!b)b=a;b=b$0}END{print b}' infile
Using an awk one-liner:
awk -F: -v ORS="" 'a!=$1{a=$1; $0=RS $0} a==$1{ sub($1":",";") } 1' file
Output:
apple:A fruit;Type of: pie
banana:tropical fruit
cherry:small burgundy fruit;1 for me to eat;bright red
Setting ORS="": by default it is \n. The reason we set ORS="" (Output Record Separator) to empty is that we don't want awk to append a newline at the end of each record; we handle that through our own logic. We instead add a newline at the start of every record whose first field differs from the previous one's.
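As a minimal demo of the effect (this sketch is an illustration, not part of the solution itself):
printf 'x\ny\n' | awk '1'            # prints x and y on separate lines (ORS defaults to \n)
printf 'x\ny\n' | awk -v ORS="" '1'  # prints xy, with no newlines appended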
a!=$1: When variable a (initially null) doesn't match the first field $1 (for example, apple on the first line), set a=$1 and $0=RS $0, i.e. the whole record $0 becomes "\n"$0 (adding a newline at the beginning of the record). a!=$1 holds whenever a line's first field differs from the previous line's $1, and is thus the criterion for segregating records by their first field.
a==$1: If it matches, you are iterating over a record belonging to the current record set. In this case substitute the first occurrence of $1":" (note the :), for example apple:, with ;. $1":" could also be written as $1FS, where FS is :. (The two rules never both fire on a group's first line: assigning $0=RS $0 in the first rule re-splits the record, so $1 then starts with a newline and no longer equals a.)
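For readability, here is the same program written out with comments; it is functionally identical to the one-liner above:
awk -F: -v ORS="" '
  a != $1 {          # first line of a new group
    a = $1           # remember the key
    $0 = RS $0       # prepend a newline; this re-splits $0, so a==$1 stays false here
  }
  a == $1 {          # continuation line of the current group
    sub($1":", ";")  # replace the leading key: with ;
  }
  1                  # print the record; ORS="" appends nothing
' file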
If you have millions of lines in your file, this approach should be fast, because it doesn't involve any pre-processing and doesn't use any additional data structure (say, an array) for storing your keys or records.
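If you want to check that on your own data, here is a rough benchmark sketch (the file big.txt and the key distribution are illustrative placeholders):
seq 1000000 | awk '{printf "key%07d:value %d\n", int($1/3), $1}' > big.txt  # ~1M sorted lines, ~3 per key
time awk -F: -v ORS="" 'a!=$1{a=$1; $0=RS $0} a==$1{ sub($1":",";") } 1' big.txt > /dev/null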
Discover the awk language:
awk -F':' '{ v=substr($0, index($0,":")+1); a[$1]=($1 in a? a[$1]";" : "")v }
END{ for(i in a) print i,a[i] }' OFS=':' infile.txt
The output:
apple:A fruit;Type of: pie
banana:tropical fruit
cherry:small burgundy fruit;1 for me to eat;bright red
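Note that for(i in a) iterates in an unspecified order, so the keys are not guaranteed to come out sorted as shown above. With GNU awk you can force sorted traversal by setting PROCINFO["sorted_in"], for example:
awk -F':' '{ v=substr($0, index($0,":")+1); a[$1]=($1 in a? a[$1]";" : "")v }
END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print i,a[i] }' OFS=':' infile.txt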
for F in $(cut -f1 -d ':' infile.txt | sort -u); do echo "$F:$(grep "^$F:" infile.txt | cut -f2- -d ':' | paste -s -d ';' -)"; done
Not sure it qualifies as 'elegant', but it works, though surely not quickly for millions of lines: as the number of grep calls increases, it would slow significantly. What percentage of the matching fields do you expect to be unique?