TSV: how to concatenate field 2s if field 1 is duplicate

Submitted by 折月煮酒 on 2019-12-24 14:52:48

Question


I'm building a Swedish-English sentence deck for Anki from the Creative Commons-licensed content of tatoeba.org.

Please help me turn sample 1 into sample 2 (preferably in bash):

#sample1
a 1
a 2
b 3
c 4
c 5

#sample2
a 1<br>2
b 3
c 4<br>5

Duplicates in field 1 always appear on consecutive lines.

Thank you!


Answer 1:


One way using awk:

awk 'p==$1{printf "<br>%s", $2;next}{if(p){print ""};p=$1;printf "%s", $0}END{print ""}' file
a 1<br>2
b 3
c 4<br>5
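
The one-liner above can be unpacked into a commented script. What follows is only a sketch, not the answerer's code: it assumes the real file is tab-separated with exactly two fields (the samples show a space), and `merge_adjacent` is a hypothetical helper name.

```shell
# merge_adjacent: hypothetical helper; reads two-column TSV on stdin and
# joins field 2 of consecutive lines that share field 1 (assumed layout).
merge_adjacent() {
  awk -F'\t' '
    NR > 1 && $1 == prev {       # same key as the previous line
      printf "<br>%s", $2        # append field 2 to the open output line
      next
    }
    {
      if (NR > 1) print ""       # close the previous output line
      prev = $1 ""               # force a string so keys compare as text
      printf "%s", $0            # start a new output line, no newline yet
    }
    END { if (NR > 0) print "" } # terminate the last line, if any
  '
}

# Three consecutive duplicates, then a singleton.
printf 'a\t1\na\t2\na\t3\nb\t4\n' | merge_adjacent
```

Because the previous key is tracked with `NR` rather than by testing whether it is non-empty, a key such as `0` is handled like any other, and empty input produces no output.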



Answer 2:


perl -ape '$_ = ($l eq $F[0]) ? "<br>$F[1]" : "\n@F"; $l=$F[0]' file



Answer 3:


You can also try this awk command:

awk 'BEGIN {getline; id=$1; line=$0} {if ($1 != id) {print line; line = $0; } else {line = line "<br>" $2;} id=$1;} END {print line;}' file

Output:

a 1<br>2
b 3
c 4<br>5



Answer 4:


This might work for you (GNU sed):

sed -r ':a;$!N;s/^((\S+\s).*)\n\2/\1<br>/;ta;P;D' file

Compare the current line with the next one: if the keys match, combine them; otherwise print the current line, delete it, and repeat.
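
Since the question asked for a shell solution, the same compare-with-the-neighbouring-line idea can also be sketched without sed or awk. This is an illustration under assumptions, not part of the original answer: the input is taken to be tab-separated with two fields, `merge_adjacent_sh` is a hypothetical name, and the code sticks to POSIX sh constructs so it also runs under bash.

```shell
# merge_adjacent_sh: hypothetical helper; reads two-column TSV on stdin.
# Uses plain global variables (no "local") to stay POSIX-compatible.
merge_adjacent_sh() {
  tab=$(printf '\t')
  prev=
  first=1
  # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
  while IFS=$tab read -r key value || [ -n "$key" ]; do
    if [ "$first" -eq 0 ] && [ "$key" = "$prev" ]; then
      printf '<br>%s' "$value"          # same key: extend the open line
    else
      [ "$first" -eq 1 ] || printf '\n' # close the previous output line
      printf '%s\t%s' "$key" "$value"   # start a new output line
      prev=$key
      first=0
    fi
  done
  [ "$first" -eq 1 ] || printf '\n'     # final newline if anything was printed
}

printf 'a\t1\na\t2\nb\t3\nc\t4\nc\t5\n' | merge_adjacent_sh
```

A shell `read` loop is likely to be considerably slower than awk or sed on large files, so this is mainly useful when no other tool is available.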




Answer 5:


awk '{if(a[$1]){a[$1]=a[$1]"<br>"$2}else{a[$1]=$1FS$2;b[i++]=$1}} END{for(i=0;i in b; i++) print a[b[i]];}' sample1

Output:

a 1<br>2
b 3
c 4<br>5

This builds the output lines in array a and uses array b to preserve the order in which the keys first appear.
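
The two-array idea can be written out with comments. This is a sketch assuming tab-separated input; `merge_any_order`, `merged`, and `order` are hypothetical names chosen for illustration.

```shell
# merge_any_order: hypothetical helper; reads two-column TSV on stdin.
merge_any_order() {
  awk -F'\t' '
    !($1 in merged) {                      # first time this key appears
      order[++n] = $1                      # remember first-seen order
      merged[$1] = $0                      # start the output line for it
      next
    }
    { merged[$1] = merged[$1] "<br>" $2 }  # later occurrence: append field 2
    END {
      for (i = 1; i <= n; i++)             # emit in first-seen order
        print merged[order[i]]
    }
  '
}

# The duplicate key "a" is deliberately not on consecutive lines here.
printf 'a\t1\nb\t3\na\t2\n' | merge_any_order
```

Unlike the streaming answers, this approach also merges duplicates that are not on consecutive lines, at the cost of holding every output line in memory until the end of the input.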




Answer 6:


Here is another awk solution:

awk 'NR==1 || f!=$1 {printf "%s%s", (NR>1 ? RS : ""), $0; f=$1; next} {printf "<br>%s", $2} END {if (NR) print ""}' file
a 1<br>2
b 3
c 4<br>5


Source: https://stackoverflow.com/questions/23719065/tsv-how-to-concatenate-field-2s-if-field-1-is-duplicate
