How to merge rows that share unique IDs into a comma-separated table

I would like some hints on how to merge rows that share unique IDs into a comma-separated table. Any hints in Perl, sed, or awk are greatly appreciated.

This i

3 answers

我在风中等你

2021-01-29 09:34

Using a Perl hash of arrays, keyed by protein ID:

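#!/usr/bin/perl
use warnings;
use strict;

my %data;    # protein_id => [ go_id, go_id, ... ]
my $header;

while (<DATA>) {
    chomp;

    # Keep the first line (the column header) to print later.
    if ($. == 1) {
        $header = $_;
        next;
    }
    next unless /\S/;    # skip blank lines

    # Split on whitespace once and append the GO ID to this protein's list.
    my ($protein_id, $go_id) = split;
    push @{ $data{$protein_id} }, $go_id;
}

print "$header\n";

# One output line per protein, sorted numerically by protein_id.
for my $k (sort { $a <=> $b } keys %data) {
    print "$k\t", join(', ', @{ $data{$k} }), "\n";
}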

__DATA__
protein_id go_id
4102    GO:0003676
4125    GO:0003676
4125    GO:0008270
4139    GO:0008270

鱼传尺愫

2021-01-29 09:47

$ cat data.txt 
protein_id go_id
4102    GO:0003676
4125    GO:0003676
4125    GO:0008270
4139    GO:0008270
$ perl -aE'sub a{say"$a\t",join", ",@a if$a;@a=($F[1]);$a=$F[0]}$F[0]eq$a?push@a,$F[1]:a()}{a()' data.txt
protein_id      go_id
4102    GO:0003676
4125    GO:0003676, GO:0008270
4139    GO:0008270
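
The one-liner above is compact but hard to read. Below is a minimal sketch of the same streaming idea written out as a script: it keeps only the current ID and its GO IDs in memory and flushes them whenever the ID changes (the one-liner does the final flush in its `}{` block, which acts like an END block under `-n`). Like the one-liner, it assumes that rows sharing an ID are adjacent, as they are in `data.txt`; if they are not, an ID will be printed more than once. The subroutine name `merge_adjacent` is hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge consecutive rows that share the same first
# column. Takes the lines (header first) and returns the output lines.
# Assumes rows with the same ID are adjacent, as in data.txt.
sub merge_adjacent {
    my ($header, @rows) = @_;
    return () unless defined $header;

    my @out = ($header);    # header is passed through unchanged
    my ($current_id, @go_ids);

    for my $row (@rows) {
        my ($id, $go_id) = split ' ', $row;
        next unless defined $id;    # skip blank lines

        if (defined $current_id && $id eq $current_id) {
            push @go_ids, $go_id;    # same ID: keep collecting
            next;
        }
        # ID changed: flush the previous group, then start a new one.
        push @out, "$current_id\t" . join(', ', @go_ids) if defined $current_id;
        ($current_id, @go_ids) = ($id, $go_id);
    }
    # Flush the last group.
    push @out, "$current_id\t" . join(', ', @go_ids) if defined $current_id;
    return @out;
}

# Process the files named on the command line; do nothing if none are given.
if (@ARGV) {
    chomp(my @lines = <>);
    print "$_\n" for merge_adjacent(@lines);
}
```

For example, `perl merge_adjacent.pl data.txt` (the script name is hypothetical) should print the same groups as the one-liner, except that the header line is printed exactly as it appears in the input.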

旧巷少年郎

2021-01-29 09:54

Using awk

Input

$ cat file
protein_id go_id
4102    GO:0003676
4125    GO:0003676
4125    GO:0008270
4139    GO:0008270

Output (if order doesn't matter)

$ awk 'FNR==1{print;next}{A[$1]=$1 in A ? A[$1]", "$2:$2}END{for(i in A)print i,A[i]}' file
protein_id go_id
4139 GO:0008270
4102 GO:0003676
4125 GO:0003676, GO:0008270

More readable version:

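awk '
    # Print the header line unchanged.
    FNR == 1 {
        print
        next
    }
    # Append $2 to the list for ID $1, starting a new list on first sight.
    {
        A[$1] = ($1 in A) ? A[$1] ", " $2 : $2
    }
    # Emit every ID with its merged list (for-in order is unspecified).
    END {
        for (i in A)
            print i, A[i]
    }
' file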

Output (if order is important)

$ awk 'FNR==1{print;next}$1 in A{A[$1]=A[$1]", "$2;next}{A[O[++c]=$1]=$2}END{for(i=1; i in O; i++)print O[i],A[O[i]]}' file
protein_id go_id
4102 GO:0003676
4125 GO:0003676, GO:0008270
4139 GO:0008270

More readable version:

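awk '
    # Print the header line unchanged.
    FNR == 1 {
        print
        next
    }
    # ID seen before: append this GO ID to its list.
    $1 in A {
        A[$1] = A[$1] ", " $2
        next
    }
    # New ID: remember its position in O, then start its list.
    {
        A[O[++c] = $1] = $2
    }
    # Walk O to print IDs in the order they first appeared.
    END {
        for (i = 1; i in O; i++)
            print O[i], A[O[i]]
    }
' file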
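
For comparison, the order-preserving idea from the second awk program can be sketched in Perl: a hash collects the GO IDs per protein, and a separate array records each ID the first time it is seen (playing the role of the awk `O` array). The output therefore follows the input order without sorting, and rows for the same ID do not need to be adjacent. This is a minimal sketch, not code from any of the answers; the subroutine name `merge_in_order` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge rows that share the first column, keeping the
# IDs in the order they first appear (like the O[] array in the awk version).
sub merge_in_order {
    my ($header, @rows) = @_;
    return () unless defined $header;

    my (%go_ids_for, @order);
    for my $row (@rows) {
        my ($id, $go_id) = split ' ', $row;
        next unless defined $id;                          # skip blank lines
        push @order, $id unless exists $go_ids_for{$id};  # first sighting
        push @{ $go_ids_for{$id} }, $go_id;
    }

    my @out = ($header);    # header is passed through unchanged
    for my $id (@order) {
        push @out, "$id\t" . join(', ', @{ $go_ids_for{$id} });
    }
    return @out;
}

# Process the files named on the command line; do nothing if none are given.
if (@ARGV) {
    chomp(my @lines = <>);
    print "$_\n" for merge_in_order(@lines);
}
```

Unlike the first Perl answer, this keeps the input order instead of sorting numerically; unlike the one-liner, it also handles IDs whose rows are scattered through the file, at the cost of holding the whole table in memory.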