join in bash like in SAS

无人共我 2021-01-14 18:10

I'd like to join two files in bash using a common column. I want to retain all pairable and unpairable lines from both files. Unfortunately using

2 answers
  •  遥遥无期
    2021-01-14 18:48

    According to join's man page, -a FILENUM also prints the unpairable lines from file FILENUM (1 or 2). So just add -a1 -a2 to your command line and you should be done. For example:

    # cat a
    1 blah
    2 foo
    
    # cat b
    2 bar
    3 baz
    
    # join -1 1 -2 1 -t" " a b
    2 foo bar
    
    # join -1 1 -2 1 -t" " -a1 a b
    1 blah
    2 foo bar
    
    # join -1 1 -2 1 -t" " -a2 a b
    2 foo bar
    3 baz
    
    # join -1 1 -2 1 -t" " -a1 -a2 a b
    1 blah
    2 foo bar
    3 baz
    

    Is this what you were looking for?
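    Below is a self-contained version of the example above that you can paste into a terminal. One caveat the man page also stresses: join merges its inputs line by line, so both files must be sorted on the join field, and sort and join should use the same collation order (here forced with LC_ALL=C). The temporary directory and the variable name result are only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C                      # make sort and join agree on collation order

workdir=$(mktemp -d)                 # scratch directory for the sample files
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '1 blah\n2 foo\n' > a
printf '2 bar\n3 baz\n'  > b

# join requires input sorted on the join field; sort on field 1 with the same separator
sort -t ' ' -k1,1 a > a.sorted
sort -t ' ' -k1,1 b > b.sorted

# -a1 -a2: also print unpairable lines from file 1 and file 2 (a full outer join)
result=$(join -t ' ' -1 1 -2 1 -a1 -a2 a.sorted b.sorted)
printf '%s\n' "$result"
```

    If your real files are not already sorted, the sort step is what keeps join from silently dropping or mismatching lines.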

    Edit:

    Since you provided more detail, here is how to produce your desired output. Note that my file a is your first file and my file b is your second file; I had to change -1 1 -2 2 to -1 2 -2 1 to join on the id. I also added a field list (-o) to format the output; note that '0' in it stands for the join field:

    # join -1 2 -2 1 -o 1.1,0,1.3,1.4,2.2,2.3 a b
    

    produces the output you gave. If you add -a1 -a2 to retain unpairable lines from both files, you get two more lines (you can guess my test data from them):

    x id4 u t
     id5   ui oi
    

    That is rather unreadable, since every missing field is just an empty string between spaces. So let's use -e- to replace missing fields with '-', which gives:

    # join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 a b
    x id1 a b df cf
    x id1 a b ds dg
    x id1 c d df cf
    x id1 c d ds dg
    x id1 d f df cf
    x id1 d f ds dg
    x id2 c x cv df
    x id2 c x as ds
    x id3 f v cf cg
    x id4 u t - -
    - id5 - - ui oi
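    To make the final command runnable end to end, here is a sketch with sample input. The answer never posts its test files, so the contents of a and b below are hypothetical: they are reconstructed from the output shown, assuming file a holds the id in field 2 and file b holds it in field 1. Both files are already sorted on their join fields, so no sort step is needed.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical input reconstructed from the output above (not the answerer's actual files).
# File a: the id is in field 2.
cat > a <<'EOF'
x id1 a b
x id1 c d
x id1 d f
x id2 c x
x id3 f v
x id4 u t
EOF

# File b: the id is in field 1.
cat > b <<'EOF'
id1 df cf
id1 ds dg
id2 cv df
id2 as ds
id3 cf cg
id5 ui oi
EOF

#  -1 2 -2 1   join field 2 of a with field 1 of b
#  -a1 -a2     also print unpairable lines from a and from b
#  -e-         print "-" for every missing output field
#  -o ...      output list; 0 is the join field, N.M is field M of file N
result=$(join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 a b)
printf '%s\n' "$result"
```

    Note that -e only fills fields named in the -o list, which is why the explicit field list is needed here.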
    
