I'd like to join two files in bash on a common column, keeping all lines from both files: the pairable ones and the unpairable ones. Unfortunately, using …
According to join's man page, -a FILENUM also prints the unpairable lines from file FILENUM (1 or 2). So just add -a1 -a2 to your command line and you should be done. For example:
```
# cat a
1 blah
2 foo
# cat b
2 bar
3 baz
# join -1 1 -2 1 -t" " a b
2 foo bar
# join -1 1 -2 1 -t" " -a1 a b
1 blah
2 foo bar
# join -1 1 -2 1 -t" " -a2 a b
2 foo bar
3 baz
# join -1 1 -2 1 -t" " -a1 -a2 a b
1 blah
2 foo bar
3 baz
```
Is this what you were looking for?
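The -a1 -a2 transcript works because a and b are already sorted on the join field, which join requires of its inputs. Below is a minimal POSIX sh sketch, assuming GNU coreutils join and sort, that sorts both inputs first and then performs the same full outer join. The sample data mirrors the transcript except that b is deliberately unsorted; the temporary file names are only illustrative.

```shell
#!/bin/sh
# Full outer join on field 1 of two space-separated files.
# Assumes GNU coreutils join/sort. join needs both inputs sorted on the
# join field, so each file is first sorted into a temporary copy.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Sample data from the transcript, but with b written out of order.
printf '1 blah\n2 foo\n' > "$work/a"
printf '3 baz\n2 bar\n' > "$work/b"

# Sort on the join field (field 1); sort and join inherit the same locale.
sort -k1,1 "$work/a" > "$work/a.sorted"
sort -k1,1 "$work/b" > "$work/b.sorted"

# -a1 -a2: also print unpairable lines from file 1 and from file 2.
join -1 1 -2 1 -a1 -a2 "$work/a.sorted" "$work/b.sorted"
```

With this data it prints the same three lines as the last command in the transcript (1 blah, 2 foo bar, 3 baz), even though b started out unsorted.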
Edit:
Since you have provided more detail, here is how to produce your desired output. Note that my file a is your first file and my file b is your second file; to join on the id I had to swap -1 1 -2 2 for -1 2 -2 1. I also added a field list (-o) to format the output. In that list, 0 stands for the join field:
```
# join -1 2 -2 1 -o 1.1,0,1.3,1.4,2.2,2.3 a b
```
This produces the output you gave. If you add -a1 -a2 to retain the unpairable lines from both files, you get two more lines (you can guess my test data from them):
```
x id4 u t  
 id5   ui oi
```
That is rather hard to read, because every missing field is printed as an empty string and only shows up as an extra space. So let's replace missing fields with '-' using -e-, which gives:
```
# join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 a b
x id1 a b df cf
x id1 a b ds dg
x id1 c d df cf
x id1 c d ds dg
x id1 d f df cf
x id1 d f ds dg
x id2 c x cv df
x id2 c x as ds
x id3 f v cf cg
x id4 u t - -
- id5 - - ui oi
```
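To try the formatted full outer join end to end, here is a self-contained sh sketch, assuming GNU coreutils join. The contents of a and b are HYPOTHETICAL: they were reconstructed from the output above, not taken from the asker's real files. Both files are already sorted on their join field (field 2 of a, field 1 of b), as join requires.

```shell
#!/bin/sh
# Formatted full outer join. Input data is HYPOTHETICAL, reconstructed
# from the output shown in the answer. Assumes GNU coreutils join.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# File a: the id (join key) is field 2; lines are sorted on it.
cat > "$dir/a" <<'EOF'
x id1 a b
x id1 c d
x id1 d f
x id2 c x
x id3 f v
x id4 u t
EOF

# File b: the id (join key) is field 1; lines are sorted on it.
cat > "$dir/b" <<'EOF'
id1 df cf
id1 ds dg
id2 cv df
id2 as ds
id3 cf cg
id5 ui oi
EOF

# -1 2 -2 1     join field 2 of a with field 1 of b
# -a1 -a2       also print unpairable lines from both files
# -e-           print "-" for every missing field
# -o ...        output layout; 0 is the join field itself
join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,1.4,2.2,2.3 "$dir/a" "$dir/b"
```

For each id, join prints one line per combination of matching lines from a and b, which is why id1 (three lines in a, two in b) yields six output lines.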