I have two files, both in the same format -- two columns both containing a number, for example:
file 1
1.00 99
2.00 343
3.00
You can do it with Alacon - command-line utility for Alasql database.
It works with Node.js, so you need to install Node.js and then Alasql package:
To join two data from tab-separated files you can use the following command:
> node alacon "SELECT * INTO TSV("main.txt") FROM TSV('data1.txt') data1
JOIN TSV('data2.txt') data2 USING [0]"
This is one very long line. In this example all files have data in "Sheet1" sheets.
join file1 file2
Which assumes that the files are sorted on the join field. If they are not, you can do this:
join <(sort -V file1) <(sort -V file2)
Here's an AWK version (the sort
compensates for AWK's non-deterministic array ordering):
awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
It seems shorter and more readable than the Perl answer.
In gawk
4, you can set the array traversal order:
awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2
and you won't have to use the sort
utility. @ind_num_asc
is Index Numeric Ascending. See Controlling Array Traversal and Array Sorting and Using Predefined Array Scanning Orders with gawk.
Note that -V
(--version-sort
) in the sort
commands above requires GNU sort
from coreutils 7.0 or later. Thanks for @simlev pointing out that it should be used if available.
A Perl-solution
perl -anE 'push @{$h{$F[0]}}, $F[1]; END{ say "$_\t$h{$_}->[0]\t$h{$_}->[1]" for sort{$a<=>$b} keys %h }' file_1 file_2 > file_3
Ok, looking at the awk-oneliner this is shorter then my first try and it has the nicer output then the awk-oneliner and it doesn't use the 'pipe sort -n':
perl -anE '$h{$F[0]}="$h{$F[0]}\t$F[1]"; END{say "$_$h{$_}" for sort {$a<=>$b} keys %h}' file_1 file_2
And the one-liners behave different then the join-example if there are entries with no value in the second column in the first file.