Question
I know that you can use paste to combine multiple files into a .csv file if each file holds a single column, i.e.
paste -d "," column1.dat column2.dat column3.dat ... > myDat.csv
will result in
myDat.csv
column1, column2, column3, ...
c1-1, c2-1, c3-1, ...
c1-2, c2-2, c3-2, ...
... ... ...
(without the tabs; I only inserted them to make it more readable)
What if I have multiple measurements instead? E.g.
file1.dat has format <xvalue> <y1value>
file2.dat has format <xvalue> <y2value>
file3.dat has format <xvalue> <uvalue> <vvalue>
and I ultimately want a csv like
<xvalue>, <y1value>, <y2value>, <empty column>, <uvalue>, <vvalue>
How do I combine the files now?
Edit
Note that although each file is sorted (or can be sorted if it's not), they don't necessarily contain the same xvalues on the same lines.
If a file doesn't have an xvalue that another file does have, its corresponding column entry should be blank.
(Actually, I think dropping the rows for xvalues that aren't present in all files should also work.)
Answer 1:
OK, here is my solution in GNU awk. It tries to lean towards being generic and handles that extra empty column with external tools. It requires GNU awk because it uses true multidimensional arrays (arrays of arrays), but it could probably be adapted to other awks fairly easily.
The program joins the files on their first field, which it expects to be the key column. If a key from one file has no match in the others, the program still creates a row for it and prints the missing fields as empty (notice keys x_3, x_4 and x_5 in the data files below).
First the data files:
$ cat file[123].dat # 3 files, separated by empty lines for clarity
x_1 y1_1
x_2 y1_2
x_3 y1_3

x_1 y2_1
x_2 y2_2
x_4 y2_4

x_1 u_1 v_1
x_2 u_2 v_2
x_5 u_5 v_5
And the code:
$ cat program.awk
BEGIN { OFS=", " }
FNR==1 { f++ }                                   # counter of files
{
    a[0][$1]=$1                                  # reset the key for every record
    for(i=2;i<=NF;i++)                           # for each non-key element
        a[f][$1]=a[f][$1] $i ( i==NF?"":OFS )    # combine them to array element
}
END {                                            # in the end
    for(i in a[0])                               # go thru every key
        for(j=0;j<=f;j++)                        # and all related array elements
            printf "%s%s", a[j][i], (j==f?ORS:OFS)
}                                                # output them, nonexistent will output empty
Usage and output:
$ awk -f program.awk \
    file1.dat \
    file2.dat \
    <(grep -h . file[123].dat|cut -d' ' -f 1|sort|uniq) \
    file3.dat
x_1, y1_1, y2_1, , u_1, v_1
x_2, y1_2, y2_2, , u_2, v_2
x_3, y1_3, , ,
x_4, , y2_4, ,
x_5, , , , u_5, v_5
The empty column after file2.dat is produced by gathering all the keys and feeding them in as another "file" that has no value fields (using process substitution, <()), which keeps the program itself generic:
$ grep -h . file[123].dat|cut -d' ' -f 1|sort|uniq
x_1
x_2
x_3
x_4
x_5
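The remark that this could be generalized to other awks can be sketched as follows. This is a minimal variant, not the answer's original program: it assumes classic `a[f, key]` subscripts (joined with SUBSEP) in place of gawk's arrays of arrays, and it prints keys in first-seen order instead of relying on `for (i in ...)`, whose traversal order awk leaves unspecified. `keys.dat` and `portable.csv` are hypothetical file names, and the key list goes into a regular file instead of a process substitution so the snippet also runs in a plain POSIX shell.

```shell
# Sample data copied from the answer above.
printf 'x_1 y1_1\nx_2 y1_2\nx_3 y1_3\n' > file1.dat
printf 'x_1 y2_1\nx_2 y2_2\nx_4 y2_4\n' > file2.dat
printf 'x_1 u_1 v_1\nx_2 u_2 v_2\nx_5 u_5 v_5\n' > file3.dat

# Key-only "file" that produces the empty column (assumed name: keys.dat).
cut -d' ' -f1 file[123].dat | sort -u > keys.dat

awk '
BEGIN { OFS = ", " }
FNR == 1 { f++ }                                  # count input files
!($1 in seen) { seen[$1] = 1; keys[++n] = $1 }    # remember keys in first-seen order
{
    for (i = 2; i <= NF; i++)                     # append the non-key fields of this record
        a[f, $1] = a[f, $1] $i (i == NF ? "" : OFS)
}
END {
    for (k = 1; k <= n; k++) {
        line = keys[k]
        for (j = 1; j <= f; j++)                  # a missing (file, key) pair prints as empty
            line = line OFS a[j, keys[k]]
        print line
    }
}' file1.dat file2.dat keys.dat file3.dat > portable.csv

cat portable.csv
```

With the sample data this yields the same five rows as above. As in the original program, a key missing from file3.dat gets one empty field for that file, not two.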
Answer 2:
Just use a process substitution?
paste -d, > myDat.csv \
file1.dat \
<(cut -d' ' -f2 file2.dat) \
/dev/null \
<(cut -d' ' -f2,3 file3.dat)
Answer 3:
You can use paste to combine all the files, and then use awk to only print the columns you want (including an empty column):
paste file1.dat file2.dat file3.dat | awk -v OFS=', ' '{print $1,$2,$4,"",$6,$7}'
Notice that columns $3 and $5 are excluded from the awk command because they are the same as column $1 (i.e. they are all <xvalue>).
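Like any paste-based approach, this assumes the x values line up row by row. Where they do not, and dropping x values that are missing from some file is acceptable (as the question's edit allows), a different tool, join, can match rows by key. The following is a rough sketch of that swapped-in technique, not this answer's method; it assumes each file is already sorted on its first field, and `joined.csv` is a hypothetical output name.

```shell
# Sample data from the first answer; only x_1 and x_2 occur in all three files.
printf 'x_1 y1_1\nx_2 y1_2\nx_3 y1_3\n' > file1.dat
printf 'x_1 y2_1\nx_2 y2_2\nx_4 y2_4\n' > file2.dat
printf 'x_1 u_1 v_1\nx_2 u_2 v_2\nx_5 u_5 v_5\n' > file3.dat

# join matches lines on the first field and keeps only keys found in both inputs;
# the second join reads the first result from standard input ("-").
join file1.dat file2.dat | join - file3.dat |
    awk -v OFS=', ' '{ print $1, $2, $3, "", $4, $5 }' > joined.csv

cat joined.csv
```

Because join emits the key only once, the awk step needs no columns skipped; it just inserts the empty column.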
Source: https://stackoverflow.com/questions/40373180/bash-combining-files-into-csvs