Question
I have a file that stores a directed graph. Each line represents one edge in the form
node1 TAB node2 TAB weight
I want to find the set of all nodes, i.e. the union of the first two columns. Is there a better way to get this union? My current solution involves creating temporary files:
cut -f1 input_graph | sort | uniq > nodes1
cut -f2 input_graph | sort | uniq > nodes2
cat nodes1 nodes2 | sort | uniq > nodes
Answer 1:
{ cut -f1 input_graph; cut -f2 input_graph; } | sort | uniq
There is no need to sort each column separately; a single sort over the combined output is enough.
The { cmd1; cmd2; } syntax is equivalent to (cmd1; cmd2) but may avoid a subshell.
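As a quick sanity check, here is a small sketch that runs the grouped command on a made-up three-edge graph. The file name sample_graph, its contents, and the helper name node_union are invented for illustration; sort -u is used as shorthand for sort | uniq.

```shell
# Build a tiny sample edge list: node1 TAB node2 TAB weight (made-up data).
printf 'a\tb\t1\nb\tc\t2\na\tc\t5\n' > sample_graph

# Hypothetical helper: print the union of columns 1 and 2, one node per line.
node_union() {
    # Both cut commands write to the same pipe, so sort sees every node once.
    { cut -f1 "$1"; cut -f2 "$1"; } | sort -u
}

node_union sample_graph
```

For this sample the helper prints a, b and c, one per line.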
In another language (e.g. Perl), you could slurp the first column in a hash and then process the second column sequentially.
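The same hash idea can be sketched without leaving the shell by using awk in place of Perl (my substitution; the answer only names Perl as an example). This version puts both columns into one associative array instead of slurping column 1 first. The helper name nodes_awk is hypothetical.

```shell
# Hypothetical helper: collect every node from columns 1 and 2 in an awk
# associative array, so each distinct node is stored exactly once.
nodes_awk() {
    awk -F'\t' '
        { seen[$1]; seen[$2] }           # referencing a key creates it
        END { for (n in seen) print n }  # iteration order is unspecified
    ' "$1"
}
```

Call it as nodes_awk input_graph. Since awk does not guarantee the order of a for-in loop, pipe the result through sort if you need ordered output.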
With Bash only, you can avoid temporary files by using the syntax cat <(cmd1) <(cmd2). Bash takes care of creating temporary file descriptors and setting up pipelines.
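Applied to the graph file, the process-substitution form might look like the sketch below. It needs bash (plain sh will reject the <( ) syntax), and the helper name nodes_procsub is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: each <(...) expands to a file name (such as /dev/fd/63)
# that cat reads as if it were a regular file.
nodes_procsub() {
    cat <(cut -f1 "$1") <(cut -f2 "$1") | sort | uniq
}
```

Call it as nodes_procsub input_graph.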
In a script (where you may want to avoid requiring bash), if you end up needing temporary files, use mktemp to create them.
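A minimal sketch of the mktemp variant, assuming mktemp is installed (it is common but not part of POSIX). The helper name nodes_tmpfiles and the variable names are hypothetical.

```shell
# Hypothetical helper: same union via temporary files, with no bash-only syntax.
# Plain sh has no "local", so tmp1, tmp2 and status are ordinary shell variables.
nodes_tmpfiles() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    cut -f1 "$1" > "$tmp1"
    cut -f2 "$1" > "$tmp2"
    sort -u "$tmp1" "$tmp2"    # sort accepts several input files at once
    status=$?
    rm -f "$tmp1" "$tmp2"      # clean up before returning
    return "$status"
}
```

Call it as nodes_tmpfiles input_graph. In a longer script you would probably also want a trap to remove the files if the script is interrupted.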
Source: https://stackoverflow.com/questions/19020255/union-of-two-columns-of-a-tsv-file