union of two columns of a tsv file

心不动则不痛 提交于 2019-12-11 07:45:46

问题


I've a file which stores a directed graph. Each line is represented as

node1 TAB node2 TAB weight

I want to find the set of nodes. Is there a better way of getting union? My current solution involves creating temporary files:

cut -f1 input_graph | sort | uniq > nodes1
cut -f2 input_graph | sort | uniq > nodes2
cat nodes1 nodes2 | sort | uniq > nodes

回答1:


{ cut -f1 input_graph; cut -f2 input_graph; } | sort | uniq

No need to sort twice.

The { cmd1; cmd2; } syntax is equivalent to (cmd1; cmd2) but may avoid a subshell.

In another language (e.g. Perl), you could slurp the first column in a hash and then process the second column sequentially.

With Bash only, you can avoid temporary files by using the syntax cat <(cmd1) <(cmd2). Bash takes care of creating temporary file descriptors and setting up pipelines.

In a script (where you may want to avoid requiring bash), if you end up needing temporary files, use mktemp



来源:https://stackoverflow.com/questions/19020255/union-of-two-columns-of-a-tsv-file

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!