How can I compare 3 files together (to see what is in common between them)?

Posted by 瘦欲@ on 2019-12-06 09:08:18

If you simply want to print the pairs (column 1 + column 2) that are common to all 3 files, and each pair is unique within a file, you could do it this way:

awk '{print $1" "$2}' a b c | sort | uniq -c | awk '{if ($1==3){print $2" "$3}}'

This works with an arbitrary number of files, as long as you change the count tested in the last awk command (the 3 in $1==3) to the number of files.

Here's what it does:

  1. prints and sorts the first 2 columns of all files (awk '{print $1" "$2}' a b c | sort)
  2. counts how many times each entry occurs (uniq -c)
  3. if an entry's count equals the number of files, the pair is present in every file, so it is printed.
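To make the steps concrete, here is a small run of the same pipeline. The three files and their contents are made-up sample data in a scratch directory, not anything from the question.

```shell
# Hypothetical sample data: three small files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a 1 x\nb 2 y\nc 3 z\n' > "$tmpdir/a"
printf 'b 2 p\nc 3 q\nd 4 r\n' > "$tmpdir/b"
printf 'c 3 m\nb 2 n\ne 5 o\n' > "$tmpdir/c"

# Same pipeline as above: keep the pairs whose count equals the number of files (3).
result=$(awk '{print $1" "$2}' "$tmpdir/a" "$tmpdir/b" "$tmpdir/c" \
  | sort | uniq -c | awk '{if ($1==3){print $2" "$3}}')

echo "$result"
rm -rf "$tmpdir"
```

Only the pairs "b 2" and "c 3" occur in all three sample files, so those are the two lines printed; the third column is ignored throughout.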

If you're doing this often, you can wrap it in a bash function (and drop it in your .bashrc) which parametrises the file count.

function common_pairs {
    # "$@" is quoted so file names containing spaces survive; $# is the number of files
    awk '{print $1" "$2}' "$@" | sort | uniq -c | awk -v numf="$#" '{if ($1==numf){print $2" "$3}}'
}

Call it with any number of files you want: common_pairs file1 file2 file3 fileN
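The function relies on each pair appearing at most once per file. If that can't be guaranteed, a repeated pair inflates the count and gives wrong matches. Below is a sketch of one way around it, counting each pair once per file inside awk. The name common_pairs_dedup is made up for this example, and this is a variant, not the method from the answer above.

```shell
# Hypothetical variant: tolerate pairs that repeat within a single file.
common_pairs_dedup() {
    awk -v numf="$#" '
        FNR == 1 { fileno++ }            # first line of a new input file
        {
            key = $1 " " $2
            # count the pair only the first time it shows up in this file
            if (seen[key] != fileno) { seen[key] = fileno; count[key]++ }
        }
        END { for (k in count) if (count[k] == numf) print k }
    ' "$@" | sort
}
```

It is called the same way: common_pairs_dedup file1 file2 file3 fileN. The final sort is only there to make the output order predictable, since awk's for-in loop has no defined order.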

For this I'd use the commands cut, sort and comm.

  1. Use cut to strip the fields that are not needed.

  2. sort the outcome since comm expects sorted input.

  3. Use comm to get the lines which are in file1 and file2.

  4. Use comm again to get the lines that are also in file3.

A script could look like this:

 for i in 1 2 3
 do
   # options to cut may have to be adjusted for your input files
   cut -c1-15 file$i | sort > tmp.$i
 done

 comm -12 tmp.1 tmp.2   > tmp.1+2
 comm -12 tmp.3 tmp.1+2 > tmp.1+2+3

(Of course one may use extended shell syntax to avoid temporary files, but I don't want to hide the idea behind complex syntax expressions)
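For reference, a version without temporary files might look like the sketch below. It assumes bash (process substitution is not plain sh), keeps the cut -c1-15 range from the script above, and the function name common_keys is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: keys present in all three files, with no temporary files.
# Assumes bash; cut -c1-15 is copied from the script above and may need adjusting.
common_keys() {
    # first comm: keys in file 1 and file 2; second comm: those also in file 3
    comm -12 <(cut -c1-15 "$1" | sort) <(cut -c1-15 "$2" | sort) \
        | comm -12 - <(cut -c1-15 "$3" | sort)
}
```

Call it as common_keys file1 file2 file3. The "-" tells the second comm to read the result of the first one from standard input.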

In file tmp.1+2+3 you should now have the keys present in all three files. If you're interested in the whole lines, you can use the join command in combination with a sorted version of any of the three input files.
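A rough sketch of that join step follows. join matches on fields rather than character columns, so this assumes the key is the first whitespace-separated field of each line (which may not match what cut -c1-15 extracts for your data). The function name lines_for_keys is made up, and bash is assumed for the process substitution.

```shell
#!/usr/bin/env bash
# Sketch: print the whole lines of an input file whose key is in a key list.
# Assumption: the key is the first whitespace-separated field of each line.
lines_for_keys() {
    # $1: file with one key per line (e.g. tmp.1+2+3); $2: one of the input files
    # Both inputs are sorted under LC_ALL=C so sort and join agree on the ordering.
    LC_ALL=C sort -k1,1 "$2" | LC_ALL=C join <(LC_ALL=C sort "$1") -
}
```

For example: lines_for_keys tmp.1+2+3 file1. Note that join rebuilds each output line with single spaces between fields, so the original spacing is not preserved.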

Just read your last comment - You want the files joined, but duplicates removed?

 sort file1 file2 file3 | uniq > newfile

I don't intend to start an editor war, but I am familiar with vi, and vimdiff and its variants show a side-by-side comparison of multiple files, which I find very handy. You can simply call it with

$ vimdiff <filelist>