I'm sure I once found a Unix command which could print the common lines from two or more files; does anyone know its name? It was much simpler than diff.
To complement the Perl one-liner, here's its awk equivalent:
awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2
This will read all lines from file1 into the array arr[], and then check, for each line of file2, whether it already exists in the array (i.e. in file1). The lines that are found will be printed in the order in which they appear in file2.
Note that the comparison uses the entire line from file2 as the index into arr, so it will only report exact matches on entire lines.
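For example, with two small test files (the names a.txt and b.txt here are just for illustration), the common lines come out in the order of the second file:
$ printf 'red\ngreen\nblue\n' > a.txt
$ printf 'blue\nyellow\nred\n' > b.txt
$ awk 'NR==FNR{arr[$0];next} $0 in arr' a.txt b.txt
blue
red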
To easily apply the comm command to unsorted files, use Bash's process substitution:
$ bash --version
GNU bash, version 3.2.51(1)-release
Copyright (C) 2007 Free Software Foundation, Inc.
$ cat > abc
123
567
132
$ cat > def
132
777
321
So the files abc and def have one line in common, the one with "132". Using comm on unsorted files:
$ comm abc def
123
132
567
132
777
321
$ comm -12 abc def # No output! The common line is not found
$
The last command produced no output; the common line was not discovered because comm expects its input files to be sorted.
Now use comm on sorted files, sorting the files with process substitution:
$ comm <( sort abc ) <( sort def )
123
132
321
567
777
$ comm -12 <( sort abc ) <( sort def )
132
Now we get the 132 line!
While
grep -v -f 1.txt 2.txt > 3.txt
gives you the differences of the two files (what is in 2.txt and not in 1.txt), you could just as easily do
grep -f 1.txt 2.txt > 3.txt
to collect all common lines, which should provide an easy solution to your problem. If you have sorted files, you should still use comm. Regards!
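Note that grep -f treats each line of 1.txt as a regular expression and will also match it as a substring of a longer line in 2.txt. If you only want exact, whole-line matches on literal strings, the standard -F (fixed strings) and -x (match whole lines) options can be added, for example:
grep -F -x -f 1.txt 2.txt > 3.txt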
On limited versions of Linux (like the QNAP NAS I was working on),
grep -f file1 file2
can cause some problems, as said by @ChristopherSchultz, and using
grep -F -f file1 file2
was really slow (more than 5 minutes without finishing, versus 2-3 seconds with the method below, on files over 20 MB). So here is what I did:
sort file1 > file1.sorted
sort file2 > file2.sorted
diff file1.sorted file2.sorted | grep "<" | sed 's/^< *//' > files.diff
diff file1.sorted files.diff | grep "<" | sed 's/^< *//' > files.same.sorted
If files.same.sorted should be in the same order as the original files, then add this line for the same order as file1:
awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file1 > files.same
or this one for the same order as file2:
awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file2 > files.same
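On systems where comm is available, the two diff pipelines above can probably be replaced by a single call, as shown in the comm answer above; for files without duplicate lines this should produce the same files.same.sorted:
comm -12 file1.sorted file2.sorted > files.same.sorted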
And here is a Perl one-liner:
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' file1 file2
While file1 is being read, @ARGV still contains one remaining file name, so a "1" is appended to $seen{$_} for each of its lines; while file2 is being read, @ARGV is empty and a "0" is appended. A line is printed only when its history ends in "10", i.e. it appeared in file1 and is now being read from file2.