Unix command to find lines common in two files

忘掉有多难 2020-11-27 10:32

I'm sure I once found a Unix command that could print the common lines from two or more files. Does anyone know its name? It was much simpler than diff.

11 Answers
  • 2020-11-27 10:45

    To complement the Perl one-liner, here's its awk equivalent:

    awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2
    

    This reads every line of file1 into the array arr[], then checks whether each line of file2 already exists as a key in that array (i.e. appeared in file1). Matching lines are printed in the order in which they appear in file2. Note that the whole line from file2 is used as the array index, so only exact matches on entire lines are reported.
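As a small illustration of the behaviour described above, here is a sketch that runs the same awk program on two throwaway files. The file contents and the `tmpdir` and `common` variable names are assumptions made up for this example.

```shell
# Hypothetical sample files; only the names file1 and file2 come from the answer.
tmpdir=$(mktemp -d)
printf '%s\n' 123 567 132 > "$tmpdir/file1"
printf '%s\n' 132 777 321 123 > "$tmpdir/file2"

# While NR==FNR (still reading the first file), record each line as an
# array key and skip to the next line. For the second file, the bare
# pattern "$0 in arr" prints every line that was recorded.
common=$(awk 'NR==FNR { arr[$0]; next } $0 in arr' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

The two common lines come out in file2's order (132 before 123), not file1's. One caveat: if file1 is empty, NR==FNR also holds for file2, so the program prints nothing.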

  • 2020-11-27 10:46

    To easily apply the comm command to unsorted files, use Bash's process substitution:

    $ bash --version
    GNU bash, version 3.2.51(1)-release
    Copyright (C) 2007 Free Software Foundation, Inc.
    $ cat > abc
    123
    567
    132
    $ cat > def
    132
    777
    321
    

    So the files abc and def have one line in common, the one with "132". Using comm on unsorted files:

    $ comm abc def
    123
        132
    567
    132
        777
        321
    $ comm -12 abc def # No output! The common line is not found
    $
    

    The last command produced no output: because the files are not sorted, comm did not detect the common line.

    Now use comm on sorted files, sorting the files with process substitution:

    $ comm <( sort abc ) <( sort def )
    123
                132
        321
    567
        777
    $ comm -12 <( sort abc ) <( sort def )
    132
    

    Now we got the 132 line!
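Where process substitution is not available (for example in a plain POSIX sh), the same result can be had by sorting into temporary files first. The following is only a sketch under that assumption; the `tmpdir` and `common` names are invented for the example, and `LC_ALL=C` is added so that sort and comm agree on the collation order.

```shell
# Hypothetical sample files with the same contents as abc and def above.
tmpdir=$(mktemp -d)
printf '%s\n' 123 567 132 > "$tmpdir/abc"
printf '%s\n' 132 777 321 > "$tmpdir/def"

# Sort each file into a temporary copy instead of using <( sort ... ).
LC_ALL=C sort "$tmpdir/abc" > "$tmpdir/abc.sorted"
LC_ALL=C sort "$tmpdir/def" > "$tmpdir/def.sorted"

# -12 suppresses columns 1 and 2, leaving only the lines common to both.
common=$(LC_ALL=C comm -12 "$tmpdir/abc.sorted" "$tmpdir/def.sorted")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```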

  • 2020-11-27 10:52

    While

    grep -v -f 1.txt 2.txt > 3.txt
    

    gives you the difference between the two files (the lines that are in 2.txt but not in 1.txt), you could just as easily run

    grep -f 1.txt 2.txt > 3.txt
    

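    to collect all common lines, which should provide an easy solution to your problem. Be aware that grep -f treats each line of 1.txt as a regular expression and matches it anywhere within a line; add -F and -x to match whole lines literally. If your files are sorted, you should still prefer comm. Regards!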
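To make the difference between pattern matching and whole-line matching concrete, here is a short sketch comparing plain `grep -f` with `grep -Fxf`. The file contents and the `loose` and `exact` variable names are assumptions chosen for this illustration.

```shell
# Hypothetical sample files; only the names 1.txt and 2.txt come from the answer.
tmpdir=$(mktemp -d)
printf '%s\n' '1.2' > "$tmpdir/1.txt"
printf '%s\n' '132' '1.2' 'a1.2b' > "$tmpdir/2.txt"

# Plain -f: "1.2" is a regular expression ("." matches any character)
# and may match anywhere in the line.
loose=$(grep -f "$tmpdir/1.txt" "$tmpdir/2.txt")

# -F takes the patterns as literal strings, -x requires a whole-line match.
exact=$(grep -Fxf "$tmpdir/1.txt" "$tmpdir/2.txt")

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"

rm -rf "$tmpdir"
```

With these inputs the plain form reports all three lines of 2.txt, while the `-Fx` form reports only the line that is identical in both files.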

  • 2020-11-27 10:52

    On a limited version of Linux (like the QNAP NAS I was working on):

    • comm did not exist
    • grep -f file1 file2 can cause some problems, as noted by @ChristopherSchultz, and grep -F -f file1 file2 was really slow (it had not finished after more than 5 minutes, versus 2-3 seconds with the method below, on files over 20 MB)

    So here is what I did:

    sort file1 > file1.sorted
    sort file2 > file2.sorted
    
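    # Lines only in file1: keep diff's "< " lines and strip the marker
    diff file1.sorted file2.sorted | grep '^<' | sed 's/^< //' > files.diff
    # Lines of file1 that are NOT in files.diff are the common lines
    diff file1.sorted files.diff | grep '^<' | sed 's/^< //' > files.same.sorted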
    

    If the common lines should appear in the same order as in one of the original files, add this line to get the order of file1:

    awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file1 > files.same
    

    or, for the same order as file2:

    awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' files.same.sorted file2 > files.same
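The order-restoring step can be tried in isolation. This is a sketch only: the fruit names and the `tmpdir` and `ordered` variables are made up for the example, and files.same.sorted is written by hand here rather than produced by the diff pipeline.

```shell
# Hypothetical inputs: file1 in its original order, plus an already
# sorted list of the lines it shares with some other file.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple fig banana > "$tmpdir/file1"
printf '%s\n' apple pear > "$tmpdir/files.same.sorted"

# First file: remember each common line. Second file: walk file1 in its
# original order and print only the remembered lines.
ordered=$(awk 'FNR==NR {a[$0]=$0; next}; $0 in a {print a[$0]}' \
    "$tmpdir/files.same.sorted" "$tmpdir/file1")
printf '%s\n' "$ordered"

rm -rf "$tmpdir"
```

The output lists pear before apple, which is file1's order rather than the sorted order.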
    
  • 2020-11-27 11:00
    perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/'  file1 file2
    