Find content of one file from another file in UNIX

攒了一身酷 2020-12-01 07:39

I have two files. The first file contains a list of row IDs of tuples from a table in the database. The second file contains SQL queries with these row IDs in the "where" clause.

8 answers
  • 2020-12-01 08:10

    ## reports any lines of file1 that are missing from file2

    while IFS= read -r a; do
        (( ! $(grep -F -c -- "$a" file2) )) && echo "$a"
    done < file1
    

    Or, to do what the asker wants, take off the negation and redirect the output to a third file:

    while IFS= read -r a; do
        (( $(grep -F -c -- "$a" file2) )) && echo "$a"
    done < file1 >> file3
    
  • 2020-12-01 08:18

    You don't need regexps, so grep -F -f file1 file2
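
To make the behaviour of fixed-string matching concrete, here is a minimal Python sketch of roughly what `grep -F -f file1 file2` does: it keeps every line that contains any of the patterns as a literal substring. The function name, the table name `t`, and the sample data are hypothetical, and the `where ri=<id>;` query shape is assumed from the other answers in this thread. Unlike grep, the sketch skips empty patterns instead of letting them match every line.

```python
def lines_containing_any(patterns, lines):
    """Return the lines that contain at least one pattern as a literal substring."""
    # Assumption: skip empty patterns, which would otherwise match every line.
    wanted = [p for p in patterns if p]
    return [line for line in lines if any(p in line for p in wanted)]


# Hypothetical sample data standing in for file1 (IDs) and file2 (queries).
ids = ["12", "40"]
queries = [
    "select * from t where ri=12;",
    "select * from t where ri=123;",
    "select * from t where ri=7;",
]

matches = lines_containing_any(ids, queries)
for line in matches:
    print(line)
```

Note that the ID 12 also selects the query for 123, because the match is on substrings. If that matters for your data, grep's `-w` option (whole-word matching) may be worth adding.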

  • 2020-12-01 08:19

    Most of the previous answers are correct, but the only thing that worked for me was this command:

    grep -oi -f a.txt b.txt
    

  • 2020-12-01 08:20

    One way with awk:

    awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2
    

    This should be pretty quick. On my machine, it took under 2 seconds to create a lookup of 1 million entries and compare it against 3 million lines.

    Machine Specs:

    Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores)
    98 GB RAM
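
For readers less familiar with awk, the same two-pass lookup can be sketched in Python. This is only an illustration of the idea, not a benchmarked replacement: the function name and the sample data are hypothetical, and the queries are assumed to end in `=<id>;` as the awk one-liner implies.

```python
import re


def filter_queries(id_lines, query_lines):
    """Keep the query lines whose last field, minus its final character, is a known ID."""
    # First pass (awk's NR==FNR block): remember the first field of every ID line.
    rows = set()
    for line in id_lines:
        fields = re.split(r"[ =]", line.rstrip("\n"))
        rows.add(fields[0])

    # Second pass: the last field looks like "12;", so drop the trailing character.
    kept = []
    for line in query_lines:
        last_field = re.split(r"[ =]", line.rstrip("\n"))[-1]
        if last_field[:-1] in rows:
            kept.append(line)
    return kept


# Hypothetical sample data standing in for File1 and File2.
ids = ["12\n", "7\n"]
queries = [
    "select * from t where ri=12;\n",
    "select * from t where ri=123;\n",
    "select * from t where ri=7;\n",
]

kept = filter_queries(ids, queries)
print("".join(kept), end="")
```

Because the comparison is on the whole ID rather than a substring, the query for 123 is not selected by the ID 12.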
    
  • 2020-12-01 08:20

    I may be missing something, but wouldn't it be sufficient to just iterate the IDs in file1 and for each ID, grep file2 and store the matches in a third file? I.e.

    while IFS= read -r ID; do grep -F -- "$ID" file2; done < file1 > file3
    

    This is not terribly efficient (file2 is read again for every ID), but it may be good enough for you. If you want more speed, I'd suggest using a more powerful scripting language that lets you read file2 into a map keyed by ID, so the lines for a given ID can be looked up quickly.

    Here's a Python version of this idea:

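    query_by_id = {}

    # Index every query in file2 by the ID between the last '=' and the ';'.
    with open('file2') as queries:
        for line in queries:
            last_equals = line.rfind('=')
            semicolon = line.find(';', last_equals)
            row_id = line[last_equals + 1:semicolon].strip()
            query_by_id[row_id] = line.rstrip()

    # Print the stored query for each ID listed in file1.
    with open('file1') as ids:
        for line in ids:
            row_id = line.strip()
            if row_id in query_by_id:
                print(query_by_id[row_id])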
    
  • 2020-12-01 08:23

    I suggest using a programming language such as Perl, Ruby or Python.

    In Ruby, a solution reading both files (f1 and f2) just once could be:

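    require 'set'

    # A Set gives constant-time membership checks (an Array scan is linear per line).
    idxes = File.readlines('f1').map(&:chomp).to_set

    File.foreach('f2') do |line|
      next unless line =~ /where ri=(\d+);$/
      puts line if idxes.include?($1)
    end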
    

    Or with Perl:

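    use strict;
    use warnings;

    # Load the IDs from f1 into a hash for constant-time lookups.
    my %idxs;
    open my $file, '<', 'f1' or die "f1: $!";
    while (<$file>) { chomp; $idxs{$_} = 1; }
    close $file;

    # Print each query in f2 whose row ID is in the hash.
    open $file, '<', 'f2' or die "f2: $!";
    while (<$file>) {
        next unless /where ri=(\d+);$/;
        print if $idxs{$1};
    }
    close $file;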
    