Deleting lines from one file which are in another file

余生分开走 2020-11-28 01:46

I have a file f1:

line1
line2
line3
line4
..
..

I want to delete all the lines which are in another file f2:

9 Answers
  • 2020-11-28 02:19

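    Did you try this with sed? The idea is to turn every line of f2 into its own sed delete command against f1 (this assumes GNU sed, and that the lines in f2 contain no regex metacharacters or % characters):

    # Prefix each line of f2 with: sed -i '\%^
    sed 's#^#sed -i '"'"'\\%^#' f2 > f2.sh

    # Suffix each line with: $%d' f1  (giving e.g. sed -i '\%^line2$%d' f1)
    sed -i 's#$#$%d'"'"' f1#' f2.sh

    # Add a shebang line, then run the generated script
    sed -i '1i#!/bin/bash' f2.sh

    sh f2.sh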
    
  • 2020-11-28 02:23

    Similar to Dennis Williamson's answer (mostly syntactic changes, e.g. setting the file number explicitly instead of the NR == FNR trick):

    awk '{if (f==1) { r[$0] } else if (! ($0 in r)) { print $0 } } ' f=1 exclude-these.txt f=2 from-this.txt

    Accessing r[$0] creates the entry for that line, no need to set a value.

    Assuming awk uses a hash table with constant lookup and (on average) constant update time, the time complexity of this will be O(n + m), where n and m are the lengths of the files. In my case, n was ~25 million and m ~14000. The awk solution was much faster than sort, and I also preferred keeping the original order.
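
To illustrate the order-preserving behaviour described above, here is a small sketch that runs the same awk program on made-up sample data and compares it with a sort-based set difference. The file contents and the temporary directory are assumptions for illustration only; the file names follow the command in the answer.

```shell
# Hypothetical sample data; file names follow the answer's command.
tmp=$(mktemp -d)
printf '%s\n' pear apple fig banana apple > "$tmp/from-this.txt"
printf '%s\n' banana pear > "$tmp/exclude-these.txt"

# Pass 1 (f=1) records each exclusion line as a key of r;
# pass 2 (f=2) prints only lines that are not keys of r.
awk_out=$(awk '{ if (f == 1) { r[$0] } else if (!($0 in r)) { print $0 } }' \
    f=1 "$tmp/exclude-these.txt" f=2 "$tmp/from-this.txt")

# Sort-based set difference for comparison; comm needs sorted input.
sort "$tmp/from-this.txt" > "$tmp/from.sorted"
sort "$tmp/exclude-these.txt" > "$tmp/exclude.sorted"
comm_out=$(comm -23 "$tmp/from.sorted" "$tmp/exclude.sorted")

printf 'awk:\n%s\ncomm:\n%s\n' "$awk_out" "$comm_out"
rm -rf "$tmp"
```

On this sample, awk prints apple, fig, apple in the original file order, while the comm variant prints the same lines in sorted order.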

  • 2020-11-28 02:25

    Not a 'programming' answer but here's a quick and dirty solution: just go to http://www.listdiff.com/compare-2-lists-difference-tool.

    Obviously won't work for huge files but it did the trick for me. A few notes:

    • I'm not affiliated with the website in any way (if you still don't believe me, then you can just search for a different tool online; I used the search term "set difference list online")
    • The linked website seems to make network calls on every list comparison, so don't feed it any sensitive data
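
For data that should not leave the machine, the same set difference can be computed locally. This is a minimal sketch, not part of the answer above, assuming a grep that supports the standard -F, -x, -v and -f options; the sample contents of f1 and f2 are made up for illustration.

```shell
# Hypothetical sample data standing in for f1 and f2 from the question.
tmp=$(mktemp -d)
printf '%s\n' line1 line2 line3 line4 > "$tmp/f1"
printf '%s\n' line2 line4 > "$tmp/f2"

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: print non-matching lines, -f: read the patterns from a file.
result=$(grep -vxFf "$tmp/f2" "$tmp/f1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Here the lines of f1 that also appear in f2 are dropped, and the remaining lines keep their original order.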