Deleting lines from one file which are in another file

余生分开走

I have a file f1:

line1
line2
line3
line4
..
..

I want to delete all the lines which are in another file f2:

9 answers

2020-11-28 02:19

Did you try this with sed?

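# Turn every line of f2 into a command of the form: sed -i 's%LINE%%g' f1
# (GNU sed syntax; lines containing %, ' or regex metacharacters will not work as intended)
sed 's#^#sed -i '"'"'s%#g' f2 > f2.sh

sed -i 's#$#%%g'"'"' f1#g' f2.sh

# Add a shebang line, then run the generated script
sed -i '1i#!/bin/bash' f2.sh

sh f2.sh

# Each generated command blanks out the matching text in f1, so an emptied line stays behind as a blank line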

不思量自难忘°

2020-11-28 02:23

Similar to Dennis Williamson's answer (mostly syntactic changes, e.g. setting the file number explicitly instead of the NR == FNR trick):

awk '{if (f==1) { r[$0] } else if (! ($0 in r)) { print $0 } } ' f=1 exclude-these.txt f=2 from-this.txt

Merely referencing r[$0] creates the entry for that line, so there is no need to assign a value.

Assuming awk uses a hash table with constant lookup and (on average) constant update time, the time complexity of this will be O(n + m), where n and m are the lengths of the files. In my case, n was ~25 million and m ~14000. The awk solution was much faster than sort, and I also preferred keeping the original order.
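As a rough illustration of the above, here is a small sketch that runs the same filter on made-up sample data. The scratch directory and the file contents are assumptions for the demo only; the file names are the ones from the command above. It also runs the NR == FNR form for comparison.

```shell
# Hypothetical sample data in a scratch directory (not from the original post).
tmp=$(mktemp -d)
printf '%s\n' line1 line2 line3 line4 > "$tmp/from-this.txt"
printf '%s\n' line2 line4 > "$tmp/exclude-these.txt"

# Explicit file number: while f == 1 every line becomes a key of r;
# while f == 2 a line is printed only if it is not a key of r.
result=$(awk '{ if (f == 1) { r[$0] } else if (!($0 in r)) { print $0 } }' \
    f=1 "$tmp/exclude-these.txt" f=2 "$tmp/from-this.txt")

# NR == FNR form: the two counters are equal only while the first file is read.
result_nr=$(awk 'NR == FNR { r[$0]; next } !($0 in r)' \
    "$tmp/exclude-these.txt" "$tmp/from-this.txt")

printf '%s\n' "$result"   # prints line1 and line3, in their original order
rm -rf "$tmp"
```

One reason I can see to prefer the explicit f=1 / f=2 form: if the exclude file is empty, NR == FNR also holds for every line of the second file, so that variant prints nothing, while the explicit form still prints everything.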

挽巷

2020-11-28 02:25
Not a 'programming' answer but here's a quick and dirty solution: just go to http://www.listdiff.com/compare-2-lists-difference-tool.

Obviously won't work for huge files but it did the trick for me. A few notes:
- I'm not affiliated with the website in any way (if you still don't believe me, then you can just search for a different tool online; I used the search term "set difference list online")
- The linked website seems to make network calls on every list comparison, so don't feed it any sensitive data