I have a file f1:
line1
line2
line3
line4
..
..
I want to delete all the lines which are in another file f2.
If your exclude file isn't too huge, you can use AWK's associative arrays.
awk 'NR == FNR { list[tolower($0)]=1; next } { if (! list[tolower($0)]) print }' exclude-these.txt from-this.txt
The output will be in the same order as the "from-this.txt" file. The tolower()
function makes it case-insensitive, if you need that.
The algorithmic complexity will probably be O(n) (size of exclude-these.txt) + O(n) (size of from-this.txt).
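For example, a quick sanity check with the sample lines from the question (the file names here just mirror the command above):
$ printf 'line2\nline4\n' > exclude-these.txt
$ printf 'line1\nline2\nline3\nline4\n' > from-this.txt
$ awk 'NR == FNR { list[tolower($0)]=1; next } { if (! list[tolower($0)]) print }' exclude-these.txt from-this.txt
line1
line3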
Seems to be a job suitable for the SQLite shell:
create table file1(line text);
create index if1 on file1(line ASC);
create table file2(line text);
create index if2 on file2(line ASC);
-- note: if you have | in your files, then first specify a different separator: .separator <any_improbable_string>
.import 'file1.txt' file1
.import 'file2.txt' file2
.output result.txt
select * from file2 where line not in (select line from file1);
.quit
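To run this non-interactively, the same commands can be saved in a file and piped to the shell; a sketch, where script.sql and scratch.db are placeholder names:
$ sqlite3 scratch.db < script.sql
$ cat result.txt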
Some timing comparisons between various other answers:
$ for n in {1..10000}; do echo $RANDOM; done > f1
$ for n in {1..10000}; do echo $RANDOM; done > f2
$ time comm -23 <(sort f1) <(sort f2) > /dev/null
real 0m0.019s
user 0m0.023s
sys 0m0.012s
$ time ruby -e 'puts File.readlines("f1") - File.readlines("f2")' > /dev/null
real 0m0.026s
user 0m0.018s
sys 0m0.007s
$ time grep -xvf f2 f1 > /dev/null
real 0m43.197s
user 0m43.155s
sys 0m0.040s
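Most of that grep time goes into treating each line of f2 as a regular expression. The -F (fixed-strings) flag is the variant to try; it is usually dramatically faster, though not timed here:
$ time grep -Fxvf f2 f1 > /dev/null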
sort f1 f2 | uniq -u
isn't even a symmetric difference, because it removes lines that appear multiple times in either file.
comm can also be used with stdin and here strings:
echo $'a\nb' | comm -23 <(sort) <(sort <<< $'c\nb') # a
If you have Ruby (1.9+):
#!/usr/bin/env ruby
# read the exclude list as whole lines; a bare split would break on any whitespace
b = File.read("file2").split("\n")
File.open("file1").each do |x|
  x.chomp!
  puts x unless b.include?(x)
end
This has O(N^2) complexity, since b.include? is a linear scan for every line. If you care about performance, here's another version:
b = File.read("file2").split("\n")
a = File.read("file1").split("\n")
(a - b).each { |x| puts x }
which uses a hash to effect the subtraction, so its complexity is O(n) (size of a) + O(n) (size of b).
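Note that Array#- removes every occurrence of a matching element, not just the first, which is what "delete all the lines which are in f2" calls for; a quick check from the shell:
$ ruby -e 'p ["a", "b", "a"] - ["a"]'
["b"]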
Here's a little benchmark of the above, courtesy of user576875, but with 100K lines:
$ for i in $(seq 1 100000); do echo "$i"; done|sort --random-sort > file1
$ for i in $(seq 1 2 100000); do echo "$i"; done|sort --random-sort > file2
$ time ruby test.rb > ruby.test
real 0m0.639s
user 0m0.554s
sys 0m0.021s
$ time sort file1 file2 | uniq -u > sort.test
real 0m2.311s
user 0m1.959s
sys 0m0.040s
$ diff <(sort -n ruby.test) <(sort -n sort.test)
$
diff was used to show there are no differences between the two files generated.
grep -v -x -f f2 f1
should do the trick.
Explanation:
-v to select non-matching lines
-x to match whole lines only
-f f2 to get patterns from f2
One can instead use grep -F or fgrep to match fixed strings from f2 rather than patterns (in case you want to remove the lines in a "what you see is what you get" manner, rather than treating the lines in f2 as regex patterns).
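A small illustration of the difference, reusing the file names for a throwaway sketch:
$ printf 'line.1\n' > f2
$ printf 'line.1\nlineX1\n' > f1
$ grep -xvf f2 f1     # no output: as a regex, line.1 also matches lineX1
$ grep -Fxvf f2 f1
lineX1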
Try comm instead (this assumes f1 and f2 are already sorted):
comm -2 -3 f1 f2
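If they aren't already sorted, sort them on the fly with process substitution, as in the timing test above:
comm -23 <(sort f1) <(sort f2)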