Returning lines that differ between two files (Python)


I have two files with tens of thousands of lines each, output1.txt and output2.txt. I want to iterate through both files and return the line (and content) of the lines that differ between the two.


3 Answers

暖寄归人

2021-02-06 17:55

7.4. difflib — Helpers for computing deltas

New in version 2.1.

This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs. For comparing directories and files, see also the filecmp module.
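As a rough illustration of the "unified diffs" mentioned in that description, the sketch below wraps difflib.unified_diff in a small helper. The helper name unified_file_diff is made up for this example, and it assumes both files are text files small enough to read into memory.

```python
import difflib

def unified_file_diff(path_a, path_b):
    """Return the unified-diff lines for two text files (hypothetical helper)."""
    with open(path_a) as fa, open(path_b) as fb:
        a_lines = fa.readlines()
        b_lines = fb.readlines()
    # n=0 drops context lines, so only headers and changed lines are returned
    return list(difflib.unified_diff(a_lines, b_lines,
                                     fromfile=path_a, tofile=path_b, n=0))
```

For the files in the question, something like sys.stdout.writelines(unified_file_diff('output1.txt', 'output2.txt')) should print the changed lines, with the "@@" hunk headers giving their positions in each file.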


深忆病人

2021-02-06 17:58

You can do something like this:

import difflib, sys

tl=100000    # large number of lines

# create two test files (the paths assume a Unix-like /tmp directory)

with open('/tmp/f1.txt','w') as f:
    for x in range(tl):
        f.write('line {}\n'.format(x))

with open('/tmp/f2.txt','w') as f:
    for x in range(tl+10):   # add 10 lines
        if x in (500,505,1000,tl-2):
            continue         # skip these lines
        f.write('line {}\n'.format(x))        

with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
    diff = difflib.ndiff(f1.readlines(),f2.readlines())    
    for line in diff:
        if line.startswith('-'):
            sys.stdout.write(line)
        elif line.startswith('+'):
            sys.stdout.write('\t\t'+line)

Prints (in 400 ms):

- line 500
- line 505
- line 1000
- line 99998
        + line 100000
        + line 100001
        + line 100002
        + line 100003
        + line 100004
        + line 100005
        + line 100006
        + line 100007
        + line 100008
        + line 100009

If you want a running count, use enumerate. Note that the count is the position in the diff output, not a line number in either file:

with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
    diff = difflib.ndiff(f1.readlines(),f2.readlines())    
    for i,line in enumerate(diff):
        if line.startswith(' '):
            continue
        sys.stdout.write('My count: {}, text: {}'.format(i,line))
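If the goal is the actual line number within each file, one option is to keep a separate counter per file while walking the ndiff output. This is only a sketch; the function name diff_with_line_numbers is invented for the example, and it relies on ndiff's two-character prefixes ('  ', '- ', '+ ', '? ').

```python
import difflib

def diff_with_line_numbers(a_lines, b_lines):
    """Yield (tag, line_number, text) for lines that differ.

    tag is '-' (only in the first sequence) or '+' (only in the second);
    line_number is 1-based within the sequence the line came from.
    """
    a_no = b_no = 0
    for line in difflib.ndiff(a_lines, b_lines):
        tag, text = line[0], line[2:]
        if tag == ' ':
            # unchanged line: it advances the position in both files
            a_no += 1
            b_no += 1
        elif tag == '-':
            a_no += 1
            yield '-', a_no, text
        elif tag == '+':
            b_no += 1
            yield '+', b_no, text
        # lines tagged '?' are ndiff's intraline hints, not file content; skip them
```

It can be fed the same f1.readlines() and f2.readlines() lists used above, for example: for tag, n, text in diff_with_line_numbers(a, b): sys.stdout.write('{} {}: {}'.format(tag, n, text)).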


执笔经年

2021-02-06 18:21

As long as you don't care about order you could use:

with open('file1') as f:
    t1 = f.read().splitlines()
    t1s = set(t1)

with open('file2') as f:
    t2 = f.read().splitlines()
    t2s = set(t2)

# in file1 but not file2 (index is the 0-based position of the first occurrence)
print("Only in file1")
for diff in t1s - t2s:
    print(t1.index(diff), diff)

# in file2 but not file1
print("Only in file2")
for diff in t2s - t1s:
    print(t2.index(diff), diff)
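With tens of thousands of lines, calling list.index for every differing line can get slow, since each call scans the list from the start. A possible variation, sketched here under the assumption that both files fit in memory, checks each line against a set built from the other file. Unlike the set difference above, it reports every occurrence, in file order, with 1-based line numbers. The name lines_only_in_first is hypothetical.

```python
def lines_only_in_first(path_a, path_b):
    """Return [(line_number, text)] for lines in path_a that never appear in path_b."""
    with open(path_b) as fb:
        seen_in_b = set(fb.read().splitlines())
    result = []
    with open(path_a) as fa:
        # enumerate from 1 so the numbers match what a text editor shows
        for number, text in enumerate(fa.read().splitlines(), start=1):
            if text not in seen_in_b:
                result.append((number, text))
    return result
```

Calling it twice with the arguments swapped gives both directions, e.g. lines_only_in_first('file1', 'file2') and lines_only_in_first('file2', 'file1').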

Edit: If you do care about order and the files are mostly the same, why not just use the diff command?
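If the result is still needed inside a Python script, the diff command can be invoked through subprocess. The sketch below assumes a diff executable is on the PATH (typical on Unix-like systems, not guaranteed elsewhere) and Python 3.7 or later; the wrapper name run_diff is made up for illustration.

```python
import shutil
import subprocess

def run_diff(path_a, path_b):
    """Return the output of the system diff command, or None if diff is not installed."""
    if shutil.which("diff") is None:
        return None
    result = subprocess.run(["diff", path_a, path_b],
                            capture_output=True, text=True)
    # diff exits with 0 when the files match and 1 when they differ;
    # anything higher signals trouble (for example, a missing file)
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

In diff's default output format, lines starting with "<" come from the first file and lines starting with ">" from the second, preceded by the affected line ranges.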
