Returning lines that differ between two files (Python)

我寻月下人不归 2021-02-06 17:38

I have two files with tens of thousands of lines each, output1.txt and output2.txt. I want to iterate through both files and return the line (and content) of the lines that diff

3 Answers
  •  深忆病人
    2021-02-06 17:58

    You can do something like this:

    import difflib, sys
    
    tl=100000    # large number of lines
    
    # create two test files (Unix directories...)
    
    with open('/tmp/f1.txt','w') as f:
        for x in range(tl):
            f.write('line {}\n'.format(x))
    
    with open('/tmp/f2.txt','w') as f:
        for x in range(tl+10):   # add 10 lines
            if x in (500,505,1000,tl-2):
                continue         # skip these lines
            f.write('line {}\n'.format(x))        
    
    with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
        diff = difflib.ndiff(f1.readlines(),f2.readlines())    
        for line in diff:
            if line.startswith('-'):
                sys.stdout.write(line)
            elif line.startswith('+'):
                sys.stdout.write('\t\t'+line)   
    

    Prints (in 400 ms):

    - line 500
    - line 505
    - line 1000
    - line 99998
            + line 100000
            + line 100001
            + line 100002
            + line 100003
            + line 100004
            + line 100005
            + line 100006
            + line 100007
            + line 100008
            + line 100009
    

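    If you want a running index, use enumerate. Note that this counts entries in the diff output (including unchanged lines), not line numbers in either file: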

    with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
        diff = difflib.ndiff(f1.readlines(),f2.readlines())    
        for i,line in enumerate(diff):
            if line.startswith(' '):
                continue
            sys.stdout.write('My count: {}, text: {}'.format(i,line))  
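
    Since the enumerate index above is a position in the diff output, here is a minimal sketch of one way to recover the real line number in each file: keep one counter per file and advance it according to the ndiff prefix. The helper name diff_with_line_numbers is hypothetical (it is not part of difflib), and the in-memory lists stand in for f1.readlines() and f2.readlines().

```python
import difflib


def diff_with_line_numbers(lines1, lines2):
    """Yield (tag, line_no, text) for every line that differs.

    tag is '-' for a line only in lines1 and '+' for a line only in lines2.
    line_no is 1-based and refers to the file the line came from.
    Hypothetical helper for illustration; not part of difflib.
    """
    n1 = n2 = 0  # current line number in the first / second file
    for entry in difflib.ndiff(lines1, lines2):
        tag, text = entry[:2], entry[2:]
        if tag == '  ':      # line present in both files
            n1 += 1
            n2 += 1
        elif tag == '- ':    # line only in the first file
            n1 += 1
            yield ('-', n1, text)
        elif tag == '+ ':    # line only in the second file
            n2 += 1
            yield ('+', n2, text)
        # '? ' entries are ndiff hint lines and exist in neither file,
        # so they are skipped without touching the counters.


if __name__ == '__main__':
    # Small in-memory stand-ins for f1.readlines() and f2.readlines()
    old = ['a\n', 'b\n', 'c\n']
    new = ['a\n', 'c\n', 'd\n']
    for tag, line_no, text in diff_with_line_numbers(old, new):
        print('{} line {}: {}'.format(tag, line_no, text), end='')
```

    With the test files from this answer you would pass f1.readlines() and f2.readlines() instead of the two small lists.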
    
