diff two big files in Python

忘了有多久 2020-12-03 06:17

I have two big text files, nearly 2 GB each. I need something like diff f1.txt f2.txt. Is there a fast way to do this in Python? Standard difflib

1 answer
  • 2020-12-03 06:57

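    How about using difflib in a way that lets your script handle big files? Don't load the files into memory; iterate over the lines of both files and diff them in chunks, e.g. 100 lines at a time. Note that this only gives a meaningful diff while the two files stay roughly line-aligned: an inserted or deleted line shifts every later chunk.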

    import difflib
    from itertools import islice
    
    CHUNK = 100  # number of lines compared per block
    
    d = difflib.Differ()
    
    # File objects are read lazily, so neither file is loaded into memory at once.
    with open('bigfile1') as f1, open('bigfile2') as f2:
        while True:
            # Take the next CHUNK lines from each file (fewer near the end).
            b1 = list(islice(f1, CHUNK))
            b2 = list(islice(f2, CHUNK))
            if not b1 and not b2:
                break  # both files are exhausted
            # compare() expects sequences of lines, not joined strings.
            print(''.join(d.compare(b1, b2)), end='')
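Since the question asks for speed, one possible refinement of the chunked idea is to test each pair of chunks for plain equality first and hand only the differing chunks to difflib. The following is a minimal sketch, not a definitive implementation: the function name chunked_diff and its chunk parameter are hypothetical, and it assumes the two files stay roughly line-aligned, as the chunked approach above does.

```python
import difflib
from itertools import islice


def chunked_diff(lines1, lines2, chunk=100):
    """Yield (start_line, delta) for every pair of chunks that differ.

    lines1 and lines2 are any iterables of lines; open file objects work.
    chunked_diff and chunk are illustrative names, not part of difflib.
    """
    it1, it2 = iter(lines1), iter(lines2)
    differ = difflib.Differ()
    start = 1  # 1-based line number at which the current chunk begins
    while True:
        b1 = list(islice(it1, chunk))
        b2 = list(islice(it2, chunk))
        if not b1 and not b2:
            break  # both inputs are exhausted
        # Cheap list comparison first: identical chunks never reach difflib.
        if b1 != b2:
            yield start, list(differ.compare(b1, b2))
        start += chunk
```

It could be used like this, with the file names taken from the answer above:

    with open('bigfile1') as f1, open('bigfile2') as f2:
        for start, delta in chunked_diff(f1, f2):
            print('chunk starting at line %d' % start)
            print(''.join(delta), end='')

When the files are mostly identical, most chunks are skipped by the equality test, so difflib only runs on the regions that actually changed.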
    