Comparing two text files in Python

执念已碎 2021-02-05 12:20

I need to compare two files and redirect the lines that differ to a third file. I know I can get the difference with the diff command, but is there any way of doing it in Python? An

4 answers
  •  情深已故
    2021-02-05 13:04

    Comparing two text files in Python?

    Sure, difflib makes it easy.

    Let's set up a demo:

    f1path = 'file1'
    f2path = 'file2'
    text1 = '\n'.join(['a', 'b', 'c', 'd', ''])
    text2 = '\n'.join(['a', 'ba', 'bb', 'c', 'def', ''])
    for path, text in ((f1path, text1), (f2path, text2)):
        with open(path, 'w') as f:
            f.write(text)
    

    Now to inspect a diff. The os and time imports are only used to produce a readable timestamp of when each file was last modified. They are completely optional, as are the matching fromfiledate and tofiledate arguments to difflib.unified_diff:

    # optional imports:
    import os
    import time
    # necessary import:
    import difflib
    

    Now we just open the files, pass the list of each file's lines (from f.readlines()) to difflib.unified_diff, join the strings it yields with an empty string, and print the result:

    # plain 'r': the old 'rU' mode was removed in Python 3.11, and text mode already handles newlines
    with open(f1path, 'r') as f1:
        with open(f2path, 'r') as f2:
            readable_last_modified_time1 = time.ctime(os.path.getmtime(f1path)) # not required
            readable_last_modified_time2 = time.ctime(os.path.getmtime(f2path)) # not required
            print(''.join(difflib.unified_diff(
              f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path, 
              fromfiledate=readable_last_modified_time1, # not required
              tofiledate=readable_last_modified_time2, # not required
              )))
    

    which prints:

    --- file1       Mon Jul 27 08:38:02 2015
    +++ file2       Mon Jul 27 08:38:02 2015
    @@ -1,4 +1,5 @@
     a
    -b
    +ba
    +bb
     c
    -d
    +def
    

    Again, you can remove all the lines marked optional or "not required" and get the same output, just without the timestamps.
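Without the timestamps, the whole comparison fits in a small helper. This is a minimal sketch assuming Python 3 and text files readable with the default encoding; `unified_diff_text` is a hypothetical name, and the only library call is `difflib.unified_diff` with the `fromfile`/`tofile` arguments already used above:

```python
import difflib


# unified_diff_text is a hypothetical helper name, not part of difflib.
def unified_diff_text(path1: str, path2: str) -> str:
    """Return the unified diff of two text files as one string, without timestamps."""
    with open(path1) as f1, open(path2) as f2:
        # readlines() keeps each line's trailing newline, which unified_diff expects.
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # unified_diff yields strings lazily, so join them into a single string.
    return ''.join(difflib.unified_diff(lines1, lines2, fromfile=path1, tofile=path2))
```

On the demo files, `print(unified_diff_text('file1', 'file2'), end='')` should print the same diff as above, except that the `---`/`+++` header lines carry no timestamps. Identical files produce an empty string.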

    redirect the different lines to a third file

    Instead of printing, join the diff into a string and write it to a third file:

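    with open(f1path) as f1:
        with open(f2path) as f2:
            readable_last_modified_time1 = time.ctime(os.path.getmtime(f1path)) # not required
            readable_last_modified_time2 = time.ctime(os.path.getmtime(f2path)) # not required
            difftext = ''.join(difflib.unified_diff(
              f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path,
              fromfiledate=readable_last_modified_time1, # not required
              tofiledate=readable_last_modified_time2, # not required
              ))
    # write the diff text to a third file
    with open('diffon1and2', 'w') as diff_file:
        diff_file.write(difftext)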
    

    and:

    $ cat diffon1and2
    --- file1       Mon Jul 27 11:38:02 2015
    +++ file2       Mon Jul 27 11:38:02 2015
    @@ -1,4 +1,5 @@
     a
    -b
    +ba
    +bb
     c
    -d
    +def
    
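The unified diff includes file headers, `@@` hunk markers and three lines of unchanged context around each change. If you only want the differing lines themselves in the third file, one option is to pass `n=0` (no context lines) to `difflib.unified_diff` and drop the header and `@@` lines. A minimal sketch; `write_changed_lines` is a hypothetical helper name, not part of difflib:

```python
import difflib


# write_changed_lines is a hypothetical helper name, not part of difflib.
def write_changed_lines(path1: str, path2: str, out_path: str) -> int:
    """Write only the lines that differ between path1 and path2 to out_path.

    Removed lines keep unified_diff's '-' prefix, added lines its '+' prefix.
    Returns the number of lines written.
    """
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()

    changed = []
    # n=0 requests zero lines of unchanged context around each change.
    diff = difflib.unified_diff(lines1, lines2, n=0)
    for i, line in enumerate(diff):
        # The first two lines are the '---'/'+++' file headers; '@@' lines are hunk markers.
        if i < 2 or line.startswith('@@'):
            continue
        # A file's last line may lack a newline; add one so output lines don't run together.
        if not line.endswith('\n'):
            line += '\n'
        changed.append(line)

    with open(out_path, 'w') as out:
        out.writelines(changed)
    return len(changed)
```

With the demo files, the output file would contain the five lines `-b`, `+ba`, `+bb`, `-d` and `+def`. Dropping the `@@` lines also drops the line numbers of each change, so keep the full unified diff if you need to know where the changes occurred.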
