Compare directories on file/folder names only, printing any differences?

泄露秘密 posted on 2019-12-06 10:23:46

My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.

import os

def build_files_set(rootdir):
    """Collect the paths of all files and dirs under rootdir, relative to rootdir."""
    files_set = set()
    for (dirpath, dirnames, filenames) in os.walk(rootdir):
        for filename in filenames + dirnames:
            full_path = os.path.join(dirpath, filename)
            # relpath strips the root prefix without treating rootdir as a regex,
            # so roots containing backslashes, dots or brackets work too
            relative_path = os.path.relpath(full_path, rootdir)
            files_set.add(relative_path)

    return files_set

def compare_directories(dir1, dir2):
    files_set1 = build_files_set(dir1)
    files_set2 = build_files_set(dir2)
    return (files_set1 - files_set2, files_set2 - files_set1)

if __name__ == '__main__':
    dir1 = 'old'
    dir2 = 'new'
    in_dir1, in_dir2 = compare_directories(dir1, dir2)

    print('\nFiles only in {}:'.format(dir1))
    for relative_path in in_dir1:
        print('* {0}'.format(relative_path))

    print('\nFiles only in {}:'.format(dir2))
    for relative_path in in_dir2:
        print('* {0}'.format(relative_path))

Discussion

  • The workhorse is the function build_files_set(). It traverses a directory and creates a set of relative file/dir names.

  • The function compare_directories() takes two directories, builds a set for each, and returns the differences. Very straightforward.
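
To make the set-subtraction step concrete, here is a minimal sketch that uses hand-written relative paths instead of a real directory walk. The path names are made up for illustration; the intersection at the end is an extra that the answer itself does not use.

```python
# Hypothetical relative paths, as build_files_set() might collect them.
old_paths = {'a.txt', 'docs', 'docs/readme.md'}
new_paths = {'a.txt', 'docs', 'docs/changelog.md'}

only_in_old = old_paths - new_paths   # entries missing from the new tree
only_in_new = new_paths - old_paths   # entries added in the new tree
in_both = old_paths & new_paths       # entries present in both trees

print(sorted(only_in_old))
print(sorted(only_in_new))
print(sorted(in_both))
```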

The basic idea: use os.walk() to populate dictionaries of file names, then compare the dictionaries.

import os

# Collect every file name under the first tree (the dict keys act as a set)
fpa = {}
for root, dirs, files in os.walk('/your/path'):
    for name in files:
        fpa[name] = 1

# Collect every file name under the second tree
fpb = {}
for root, dirs, files in os.walk('/your/path2'):
    for name in files:
        fpb[name] = 1

print("files only in a")
for name in fpa:
    if name not in fpb:
        print(name)

print("files only in b")
for name in fpb:
    if name not in fpa:
        print(name)

I didn't test this, so you may have to fix it. Also, it can easily be refactored to avoid the duplicated code.
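
The refactoring mentioned above might look something like the following sketch. The helper names collect_file_names() and compare_file_names() are hypothetical, and sets replace the dictionaries. Like the code above, it compares bare file names only, so a file that sits in a different subdirectory in each tree still counts as present in both.

```python
import os

def collect_file_names(rootdir):
    """Return the set of bare file names found anywhere under rootdir."""
    names = set()
    for _dirpath, _dirnames, filenames in os.walk(rootdir):
        names.update(filenames)
    return names

def compare_file_names(dir_a, dir_b):
    """Return (names found only under dir_a, names found only under dir_b)."""
    names_a = collect_file_names(dir_a)
    names_b = collect_file_names(dir_b)
    return names_a - names_b, names_b - names_a
```

Calling compare_file_names('/your/path', '/your/path2') would then return the two sets to print.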

Actually, filecmp can and should be used for this, but you have to do a little coding.

  • You give filecmp.dircmp() two directories, which it calls left and right.
  • filecmp.dircmp.left_only is a list of the files and dirs that are only in the left dir.
  • filecmp.dircmp.right_only is a list of the files and dirs that are only in the right dir.
  • filecmp.dircmp.common_dirs is a list of the dirs that are in both.
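
As a quick illustration of those three attributes, this sketch builds a tiny throwaway tree in a temporary directory and inspects a single dircmp object. The directory names are made up for the example.

```python
import os
import tempfile
from filecmp import dircmp

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'left')
    right = os.path.join(tmp, 'right')
    # One directory shared by both sides, plus one unique directory per side
    for parts in (('left', 'shared'), ('left', 'only_left'),
                  ('right', 'shared'), ('right', 'only_right')):
        os.makedirs(os.path.join(tmp, *parts))

    dcmp = dircmp(left, right)
    # The attributes are computed lazily, so read them while the tree exists
    left_only = dcmp.left_only
    right_only = dcmp.right_only
    common_dirs = dcmp.common_dirs

print(left_only, right_only, common_dirs)
```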

You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.

Code:

from os.path import join
from filecmp import dircmp

def find_uncommon(L_dir, R_dir):
    dcmp = dircmp(L_dir, R_dir)
    L_only = [join(L_dir, f) for f in dcmp.left_only]
    R_only = [join(R_dir, f) for f in dcmp.right_only]
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only

Test Case:

C:/
    L_dir/
        file_in_both_trees.txt
        file_in_L_tree.txt
        dir_in_L_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_L_tree.txt
            dir_in_L_tree/
                file_inside_dir_only_in_L_tree.txt
    R_dir/
        file_in_both_trees.txt
        file_in_R_tree.txt
        dir_in_R_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_R_tree.txt
            dir_in_R_tree/
                file_inside_dir_only_in_R_tree.txt

Demo:

L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))

Result:

Left only:
    C:\L_dir\file_in_L_tree.txt
    C:\L_dir\dir_in_L_tree
    C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
    C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right only:
    C:\R_dir\file_in_R_tree.txt
    C:\R_dir\dir_in_R_tree
    C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
    C:\R_dir\dir_in_both_trees\dir_in_R_tree

Note that you would have to modify the above code a bit if you wanted to see inside the uncommon directories. In my example above, that would be these two files:

file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
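
One possible way to make that modification, sketched under the assumption that everything beneath a one-sided directory should be listed as well: walk each uncommon directory with os.walk(). The names expand() and find_uncommon_deep() are hypothetical and not part of filecmp.

```python
import os
from filecmp import dircmp
from os.path import join

def expand(path):
    """Return path itself plus, if it is a directory, everything beneath it."""
    found = [path]
    if os.path.isdir(path):
        for dirpath, dirnames, filenames in os.walk(path):
            for name in dirnames + filenames:
                found.append(join(dirpath, name))
    return found

def find_uncommon_deep(L_dir, R_dir):
    """Like find_uncommon(), but also lists the contents of one-sided dirs."""
    dcmp = dircmp(L_dir, R_dir)
    L_only = [p for f in dcmp.left_only for p in expand(join(L_dir, f))]
    R_only = [p for f in dcmp.right_only for p in expand(join(R_dir, f))]
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon_deep(join(L_dir, sub_dir),
                                          join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
```

With the test case above, the two files inside dir_in_L_tree and dir_in_R_tree would then show up in the left-only and right-only lists respectively.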