How do I recursively compare two directories (comparison should be based only on file name) and print out files/folders only in one or the other directory?
I'm using Python 3.3.
I've seen the filecmp
module, however, it doesn't seem to quite do what I need. Most importantly, it compares files based on more than just the filename.
Here's what I've got so far:
import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()
dir1
looks like this:
dir1
- atextfile.txt
- anotherfile.xml
- afolder
- testscript.py
- anotherfolder
- file.txt
- athirdfolder
And dir2
looks like this:
dir2
- atextfile.txt
- afolder
- testscript.py
- anotherfolder
- file.txt
- file2.txt
I want results to look something like:
files/folders only in dir1
* anotherfile.xml
* athirdfolder
files/folders only in dir2
* anotherfolder/file2.txt
I need a simple pythonic way to compare two directoies based only on file/folder name, and print out differences.
Also, I need a way to check whether the directories are identical or not.
Note: I have searched on stackoverflow and google for something like this. I see lots of examples of how to compare files taking into account the file content, but I can't find anything about just file names.
My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.
import os
import re
def build_files_set(rootdir):
root_to_subtract = re.compile(r'^.*?' + rootdir + r'[\\/]{0,1}')
files_set = set()
for (dirpath, dirnames, filenames) in os.walk(rootdir):
for filename in filenames + dirnames:
full_path = os.path.join(dirpath, filename)
relative_path = root_to_subtract.sub('', full_path, count=1)
files_set.add(relative_path)
return files_set
def compare_directories(dir1, dir2):
files_set1 = build_files_set(dir1)
files_set2 = build_files_set(dir2)
return (files_set1 - files_set2, files_set2 - files_set1)
if __name__ == '__main__':
dir1 = 'old'
dir2 = 'new'
in_dir1, in_dir2 = compare_directories(dir1, dir2)
print '\nFiles only in {}:'.format(dir1)
for relative_path in in_dir1:
print '* {0}'.format(relative_path)
print '\nFiles only in {}:'.format(dir2)
for relative_path in in_dir2:
print '* {0}'.format(relative_path)
Discussion
The workhorse is the function build_files_set(). It traverse a directory and create a set of relative file/dir names
The function compare_directories() takes two set of files and return the diferences--very straight forward.
Basic idea, use the os.walk method to populate dictionaries of filenames and then compare the dictionaries.
import os
from os.path import join
fpa = {}
for root, dirs, files in os.walk('/your/path'):
for name in files:
fpa[name] = 1
fpb = {}
for root, dirs, files in os.walk('/your/path2'):
for name in files:
fpb[name] = 1
print "files only in a"
for name in fpa.keys():
if not(name in fpb.keys()):
print name,"\n"
print "files only in b"
for name in fpb.keys():
if not(name in fpa.keys()):
print name,"\n"
I didn't test this so you may have to fix Also it can easily be refactored to avoid reuse
Actually, filecmp
can and should be used for this, but you have to do a little coding.
- You give
filecmp.dircmp()
two directories, which it calls left and right. filecmp.dircmp.left_only
is a list of the files and dirs that are only in the left dir.filecmp.dircmp.right_only
is a list of the files and dirs that are only in the right dir.filecmp.dircmp.common_dirs
is a list of the dirs that are in both.
You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.
Code:
from os.path import join
from filecmp import dircmp
def find_uncommon(L_dir, R_dir):
dcmp = dircmp(L_dir, R_dir)
L_only = [join(L_dir, f) for f in dcmp.left_only]
R_only = [join(R_dir, f) for f in dcmp.right_only]
for sub_dir in dcmp.common_dirs:
new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
L_only.extend(new_L)
R_only.extend(new_R)
return L_only, R_only
Test Case:
C:/
L_dir/
file_in_both_trees.txt
file_in_L_tree.txt
dir_in_L_tree/
dir_in_both_trees/
file_in_both_trees.txt
file_in_L_tree.txt
dir_in_L_tree/
file_inside_dir_only_in_L_tree.txt
R_dir/
file_in_both_trees.txt
file_in_R_tree.txt
dir_in_R_tree/
dir_in_both_trees/
file_in_both_trees.txt
file_in_R_tree.txt
dir_in_R_tree/
file_inside_dir_only_in_R_tree.txt
Demo:
L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))
Result:
Left_only:
C:\L_dir\file_in_L_tree.txt
C:\L_dir\dir_in_L_tree
C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right_only:
C:\R_dir\file_in_R_tree.txt
C:\L_dir\dir_in_R_tree
C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
C:\R_dir\dir_in_both_trees\dir_in_R_tree
Note that you would have to modify the above code a bit if you wanted see inside of uncommon directories. What I'm talking about would be these 2 files in my example above:
file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
来源:https://stackoverflow.com/questions/15069091/compare-directories-on-file-folder-names-only-printing-any-differences