问题
How do I recursively compare two directories (comparison should be based only on file name) and print out files/folders only in one or the other directory?
I'm using Python 3.3.
I've seen the filecmp
module, however, it doesn't seem to quite do what I need. Most importantly, it compares files based on more than just the filename.
Here's what I've got so far:
import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()
dir1
looks like this:
dir1
- atextfile.txt
- anotherfile.xml
- afolder
- testscript.py
- anotherfolder
- file.txt
- athirdfolder
And dir2
looks like this:
dir2
- atextfile.txt
- afolder
- testscript.py
- anotherfolder
- file.txt
- file2.txt
I want results to look something like:
files/folders only in dir1
* anotherfile.xml
* athirdfolder
files/folders only in dir2
* anotherfolder/file2.txt
I need a simple pythonic way to compare two directoies based only on file/folder name, and print out differences.
Also, I need a way to check whether the directories are identical or not.
Note: I have searched on stackoverflow and google for something like this. I see lots of examples of how to compare files taking into account the file content, but I can't find anything about just file names.
回答1:
My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.
import os
import re
def build_files_set(rootdir):
root_to_subtract = re.compile(r'^.*?' + rootdir + r'[\\/]{0,1}')
files_set = set()
for (dirpath, dirnames, filenames) in os.walk(rootdir):
for filename in filenames + dirnames:
full_path = os.path.join(dirpath, filename)
relative_path = root_to_subtract.sub('', full_path, count=1)
files_set.add(relative_path)
return files_set
def compare_directories(dir1, dir2):
files_set1 = build_files_set(dir1)
files_set2 = build_files_set(dir2)
return (files_set1 - files_set2, files_set2 - files_set1)
if __name__ == '__main__':
dir1 = 'old'
dir2 = 'new'
in_dir1, in_dir2 = compare_directories(dir1, dir2)
print '\nFiles only in {}:'.format(dir1)
for relative_path in in_dir1:
print '* {0}'.format(relative_path)
print '\nFiles only in {}:'.format(dir2)
for relative_path in in_dir2:
print '* {0}'.format(relative_path)
Discussion
The workhorse is the function build_files_set(). It traverse a directory and create a set of relative file/dir names
The function compare_directories() takes two set of files and return the diferences--very straight forward.
回答2:
Basic idea, use the os.walk method to populate dictionaries of filenames and then compare the dictionaries.
import os
from os.path import join
fpa = {}
for root, dirs, files in os.walk('/your/path'):
for name in files:
fpa[name] = 1
fpb = {}
for root, dirs, files in os.walk('/your/path2'):
for name in files:
fpb[name] = 1
print "files only in a"
for name in fpa.keys():
if not(name in fpb.keys()):
print name,"\n"
print "files only in b"
for name in fpb.keys():
if not(name in fpa.keys()):
print name,"\n"
I didn't test this so you may have to fix Also it can easily be refactored to avoid reuse
回答3:
Actually, filecmp
can and should be used for this, but you have to do a little coding.
- You give
filecmp.dircmp()
two directories, which it calls left and right. filecmp.dircmp.left_only
is a list of the files and dirs that are only in the left dir.filecmp.dircmp.right_only
is a list of the files and dirs that are only in the right dir.filecmp.dircmp.common_dirs
is a list of the dirs that are in both.
You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.
Code:
from os.path import join
from filecmp import dircmp
def find_uncommon(L_dir, R_dir):
dcmp = dircmp(L_dir, R_dir)
L_only = [join(L_dir, f) for f in dcmp.left_only]
R_only = [join(R_dir, f) for f in dcmp.right_only]
for sub_dir in dcmp.common_dirs:
new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
L_only.extend(new_L)
R_only.extend(new_R)
return L_only, R_only
Test Case:
C:/
L_dir/
file_in_both_trees.txt
file_in_L_tree.txt
dir_in_L_tree/
dir_in_both_trees/
file_in_both_trees.txt
file_in_L_tree.txt
dir_in_L_tree/
file_inside_dir_only_in_L_tree.txt
R_dir/
file_in_both_trees.txt
file_in_R_tree.txt
dir_in_R_tree/
dir_in_both_trees/
file_in_both_trees.txt
file_in_R_tree.txt
dir_in_R_tree/
file_inside_dir_only_in_R_tree.txt
Demo:
L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))
Result:
Left_only:
C:\L_dir\file_in_L_tree.txt
C:\L_dir\dir_in_L_tree
C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right_only:
C:\R_dir\file_in_R_tree.txt
C:\L_dir\dir_in_R_tree
C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
C:\R_dir\dir_in_both_trees\dir_in_R_tree
Note that you would have to modify the above code a bit if you wanted see inside of uncommon directories. What I'm talking about would be these 2 files in my example above:
file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
来源:https://stackoverflow.com/questions/15069091/compare-directories-on-file-folder-names-only-printing-any-differences