Compare directories on file/folder names only, printing any differences?

[亡魂溺海] submitted on 2019-12-08 02:43:01

Question


How do I recursively compare two directories (comparison should be based only on file name) and print out files/folders only in one or the other directory?

I'm using Python 3.3.

I've seen the filecmp module; however, it doesn't seem to do quite what I need. Most importantly, it compares files based on more than just the filename.

Here's what I've got so far:

import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()

dir1 looks like this:

dir1
  - atextfile.txt
  - anotherfile.xml
  - afolder
    - testscript.py
  - anotherfolder
    - file.txt
  - athirdfolder

And dir2 looks like this:

dir2
  - atextfile.txt
  - afolder
    - testscript.py
  - anotherfolder
    - file.txt
    - file2.txt

I want results to look something like:

files/folders only in dir1
  * anotherfile.xml
  * athirdfolder

files/folders only in dir2
  * anotherfolder/file2.txt

I need a simple, Pythonic way to compare two directories based only on file/folder name, and print out the differences.

Also, I need a way to check whether the directories are identical or not.

Note: I have searched Stack Overflow and Google for something like this. I see lots of examples of how to compare files taking the file content into account, but I can't find anything about comparing just file names.


Answer 1:


My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.

import os
import re

def build_files_set(rootdir):
    # Escape rootdir so characters such as '.' or '\' are matched literally.
    root_to_subtract = re.compile(r'^.*?' + re.escape(rootdir) + r'[\\/]?')

    files_set = set()
    for (dirpath, dirnames, filenames) in os.walk(rootdir):
        for filename in filenames + dirnames:
            full_path = os.path.join(dirpath, filename)
            # Strip the root prefix so paths from both trees are comparable.
            relative_path = root_to_subtract.sub('', full_path, count=1)
            files_set.add(relative_path)

    return files_set

def compare_directories(dir1, dir2):
    files_set1 = build_files_set(dir1)
    files_set2 = build_files_set(dir2)
    return (files_set1 - files_set2, files_set2 - files_set1)

if __name__ == '__main__':
    dir1 = 'old'
    dir2 = 'new'
    in_dir1, in_dir2 = compare_directories(dir1, dir2)

    # print() is a function in Python 3, which the question uses.
    print('\nFiles only in {}:'.format(dir1))
    for relative_path in in_dir1:
        print('* {0}'.format(relative_path))

    print('\nFiles only in {}:'.format(dir2))
    for relative_path in in_dir2:
        print('* {0}'.format(relative_path))

Discussion

  • The workhorse is the function build_files_set(). It traverses a directory and creates a set of relative file/dir names.

  • The function compare_directories() takes two directories, builds a name set for each, and returns the differences. Very straightforward.
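
The question also asks how to check whether the two directories are identical. With name sets that check is a plain equality test. The following is only a minimal sketch: name_set() and same_names() are hypothetical helpers, not part of the answer above, and name_set() uses os.path.relpath as an assumed alternative way to get relative paths.

```python
import os

def name_set(rootdir):
    """Return every file/folder path under rootdir, relative to rootdir."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            names.add(os.path.relpath(full_path, rootdir))
    return names

def same_names(dir1, dir2):
    """True when both trees contain exactly the same relative names."""
    return name_set(dir1) == name_set(dir2)
```

Only names are compared here, so two trees holding same-named files with different contents still count as identical.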




Answer 2:


The basic idea: use os.walk to populate dictionaries of file names, then compare the dictionaries. Note that this compares bare file names only, not their locations within the tree.

import os

# Record every file name found anywhere under the first tree.
fpa = {}
for root, dirs, files in os.walk('/your/path'):
    for name in files:
        fpa[name] = 1

# Do the same for the second tree.
fpb = {}
for root, dirs, files in os.walk('/your/path2'):
    for name in files:
        fpb[name] = 1

print("files only in a")
for name in fpa:
    if name not in fpb:
        print(name, "\n")

print("files only in b")
for name in fpb:
    if name not in fpa:
        print(name, "\n")

I didn't test this, so you may have to fix it. It can also easily be refactored to avoid the repetition.
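
The refactoring mentioned above might look like the sketch below. collect_names() and only_in_each() are hypothetical names introduced here for illustration, and a set stands in for the dictionaries since only the keys are used. Like the answer's code, it looks at bare file names, so files with the same name in different subfolders are treated as one entry, and folders are not reported.

```python
import os

def collect_names(rootdir):
    """Collect bare file names found anywhere under rootdir (hypothetical helper)."""
    names = set()
    for _root, _dirs, files in os.walk(rootdir):
        names.update(files)
    return names

def only_in_each(dir_a, dir_b):
    """Return (names only under dir_a, names only under dir_b), each sorted."""
    names_a = collect_names(dir_a)
    names_b = collect_names(dir_b)
    return sorted(names_a - names_b), sorted(names_b - names_a)
```

Calling only_in_each('/your/path', '/your/path2') would give the two lists that the original loops print.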




Answer 3:


Actually, filecmp can and should be used for this, but you have to do a little coding.

  • You give filecmp.dircmp() two directories, which it calls left and right.
  • filecmp.dircmp.left_only is a list of the files and dirs that are only in the left dir.
  • filecmp.dircmp.right_only is a list of the files and dirs that are only in the right dir.
  • filecmp.dircmp.common_dirs is a list of the dirs that are in both.

You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.

Code:

from os.path import join
from filecmp import dircmp

def find_uncommon(L_dir, R_dir):
    dcmp = dircmp(L_dir, R_dir)
    L_only = [join(L_dir, f) for f in dcmp.left_only]
    R_only = [join(R_dir, f) for f in dcmp.right_only]
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only

Test Case:

C:/
    L_dir/
        file_in_both_trees.txt
        file_in_L_tree.txt
        dir_in_L_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_L_tree.txt
            dir_in_L_tree/
                file_inside_dir_only_in_L_tree.txt
    R_dir/
        file_in_both_trees.txt
        file_in_R_tree.txt
        dir_in_R_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_R_tree.txt
            dir_in_R_tree/
                file_inside_dir_only_in_R_tree.txt

Demo:

L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))

Result:

Left only:
    C:\L_dir\file_in_L_tree.txt
    C:\L_dir\dir_in_L_tree
    C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
    C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right only:
    C:\R_dir\file_in_R_tree.txt
    C:\R_dir\dir_in_R_tree
    C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
    C:\R_dir\dir_in_both_trees\dir_in_R_tree

Note that you would have to modify the above code a bit if you wanted to see inside the uncommon directories. In my example above, that means these two files:

file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
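
One possible way to make that modification is to walk each uncommon directory with os.walk and report everything beneath it. This is a rough sketch under that assumption, not a tested extension of the answer: expand() and find_uncommon_deep() are hypothetical names, and the ordering of the results is not guaranteed.

```python
import os
from filecmp import dircmp

def expand(paths):
    """Yield each path, plus everything beneath it when it is a directory."""
    for path in paths:
        yield path
        if os.path.isdir(path):
            for dirpath, dirnames, filenames in os.walk(path):
                for name in dirnames + filenames:
                    yield os.path.join(dirpath, name)

def find_uncommon_deep(L_dir, R_dir):
    """Like find_uncommon, but also lists the contents of uncommon dirs."""
    dcmp = dircmp(L_dir, R_dir)
    L_only = list(expand(os.path.join(L_dir, f) for f in dcmp.left_only))
    R_only = list(expand(os.path.join(R_dir, f) for f in dcmp.right_only))
    # Recurse only into directories that exist on both sides.
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon_deep(os.path.join(L_dir, sub_dir),
                                          os.path.join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
```

With the test case above, the two lists would then also include file_inside_dir_only_in_L_tree.txt and file_inside_dir_only_in_R_tree.txt.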


Source: https://stackoverflow.com/questions/15069091/compare-directories-on-file-folder-names-only-printing-any-differences
