Deleting duplicate files if file exists in certain directories - python

Submitted by 与世无争的帅哥 on 2021-01-28 19:52:47

Question


I have 3 folders - 1 master and 2 supplemental. I am writing a script that identifies duplicate files in all three via MD5 hashing. For any duplicates found in both the master and the supplementals (or their subdirectories), I would like to delete the files in the supplemental folders and keep the files in the master folder. If duplicate files are found in the supplemental folders but not in the master folder, I would like to keep them and eventually merge them into the master.

I have written a script (below) that successfully gets rid of duplicate files in the supplemental folders. However, it gets rid of all duplicates, even when the file does not exist anywhere in the master folder tree. I am having trouble thinking of a way to delete the duplicate files in the supplemental folders ONLY if they already exist in the master folder. Any advice, suggestions, or tips would be much appreciated!

import hashlib
import os

def deleteDups(maindirectory, pnhpdirectory, dupdirectories):
    # note: pnhpdirectory is accepted but not used below
    hashmap = {}
    for path, dirs, files in os.walk(maindirectory):
        for name in files:
            fullname = os.path.join(path, name)
            with open(fullname, 'rb') as f:
                h = hashlib.md5(f.read()).hexdigest()
            hashmap.setdefault(h, []).append(fullname)

    # drop hashes with only one file (meaning no duplicate); iterate over a
    # copied list of keys so the dict is not mutated while being looped over
    for k in [k for k, v in hashmap.items() if len(v) == 1]:
        del hashmap[k]

    # flatten the remaining lists into a set of duplicate paths
    dups = {f for filelist in hashmap.values() for f in filelist}

    paths = []  # list of all files in duplicate directories
    for directory in dupdirectories:
        for root, dirs, files in os.walk(directory):
            for name in files:
                paths.append(os.path.join(root, name))

    # if a file in a duplicate directory is also in the duplicates set, delete it
    deleted_size = 0
    for file in paths:
        if file in dups:
            deleted_size += os.path.getsize(file)
            print("Deleting file: " + file)
            os.remove(file)

    if deleted_size == 0:
        print("No duplicate files found")
    else:
        print("Space saved: " + str(deleted_size / 1024 ** 3) + " gigabytes")
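One way to get the behavior described above is to separate the two roles: first hash every file under the master tree into a set, then walk only the supplemental directories and delete a file when (and only when) its hash is already in that set. Files unique to the supplementals are never touched. The following is a minimal sketch of that idea; the function names (`hash_file`, `delete_dups_in_master`) are my own, not from the question:

```python
import hashlib
import os

def hash_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def delete_dups_in_master(maindirectory, dupdirectories):
    """Delete files in dupdirectories whose content also exists under maindirectory.

    Returns the total number of bytes deleted.
    """
    # hashes of every file in the master tree
    master_hashes = set()
    for root, _dirs, files in os.walk(maindirectory):
        for name in files:
            master_hashes.add(hash_file(os.path.join(root, name)))

    deleted_bytes = 0
    for directory in dupdirectories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                fullname = os.path.join(root, name)
                # delete only if an identical file exists somewhere in master
                if hash_file(fullname) in master_hashes:
                    deleted_bytes += os.path.getsize(fullname)
                    print("Deleting file: " + fullname)
                    os.remove(fullname)
    return deleted_bytes
```

Since membership is checked against a set of master hashes rather than a combined duplicates list, duplicates that exist only between the two supplemental folders survive and can be merged into the master later.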

Source: https://stackoverflow.com/questions/38860276/deleting-duplicate-files-if-file-exists-in-certain-directories-python
