Uniquely identify files/folders in NTFS, even after move/rename

Posted by 江枫思渺然 on 2020-02-24 11:01:14

Question


I haven't found a backup (synchronization) program that does what I want, so I'm thinking about writing my own.

What I have now does the following: it walks the data in the source and, for every file that has its archive bit set OR does not exist in the destination, copies it to the destination, overwriting any existing file. When done, it checks every file in the destination and deletes it if it no longer exists in the source.
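For reference, the approach described above can be sketched roughly like this in Python. This is only an illustration, not my actual program: the names `mirror` and `has_archive_bit` are made up for this sketch, and the archive bit is read from `st_file_attributes`, which `os.stat` only provides on Windows (elsewhere the sketch treats the bit as never set).

```python
import os
import shutil

# Value of the Windows FILE_ATTRIBUTE_ARCHIVE flag.
FILE_ATTRIBUTE_ARCHIVE = 0x20


def has_archive_bit(path):
    """Return True if the archive bit is set.

    st_file_attributes only exists on Windows; on other platforms this
    sketch assumes the bit is never set.
    """
    attrs = getattr(os.stat(path), "st_file_attributes", 0)
    return bool(attrs & FILE_ATTRIBUTE_ARCHIVE)


def mirror(source, dest):
    """Naive one-way mirror: copy flagged or missing files, then prune extras."""
    # Pass 1: copy files that are flagged or missing in the destination.
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            if has_archive_bit(src) or not os.path.exists(dst):
                shutil.copy2(src, dst)

    # Pass 2: walk the destination bottom-up and delete whatever has no
    # counterpart in the source.
    for root, dirs, files in os.walk(dest, topdown=False):
        rel = os.path.relpath(root, dest)
        for name in files:
            if not os.path.exists(os.path.join(source, rel, name)):
                os.remove(os.path.join(root, name))
        for name in dirs:
            if not os.path.isdir(os.path.join(source, rel, name)):
                os.rmdir(os.path.join(root, name))
```

Because files are matched purely by path, a renamed folder looks like a brand-new folder in pass 1 and like a stale folder in pass 2.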

The problem is that if I move or rename a large folder, it first gets copied to the destination even though its contents are in principle already there, just under a different path. The old copy in the destination is deleted only afterwards.

Apart from the unnecessary copying, I frequently run into space problems because my backup drive isn't large enough to hold the original data twice.

Is there a way to programmatically identify such moved/renamed files or folders, e.g. by NTFS ID, by physical location on the media, or by something else? Are there existing solutions to this problem?

I do not care about the programming language, but hints for doing this with Python, C++, C#, Java or Prolog are appreciated.


Answer 1:


Are you familiar with object IDs? This might be what you're looking for: http://msdn.microsoft.com/en-us/library/aa363997.aspx

You may also want to use file IDs. You can get the ID from the FileId field of the FILE_ID_BOTH_DIR_INFO structure, which you get by calling GetFileInformationByHandleEx, or from the nFileIndexLow and nFileIndexHigh fields of the BY_HANDLE_FILE_INFORMATION structure, which you get by calling GetFileInformationByHandle.
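Since Python was mentioned, here is a minimal sketch of the file ID idea. It does not call the Win32 functions directly; it relies on `os.stat`, which, as far as I know, reports the file index as `st_ino` on Windows in current CPython versions (and the inode number on Unix). The helper names `file_identity`, `index_tree` and `find_moves` are made up for this example.

```python
import os


def file_identity(path):
    """Return (st_dev, st_ino) for a file or directory.

    Assumption: on Windows, st_ino carries the file index and st_dev
    identifies the volume; on Unix they are device and inode number.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def index_tree(root):
    """Map identity -> path relative to root for every file and folder."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            index[file_identity(full)] = os.path.relpath(full, root)
    return index


def find_moves(old_index, new_index):
    """Return (old_path, new_path) pairs whose identity is unchanged but whose path differs."""
    return sorted(
        (old_path, new_index[ident])
        for ident, old_path in old_index.items()
        if ident in new_index and new_index[ident] != old_path
    )
```

You would store the index from the previous run and compare it with a fresh one; every pair returned by `find_moves` can then be renamed on the backup instead of copied. Two caveats: hard links share one ID, so this simple dictionary keeps only one of their paths, and a program that saves by writing a new file and replacing the old one will give the file a new ID.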

Although it would require you to redesign your system, NTFS has a feature called the change journal that was designed for just this situation. It keeps track of every file that was changed, even across reboots. When your program runs, it would read the change journal from wherever it left off. For every file that was deleted, delete that file on your backup. For every file that was renamed, rename that file on your backup. For every file that was created or changed, copy that file to your backup. Now, instead of having to traverse both directory trees in parallel, you can simply traverse the list of files you actually have to pay attention to.
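The replay step could look roughly like the following Python sketch. It deliberately leaves out reading the journal itself, which is a Windows-specific call. The `Change` record is a hypothetical, heavily simplified stand-in for what you would extract from the journal, and `apply_changes` is a name invented for this example.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical, simplified record; real journal records look different."""
    kind: str                        # "delete", "rename" or "write"
    path: str                        # path relative to the source root
    new_path: Optional[str] = None   # only used for "rename"


def apply_changes(changes, source, backup):
    """Replay the changes, in order, against the backup tree."""
    for change in changes:
        target = os.path.join(backup, change.path)
        if change.kind == "delete":
            if os.path.isdir(target):
                shutil.rmtree(target)
            elif os.path.exists(target):
                os.remove(target)
        elif change.kind == "rename":
            new_target = os.path.join(backup, change.new_path)
            os.makedirs(os.path.dirname(new_target), exist_ok=True)
            if os.path.exists(target):
                # A rename on the backup moves no file data at all.
                os.rename(target, new_target)
        elif change.kind == "write":
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(source, change.path), target)
        else:
            raise ValueError(f"unknown change kind: {change.kind!r}")
```

With this shape, a renamed folder costs one `os.rename` on the backup drive, so the extra-space problem from the question does not arise.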




Answer 2:


I'm not sure which NTFS specifics might help you, but have you thought about comparing file hashes? To avoid computing hashes more often than necessary, you can compare file sizes first.
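A minimal Python sketch of that idea, assuming SHA-256 as the hash (any reasonable hash would do). The function names are made up for this example. Files are grouped by size first, and a hash is only computed when a file of the same size exists on the other side.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_by_size(root):
    """Group the full paths of all files under root by file size."""
    groups = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            groups[os.path.getsize(full)].append(full)
    return groups


def match_by_content(source, backup):
    """Return {source_rel_path: backup_rel_path} for files with identical content."""
    src_groups = files_by_size(source)
    bak_groups = files_by_size(backup)
    matches = {}
    for size, src_paths in src_groups.items():
        bak_paths = bak_groups.get(size)
        if not bak_paths:
            continue  # no candidate of this size, so nothing to hash
        bak_hashes = {sha256_of(p): p for p in bak_paths}
        for src in src_paths:
            hit = bak_hashes.get(sha256_of(src))
            if hit is not None:
                matches[os.path.relpath(src, source)] = os.path.relpath(hit, backup)
    return matches
```

A match whose two relative paths differ is a candidate for a rename on the backup rather than a copy. Note that identical duplicate files all map to the same backup file here, so a real tool would need a rule for that case.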



Source: https://stackoverflow.com/questions/5052477/uniquely-identify-files-folders-in-ntfs-even-after-move-rename
