Can inode and crtime be used as a unique file identifier?

后悔当初 2020-12-31 06:43

I have a file indexing database on Linux. Currently I use the file path as an identifier, but if a file is moved or renamed, its path changes and I cannot match

3 answers
  • 2020-12-31 07:14
    • The pair {device_nr, inode_nr} uniquely identifies an inode within a system.
    • Moving a file to a different directory on the same filesystem does not change its inode_nr.
    • The Linux inotify interface lets you monitor changes to inodes (either files or directories).
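
As a minimal sketch of the first two points, the Python snippet below reads the {device_nr, inode_nr} pair through `os.lstat` and compares it before and after a rename inside one temporary directory tree. The helper names `file_identity` and `demo_rename_keeps_identity` are made up for this illustration, and the sketch assumes an ordinary local Linux filesystem.

```python
import os
import tempfile


def file_identity(path):
    """Return (device number, inode number) for path, without following symlinks."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def demo_rename_keeps_identity():
    """Create a file, move it into a subdirectory, and return both identities."""
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.mkdir(sub)
        old_path = os.path.join(root, "a.txt")
        new_path = os.path.join(sub, "b.txt")
        with open(old_path, "w") as f:
            f.write("hello")
        before = file_identity(old_path)
        # Both paths live on the same filesystem, so this is a pure rename.
        os.rename(old_path, new_path)
        after = file_identity(new_path)
        return before, after


before, after = demo_rename_keeps_identity()
print("identity unchanged:", before == after)
```

An index keyed on this pair instead of the path would still find the file after such a rename. It would not survive a move to another filesystem, where the device number changes and a new inode is allocated.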

    Extra notes:

    • Moving files across filesystems is handled differently: it is in fact a copy followed by a delete, so the file gets a new inode.
    • Networked filesystems (or a mounted NTFS) cannot always guarantee the stability of inode numbers.
    • Microsoft is not a Unix vendor. Its documentation does not cover Unix or its filesystems and should be ignored (except for NTFS internals).

    Extra text: the old Unix adage "everything is a file" should in fact be "everything is an inode". The inode carries all the meta-information about a file (or directory, or special file) except the name. The filename is in fact only a directory entry that happens to link to the particular inode. Moving a file implies creating a new link to the same inode and deleting the old directory entry that linked to it. The inode metadata can be obtained with the stat(), fstat(), and lstat() system calls.
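
The "new link, then delete the old entry" description can be reproduced by hand. The sketch below is only an illustration, not how any particular `mv` implementation works: it uses `os.link` and `os.unlink` and inspects the inode number and link count through `os.stat`. The function name `demo_link_then_unlink` is hypothetical, and the example assumes a filesystem that supports hard links.

```python
import os
import tempfile


def demo_link_then_unlink():
    """Mimic a same-filesystem move: add a new name, then drop the old one."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "old_name")
        dst = os.path.join(root, "new_name")
        with open(src, "w") as f:
            f.write("data")
        ino_before = os.stat(src).st_ino

        os.link(src, dst)  # second directory entry for the same inode
        links_while_both = os.stat(dst).st_nlink

        os.unlink(src)  # remove the old directory entry only
        st_after = os.stat(dst)
        return ino_before, st_after.st_ino, links_while_both, st_after.st_nlink


ino_before, ino_after, links_while_both, links_after = demo_link_then_unlink()
print("same inode:", ino_before == ino_after)
print("link count with both names:", links_while_both)
print("link count after unlink:", links_after)
```

The data is never copied. Only the set of names pointing at the inode changes, which is why the inode number survives the move.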

  • 2020-12-31 07:18

    The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.

    For the Ext3 filesystem (the most popular), i-node numbers are reused and thus cannot serve as a unique file identifier over time, nor does reuse occur according to any predictable pattern.

    In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, its bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero bit, and that i-node number (which may previously have been allocated to another file) is reused.
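
To make the reuse concrete, here is a toy model of a first-zero-bit allocator in Python. It is a deliberately simplified sketch and not the Ext3 code: the class name `InodeBitmap`, the fixed size, and the 1-based numbering are assumptions made for the example, and real allocation also involves block groups and other policies.

```python
class InodeBitmap:
    """Toy first-zero-bit allocator; a simplified model, not the ext3 code."""

    def __init__(self, size):
        # False means the inode number is free, True means it is in use.
        self.bits = [False] * size

    def allocate(self):
        """Claim the first free slot and return its (1-based) inode number."""
        for index, used in enumerate(self.bits):
            if not used:
                self.bits[index] = True
                return index + 1
        raise OSError("no free inodes")

    def free(self, inode_nr):
        """Mark an inode number as free again."""
        self.bits[inode_nr - 1] = False


bitmap = InodeBitmap(8)
first = bitmap.allocate()
second = bitmap.allocate()
third = bitmap.allocate()
bitmap.free(second)          # the file that owned `second` is deleted
reused = bitmap.allocate()   # a brand new file receives the same number
print(first, second, third, reused)  # prints: 1 2 3 2
```

An index that stored only the number held in `second` would now silently point at the wrong file.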

    This may lead to the naive conclusion that the lowest numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers can be reused, even though they clearly will.

    From the source code for ialloc.c, where i-nodes are allocated:

    There are two policies for allocating an inode. If the new inode is a directory, then a forward search is made for a block group with both free space and a low directory-to-inode ratio; if that fails, then of the groups with above-average free space, that group with the fewest directories already is chosen. For other inodes, search forward from the parent directory's block group to find a free inode.

    The source code that manages this for Ext3 is called ialloc and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c

  • 2020-12-31 07:20

    I guess the database application would also need to consider the case where a file is restored from backup, which would preserve the file's crtime but not its inode number.
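
One way to reason about these cases is to store the inode pair and the crtime together and compare them separately. The sketch below is only an illustration of that idea: `FileRecord`, `classify`, and the result strings are hypothetical names, the crtime is taken as an already known nanosecond value (obtaining it is outside the sketch), and the "possibly restored" branch relies on the guess above that a restore keeps crtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    device_nr: int
    inode_nr: int
    crtime_ns: int  # creation time in nanoseconds, supplied by the caller


def classify(stored, observed):
    """Compare an indexed record with what is currently seen on disk."""
    same_inode = (stored.device_nr, stored.inode_nr) == (
        observed.device_nr,
        observed.inode_nr,
    )
    same_crtime = stored.crtime_ns == observed.crtime_ns
    if same_inode and same_crtime:
        return "same file"
    if same_inode:
        # The inode number matches but the creation time does not.
        return "inode reused"
    if same_crtime:
        # Creation time matches but the inode differs (assumed restore case).
        return "possibly restored"
    return "different file"


indexed = FileRecord(device_nr=2049, inode_nr=1234, crtime_ns=1_609_000_000_000)
print(classify(indexed, FileRecord(2049, 1234, 1_609_000_000_000)))
print(classify(indexed, FileRecord(2049, 1234, 1_700_000_000_000)))
print(classify(indexed, FileRecord(2049, 9999, 1_609_000_000_000)))
```

A matching crtime alone is weak evidence, so an application would probably want a further check (size or a content hash, for example) before treating a "possibly restored" result as the same file.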
