How does Git manage directories?

攒了一身酷 2021-01-23 10:32

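I know Git will not see an empty dir, but can someone provide a reference to some documentation on how exactly it is implemented? It's not only about empty folders. If I add a file to a new folder, but I don't add it to the staging area, Git actually sees the folder, but not the file.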

2 Answers
  • 2021-01-23 11:00

    I know Git will not see an empty dir ...

    This is not quite right. Git will see it just fine; it just won't save it.
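    A minimal sketch of why, assuming the ordinary (non-sparse) index in which every entry is a file path: the only directories Git can recreate are the parent directories of tracked files, so an empty directory leaves nothing behind to record. The function implied_directories is hypothetical, written for illustration.

```python
from pathlib import PurePosixPath


def implied_directories(index_paths):
    """Return the directories implied by a list of tracked file paths.

    Simplified model (an assumption): every index entry is a file path such
    as "lib/util.rb", and directories are never stored as entries themselves.
    """
    dirs = set()
    for path in index_paths:
        # Each ancestor of a tracked file is a directory Git can recreate on checkout.
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return sorted(dirs)


# An empty directory such as "tmp/" contributes no path, so it can never show up here.
print(implied_directories(["README", "lib/util.rb", "lib/extra/strings.rb"]))
# → ['lib', 'lib/extra']
```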

    but can someone provide a reference to some documentation on how exactly it is implemented.

    Good software usually tries to hide implementation details, which suggests that Git is not very good :-), but in this case the implementation details really are pretty well hidden. Git's internals documentation lives in the Documentation/technical directory of the Git source, with one skeletal api-in-core-index.txt last updated 9 years ago (!) and a more recent index-format.txt. In any case, tracking is all about Git's index, which has several names: "the index", "the staging area", and "the cache".
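    To make "the index" concrete: it is a binary file, normally .git/index. The sketch below reads only its 12-byte header, following the layout given in index-format.txt; read_index_header is a hypothetical helper, and everything after the header (the per-file entries and any extensions) is ignored.

```python
import struct


def read_index_header(data: bytes):
    """Parse the 12-byte header at the start of a Git index file.

    Layout per index-format.txt: a 4-byte signature b"DIRC", a 4-byte
    version number, and a 4-byte count of index entries, all big-endian.
    """
    if len(data) < 12:
        raise ValueError("index data too short for a header")
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError(f"not a Git index (signature {signature!r})")
    return version, entry_count


# Usage, assuming a plain repository whose .git is a directory in the cwd:
# with open(".git/index", "rb") as f:
#     version, entries = read_index_header(f.read(12))
#     print(f"index version {version}, {entries} entries")
```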

    It's not only about empty folders. If I add a file to a new folder, but I don't add it to the staging area, Git actually sees the folder, but not the file.

    That's not quite right either. Try running git status -uall (or, equivalently, git status --untracked-files=all).[1] What's happening here is that the git status command normally summarizes the untracked files via a simple rule: if a directory named dir exists, and some untracked files were found within dir but no tracked files were found within dir, Git just prints dir/ rather than enumerating each file within dir.

    If you use -uno (or --untracked-files=no), Git does not even look for untracked files, which saves time. In a large repository (tens of thousands of directories, hundreds of thousands or even millions of files), this can make the difference between git status taking well under one second and git status taking many seconds.
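    The summarizing rule, together with the -uno and -uall modes, can be modeled as a pure function over lists of paths. This is a simplified sketch under stated assumptions (no ignored files, no submodules, paths relative to the top of the work tree); untracked_report and its arguments are hypothetical names, not part of Git.

```python
from pathlib import PurePosixPath


def untracked_report(tracked, untracked, mode="normal"):
    """Toy model of how `git status` lists untracked files.

    tracked, untracked: file paths relative to the top of the work tree.
    mode: "no", "normal", or "all", mirroring --untracked-files=<mode>.
    Illustrates the summarizing rule only; it is not Git's implementation.
    """
    if mode == "no":
        return []  # -uno: don't look for untracked files at all
    if mode == "all":
        return sorted(untracked)  # -uall: list every untracked file
    if mode != "normal":
        raise ValueError(f"unknown mode: {mode!r}")

    # Directories that contain at least one tracked file somewhere below them.
    has_tracked = set()
    for path in tracked:
        has_tracked.update(str(p) for p in PurePosixPath(path).parents)

    report = set()
    for path in untracked:
        ancestors = list(PurePosixPath(path).parents)[:-1]  # drop the "." root
        # From the top-level directory downward, the first directory with no
        # tracked files inside it is reported as "dir/" instead of its files.
        for directory in reversed(ancestors):
            if str(directory) not in has_tracked:
                report.add(f"{directory}/")
                break
        else:
            report.add(path)  # every ancestor holds tracked files: list the file
    return sorted(report)


tracked = ["README", "lib/util.rb"]
untracked = ["lib/new.rb", "notes.txt", "tmp/a.log", "tmp/sub/b.log"]
print(untracked_report(tracked, untracked))  # → ['lib/new.rb', 'notes.txt', 'tmp/']
```

    Note how the two files under tmp/ collapse into one line in the default mode, while lib/new.rb is listed individually because lib already contains a tracked file.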

    Finding all untracked files requires comparing the actual work-tree to the cached version of the work-tree stored in the index. With the normal (summarizing) mode, Git can sometimes use its cache to avoid not only enumerating the files within dir, but even looking inside dir, which also saves time.

    Of course, not looking for untracked files at all means Git will never remind you to git add such files. So the default (summarizing) mode is meant as a compromise, both in terms of speed of operation ("If dir contains any files[2] in itself or via subdirectories, but we already know no files within dir are tracked, don't bother doing finer-grained scans for files") and usability ("no need to spam the listing with 19,365 file names within dir when we can just say dir/").
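    The "does dir contain any files at all" question in the compromise above (see also the second footnote) can stop at the first file it finds. A sketch using Python's os.scandir, which takes an entry's type from readdir's d_type when the platform provides it; contains_any_file is a hypothetical helper, not Git code.

```python
import os


def contains_any_file(directory):
    """Return True as soon as any non-directory entry is found under directory.

    os.scandir reports each entry's type from readdir's d_type where the
    platform supplies it, so DirEntry.is_dir() usually needs no extra stat()
    call. Symlinks count as files and are not followed (a choice made for
    this sketch).
    """
    pending = [directory]
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)  # look inside later
                else:
                    return True  # one file is enough: stop scanning
    return False
```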


    [1] The default if you specify no options is -unormal, but if you specify -u, that means -uall. You can, however, also set the status.showUntrackedFiles configuration variable to modify the default.

    [2] Testing this ("does dir or its subdirectories contain any ordinary files") partly depends on support for the d_type field in readdir's dirent data, which is not required by POSIX but is common (it's certainly found on all modern Unix variants). Recent versions of Git also have an "untracked cache" extension to the index format, described in that same technical documentation, that allows Git to skip reading the untracked directories if their stat data have not changed, using the mtime field of the stat structure.
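    The untracked-cache idea from the second footnote, reduced to its core: remember each directory's listing together with its mtime, and skip reading the directory again while the mtime is unchanged. DirListingCache is a hypothetical, deliberately simplified class; it compares only st_mtime_ns and is not Git's actual data structure.

```python
import os


class DirListingCache:
    """Toy illustration of the untracked-cache idea: reuse a directory's
    listing for as long as the directory's mtime is unchanged.

    Creating, deleting, or renaming an entry updates the containing
    directory's mtime, so an unchanged mtime suggests the old listing is
    still valid. Git's real untracked cache keeps more state than this.
    """

    def __init__(self):
        self._cache = {}  # directory path -> (st_mtime_ns, sorted names)
        self.reads = 0  # number of times a directory was actually read

    def listing(self, directory):
        mtime = os.stat(directory).st_mtime_ns
        cached = self._cache.get(directory)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged since last time: skip reading it
        self.reads += 1
        names = sorted(os.listdir(directory))
        self._cache[directory] = (mtime, names)
        return names
```

    A real implementation must also cope with changes that land within the filesystem's timestamp granularity, which this sketch ignores, and with the fact that adding a file to a subdirectory changes only that subdirectory's mtime, not its parent's.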

  • 2021-01-23 11:02

    There are two levels to what’s going on: there’s what happens under the hood (plumbing) and what you actually see (porcelain).

    To learn all about the plumbing layer, I recommend checking out the "Git Objects" section of Pro Git (in the "Git Internals" chapter). In short, a directory is stored as a tree object with contents like

    100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
    100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
    040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib
    

    The first column is the mode (the entry’s file type and permission bits), the second column says whether the entry is a blob (a file) or a tree (another directory), the third column is the SHA-1 of the object, and the last column is the filename.
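    The listing above is a human-readable rendering (git cat-file -p prints trees this way). A sketch of how the same entries might be serialized and hashed, assuming Git's raw tree format as described in the docstring; tree_object_sha1 is a hypothetical helper. Note that the raw format has no type column (blob versus tree is implied by the mode), and that an empty tree is a perfectly valid object: just a header with a zero-length body.

```python
import hashlib


def tree_object_sha1(entries):
    """Compute a tree object's SHA-1 from (mode, name, hex_sha) tuples.

    Raw layout assumed here: each entry is b"<mode> <name>\\0" followed by
    the 20-byte binary SHA-1, and the body is prefixed with b"tree <size>\\0".
    Modes are stored without leading zeros ("040000" becomes "40000"), and
    entries are sorted by name, with subtree names compared as if they
    ended in "/".
    """
    normalized = [(mode.lstrip("0"), name, sha) for mode, name, sha in entries]

    def sort_key(entry):
        mode, name, _ = entry
        return name + "/" if mode == "40000" else name

    body = b""
    for mode, name, hex_sha in sorted(normalized, key=sort_key):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_sha)
    header = f"tree {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# A tree with no entries is just the header b"tree 0\0", so every empty tree
# has the same ID.
print(tree_object_sha1([]))  # → 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```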

    Although there’s nothing on the plumbing side preventing you from putting an empty tree object in a commit, it can cause problems later, because the index has no way to record an empty directory: checking out such a commit will not create the directory, and later commits can drop it. If you want to achieve a similar effect, you can put a file in the directory. If you want to force the directory to remain empty, you can use this solution; if you don’t care whether people put files in later, that file can be a README or an empty .gitignore.
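    One common way to follow that advice, assumed here and not necessarily the solution the answer links to, is a .gitignore containing * and !.gitignore, which ignores everything in the directory except itself. The hypothetical helper add_placeholders writes either that file or an empty .gitignore into every empty directory under a given root.

```python
import os

# A ".gitignore" that ignores everything in its directory except itself.
KEEP_EMPTY = "*\n!.gitignore\n"


def add_placeholders(root, keep_empty=False):
    """Create a .gitignore in every empty directory under root, skipping .git.

    keep_empty=False writes an empty .gitignore, which only makes the
    directory trackable; keep_empty=True writes KEEP_EMPTY so that files
    added to the directory later stay untracked. Returns the created paths.
    """
    empty_dirs = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            empty_dirs.append(dirpath)
        dirnames[:] = [d for d in dirnames if d != ".git"]  # never descend into .git

    created = []
    for directory in empty_dirs:
        path = os.path.join(directory, ".gitignore")
        with open(path, "w") as f:
            f.write(KEEP_EMPTY if keep_empty else "")
        created.append(path)
    return created
```

    The emptiness check runs before .git is pruned from the walk, so a directory that contains only a .git subdirectory is not mistaken for an empty one.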
