I know Git will not see an empty dir ...
This is not quite right. Git will see it just fine; it just won't save it.
but can someone provide a reference to some documentation on how exactly it is implemented.
Good software usually tries to hide implementation details, which would suggest that Git is not very good :-), but in this case the implementation details really are pretty well hidden. Git's internals documentation lives in the Documentation/technical directory of the Git source, including a skeletal api-in-core-index.txt last updated 9 years ago (!) and a more recent index-format.txt. In any case, tracking is all about Git's index, which goes by several names: "the index", "the staging area", and "the cache".
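To make "tracking is all about the index" a little more concrete, here is a minimal Python sketch that reads only the fixed 12-byte header of an index file, following the layout given in index-format.txt: a 4-byte signature DIRC, a 4-byte version number, and a 32-bit count of index entries, all in network (big-endian) byte order. The function name read_index_header is made up for illustration, and it deliberately ignores the entries and extensions that follow the header.

```python
import struct

def read_index_header(data: bytes):
    """Parse the 12-byte header at the start of a .git/index file.

    Hypothetical helper. Layout assumed from index-format.txt:
    signature b"DIRC", version number, entry count, all big-endian.
    Returns (version, entry_count).
    """
    if len(data) < 12:
        raise ValueError("too short to be an index file")
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file (bad signature)")
    return version, count
```

In a real repository, reading the first 12 bytes of .git/index (for example inside `with open(".git/index", "rb") as f:`) and passing them to this function should report the index version and how many entries the index currently holds.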
It's not only about empty folders. If I add a file to a new folder, but I don't add it to the staging area, Git actually sees the folder, but not the file.
That's not quite right either. Try running git status -uall (or, equivalently, git status --untracked-files=all).¹ What's happening here is that the git status command normally summarizes untracked files using a simple rule: if a directory named dir exists, and some untracked files were found within dir but no tracked files were found within dir, Git just prints dir/ rather than enumerating each file within dir.
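The collapsing rule can be modelled in a few lines of Python. This is a hypothetical sketch of the rule as described above, not Git's actual code (which is written in C and works from directory scans plus the index); the function name summarize_untracked and its inputs, two lists of slash-separated paths relative to the repository root, are assumptions made for illustration.

```python
from pathlib import PurePosixPath

def summarize_untracked(tracked, untracked):
    """Model of the summarizing rule; not Git's implementation.

    Any directory that contains untracked files but no tracked files
    (at any depth) is reported once as 'dir/' instead of file by file.
    """
    # Collect every directory that has at least one tracked file below it.
    tracked_dirs = set()
    for path in tracked:
        tracked_dirs.update(str(p) for p in PurePosixPath(path).parents)

    shown = set()
    for path in untracked:
        parts = PurePosixPath(path).parts
        # Walk from the top: the first directory with no tracked files is collapsed.
        for depth in range(1, len(parts)):
            prefix = "/".join(parts[:depth])
            if prefix not in tracked_dirs:
                shown.add(prefix + "/")
                break
        else:
            # Every enclosing directory holds tracked files: list the file by name.
            shown.add(path)
    return sorted(shown)
```

With tracked = ["README", "src/main.c"] and untracked = ["src/new.c", "newdir/a.txt", "newdir/sub/b.txt", "notes.txt"], the result is ["newdir/", "notes.txt", "src/new.c"]: newdir/ contains no tracked files, so it is reported once, while src/new.c is listed by name because src/ already contains a tracked file.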
If you use -uno (or --untracked-files=no), Git does not even look for untracked files, which saves time. In a large repository (tens of thousands of directories, hundreds of thousands or even millions of files), this can make the difference between git status taking well under one second and taking many seconds.
Finding all untracked files requires comparing the actual work-tree to the cached version of the work-tree stored in the index. With the normal (summarizing) mode, Git can sometimes use its cache to avoid not only enumerating the files within dir, but even looking inside dir, which also saves time.
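The idea of using cached information to avoid even looking inside dir can be sketched as a per-directory listing cache keyed on the directory's modification time, in the spirit of the untracked cache mentioned in footnote 2. This is a simplified, hypothetical Python illustration (the class DirListingCache is invented here). It assumes that adding, removing or renaming an entry directly inside a directory updates that directory's mtime, and it ignores the timestamp-granularity races and other checks a real implementation has to handle.

```python
import os

class DirListingCache:
    """Hypothetical sketch: reuse a directory listing while its mtime is unchanged.

    Changes inside a subdirectory do not touch the parent's mtime,
    so each directory is cached (and validated) separately.
    """

    def __init__(self):
        self._cache = {}  # path -> (st_mtime_ns, sorted entry names)
        self.reads = 0    # number of times a directory was actually read

    def listdir(self, path):
        mtime = os.stat(path).st_mtime_ns
        hit = self._cache.get(path)
        if hit is not None and hit[0] == mtime:
            return hit[1]  # stat data unchanged: skip reading the directory
        self.reads += 1
        names = sorted(os.listdir(path))
        self._cache[path] = (mtime, names)
        return names
```

The saving comes from the cheap stat() call replacing a full directory read whenever nothing in that directory has changed; the reads counter makes that visible.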
Of course, not looking for untracked files at all means Git will never remind you to git add such files. So the default (summarizing) mode is meant as a compromise, both in speed of operation ("if dir contains any files² in itself or via subdirectories, but we already know no files within dir are tracked, don't bother doing finer-grained scans for files") and in usability ("no need to spam the listing with 19,365 file names within dir when we can just say dir/").
¹ The default if you specify no options is -unormal, but if you specify a bare -u, that means -uall. You can, however, also set the status.showUntrackedFiles configuration variable to change the default.
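The precedence described in this footnote can be written down as a tiny function. It is a hypothetical model, not Git's option parser: the name untracked_mode and the representation of a bare -u as an empty string are assumptions, and it accepts only the three documented modes no, normal and all.

```python
VALID_MODES = {"no", "normal", "all"}

def untracked_mode(cli_arg=None, config_value=None):
    """Hypothetical model of which untracked-files mode git status ends up using.

    cli_arg:      None if no -u/--untracked-files option was given,
                  "" for a bare -u, otherwise the mode spelled out.
    config_value: value of status.showUntrackedFiles, or None if unset.
    """
    if cli_arg is not None and cli_arg not in VALID_MODES | {""}:
        raise ValueError(f"unknown mode: {cli_arg!r}")
    if config_value is not None and config_value not in VALID_MODES:
        raise ValueError(f"unknown mode: {config_value!r}")
    if cli_arg is not None:
        return cli_arg or "all"   # a bare -u means -uall
    if config_value is not None:
        return config_value       # the configuration changes the default
    return "normal"               # built-in default, same as -unormal
```

So git status with no option and no configuration behaves like -unormal, git status -u behaves like -uall, and an option given on the command line takes precedence over status.showUntrackedFiles.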
² Testing this ("does dir or its subdirectories contain any ordinary files?") partly depends on support for the d_type field in readdir's dirent data, which is not required by POSIX but is common (it's certainly found on all modern Unix variants). Recent versions of Git also have an "untracked cache" extension to the index format, described in that same technical documentation, that allows Git to skip reading untracked directories whose stat data have not changed, using the mtime field of the stat structure.
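The "does dir or its subdirectories contain any ordinary files?" test can be sketched in Python with os.scandir(), which on POSIX systems takes each entry's type from readdir()'s d_type field when the filesystem supplies it, so asking whether an entry is a directory usually costs no extra stat() call. The function contains_files is hypothetical; it counts regular files and symlinks as "files", ignores special files such as FIFOs and sockets, and stops scanning as soon as it finds a file.

```python
import os

def contains_files(path):
    """Return True if path, or any directory beneath it, holds a regular file or symlink.

    Hypothetical helper for illustration. DirEntry.is_dir()/is_file() normally
    reuse the d_type value readdir() returned, so no per-entry stat() is needed.
    """
    subdirs = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)  # look inside later
            elif entry.is_file(follow_symlinks=False) or entry.is_symlink():
                return True                 # found one: no need to scan further
            # anything else (FIFO, socket, device) is ignored
    # Descend only once this level has turned out to contain no files itself.
    return any(contains_files(sub) for sub in subdirs)
```

A tree made only of nested empty directories returns False, which matches the observation that Git never reports such a directory as untracked.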