Could somebody explain how git knows internally that files X, Y and Z have changed? What is the process behind the scenes that recognizes when a file has not yet been added or h
The mechanisms by which one determines the status of a file are fairly straightforward. To know which files have been staged, one simply diffs the HEAD tree with the index. Any items that appear only in the index have been staged for addition, any items that appear only in HEAD have been staged for removal, and any items that differ between the two have had changes staged.
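To make the HEAD-versus-index comparison concrete, here is a minimal Python sketch. It assumes, purely for illustration, that both the HEAD tree and the index have already been flattened into dictionaries mapping each path to its blob ID. Real Git reads these from tree objects and the binary `.git/index` file, and `diff_trees` is a hypothetical helper, not part of any Git API.

```python
def diff_trees(head: dict[str, str], index: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths by comparing two hypothetical {path: blob_id} maps."""
    # Only in the index: staged for addition
    added = sorted(p for p in index if p not in head)
    # Only in HEAD: staged for removal
    removed = sorted(p for p in head if p not in index)
    # In both, but with a different blob ID: changes staged
    modified = sorted(p for p in head if p in index and head[p] != index[p])
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    head = {"README": "aaa", "old.c": "bbb", "main.c": "ccc"}
    index = {"README": "aaa", "main.c": "ddd", "new.c": "eee"}
    print(diff_trees(head, index))
    # {'added': ['new.c'], 'removed': ['old.c'], 'modified': ['main.c']}
```

Comparing blob IDs rather than file contents is what keeps this step cheap: two entries with the same ID are known to have identical contents.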
Similarly, one detects unstaged changes by diffing the index with the working directory.
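A rough sketch of the index-versus-working-directory comparison, again assuming the index is a hypothetical dict of path to blob ID. It recomputes each tracked file's blob ID the way Git does in a SHA-1 repository (SHA-1 over the header `blob <size>\0` followed by the contents) and compares it with the staged value. It deliberately ignores things real Git handles, such as line-ending conversion, clean filters and untracked files, and it naively hashes every file.

```python
import hashlib
import os
import tempfile


def blob_id(data: bytes) -> str:
    # Blob ID as Git computes it in a SHA-1 repository:
    # sha1(b"blob <size>\0" + contents)
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def unstaged_changes(index: dict[str, str], root: str) -> dict[str, list[str]]:
    """Compare a hypothetical {path: blob_id} index against files under root."""
    modified, deleted = [], []
    for path, staged in sorted(index.items()):
        full = os.path.join(root, path)
        if not os.path.exists(full):
            deleted.append(path)  # tracked, but gone from the working directory
            continue
        with open(full, "rb") as f:
            if blob_id(f.read()) != staged:
                modified.append(path)  # contents differ from what was staged
    return {"modified": modified, "deleted": deleted}


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "notes.txt"), "wb") as f:
        f.write(b"edited\n")
    index = {"notes.txt": blob_id(b"original\n"), "gone.txt": blob_id(b"x\n")}
    print(unstaged_changes(index, root))
    # {'modified': ['notes.txt'], 'deleted': ['gone.txt']}
```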
Your question asks in particular how this can be so fast (after all, computing the SHA-1 hash of every file is not exactly speedy). This is where the index, also known as the cache, comes into play again. Alongside each file's blob ID, the index records stat data for the file, including its size and modification time. Thus one can simply stat(2) a file on disk and compare its size and modification time against the values stored in the index: if they match, the file is assumed unchanged and need not be hashed at all; only if they differ does the file have to be read and hashed.
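The stat shortcut can be sketched like this. `IndexEntry` is a hypothetical, stripped-down stand-in for an index entry (the real one stores more stat fields, such as ctime, inode and mode), and this sketch compares only the size and the nanosecond mtime. When both match, the file is treated as unchanged without being read; otherwise it is read and hashed. Real Git also has to deal with the "racy git" problem, where a file is modified within the same timestamp granularity in which the index was written; this sketch ignores that.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class IndexEntry:
    # Hypothetical, stripped-down index entry: blob ID plus two cached stat fields.
    blob: str
    size: int
    mtime_ns: int


def blob_id(data: bytes) -> str:
    # sha1(b"blob <size>\0" + contents), as in a SHA-1 Git repository
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()


def check(path: str, entry: IndexEntry) -> tuple[bool, bool]:
    """Return (modified, hashed) for one tracked file."""
    st = os.stat(path)
    if st.st_size == entry.size and st.st_mtime_ns == entry.mtime_ns:
        # Cheap path: stat data matches the index, so assume unchanged and skip hashing.
        return False, False
    # Stat data differs: fall back to reading and hashing the contents.
    with open(path, "rb") as f:
        return blob_id(f.read()) != entry.blob, True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "a.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")
    st = os.stat(path)
    entry = IndexEntry(blob_id(b"hello\n"), st.st_size, st.st_mtime_ns)
    print(check(path, entry))  # (False, False): stat matches, no hashing
    # "Touch" the file (bump mtime by one second) without changing its contents
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    print(check(path, entry))  # (False, True): hashed, but contents unchanged
```

The point of the design is that a stat call is far cheaper than reading and hashing a whole file, so when almost nothing has changed, a status check does very little I/O.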